CN107544968B - Method and device for determining website availability

Info

Publication number: CN107544968B
Authority: CN (China)
Legal status: Active
Application number: CN201610466058.4A
Other languages: Chinese (zh)
Other versions: CN107544968A
Inventors: 王春侠, 李新国
Current assignee: Beijing Gridsum Technology Co Ltd
Original assignee: Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201610466058.4A
Publication of CN107544968A
Application granted
Publication of CN107544968B

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a device for determining website availability. The method comprises: scanning the web page links on each page of a target website with a crawler system to obtain the number of web page links in the target website that can be accessed normally and the total number of web page links in the target website; taking the ratio of the number of normally accessible links to the total number of links as a first availability index; acquiring user access history data of the target website to obtain the total number of accessed pages and the number of pages that had problems when accessed; taking the ratio of the difference between the total number of accessed pages and the number of problem pages to the total number of accessed pages as a second availability index; and determining the website availability of the target website according to the first and second availability indexes. The invention can provide users with a website availability index that serves as an effective reference when visiting a website.

Description

Method and device for determining website availability
Technical Field
The invention relates to the field of data processing, and in particular to a method and a device for determining website availability.
Background
With the development of information technology, websites have sprung up like bamboo shoots after a spring rain. However, for various reasons, users may find a website unavailable while accessing it, which degrades their experience to some extent.
At present, there is no effective method for calculating website availability, so no effective reference index can be offered to users who access websites.
Disclosure of Invention
In view of the above problems, the present invention provides a method and a device for determining website availability, which can provide users with a website availability index as an effective reference when accessing a website.
The invention provides a method for determining website availability, comprising the following steps:
scanning the web page links on each page of a target website with a crawler system, and obtaining the number of web page links in the target website that can be accessed normally and the total number of web page links in the target website;
taking the ratio of the number of normally accessible web page links in the target website to the total number of web page links in the target website as a first availability index of the target website;
acquiring user access history data of the target website, and obtaining from it the total number of pages of the target website that were accessed and the number of pages that had problems when accessed;
taking the ratio of the difference between the total number of accessed pages and the number of pages that had problems when accessed to the total number of accessed pages as a second availability index of the target website;
and determining the website availability of the target website according to its first availability index and second availability index.
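Written as formulas, the two indexes and their combination look as follows. The symbols are introduced here for illustration only and do not appear in the original text: N_ok is the number of normally accessible links, N the total number of links, P the total number of accessed pages, Q the number of pages that had problems when accessed, and w_1, w_2 the weights of the weighted embodiment described later.

```latex
A_1 = \frac{N_{\mathrm{ok}}}{N}, \qquad
A_2 = \frac{P - Q}{P}, \qquad
A = w_1 A_1 + w_2 A_2 \quad \left(w_1 = w_2 = \tfrac{1}{2} \text{ gives the plain average}\right)
```

Both indexes lie between 0 and 1, so any weights that sum to 1 keep the combined availability A in the same range.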
Preferably, the method further comprises:
acquiring, from the user access history data, the high-quality access probability of each browser used to access the target website, and acquiring a preset high-quality access probability threshold for each browser, where a high-quality access is an access whose dwell time and number of browsed pages are each higher than a preset threshold;
determining each browser whose high-quality access probability is higher than its preset high-quality access probability threshold as a browser compatible with the target website, calculating the compatibility probability of the target website with respect to browsers, and taking this compatibility probability as a third availability index of the target website;
correspondingly, determining the website availability of the target website according to its first availability index and second availability index specifically includes:
determining the website availability of the target website according to its first, second and third availability indexes.
Preferably, the method further comprises:
acquiring, from the user access history data, the high-quality access probability of accessing the target website with screens of each resolution, and acquiring a preset high-quality access probability threshold for screens of each resolution, where a high-quality access is an access whose dwell time and number of browsed pages are each higher than a preset threshold;
determining each screen resolution whose high-quality access probability is higher than its preset high-quality access probability threshold as a screen resolution compatible with the target website, calculating the compatibility probability of the target website with respect to screen resolutions, and taking this compatibility probability as a fourth availability index of the target website;
correspondingly, determining the website availability of the target website according to its first availability index and second availability index specifically includes:
determining the website availability of the target website according to its first, second and fourth availability indexes;
or determining the website availability of the target website according to its first, second, third and fourth availability indexes.
Preferably, determining the website availability of the target website according to its first availability index and second availability index includes:
setting a weight for each of the first availability index and the second availability index of the target website;
and determining the website availability of the target website from the first and second availability indexes combined with their weights.
Preferably, scanning the web page links on each page of the target website with the crawler system to obtain the number of normally accessible web page links and the total number of web page links in the target website includes:
scanning the web page links on each page of the target website with the crawler system;
when a web page link on any page of the target website is scanned, determining the link to be a normally accessible web page link if the status code returned for its HTTP request is 200;
and counting the number of normally accessible web page links in the target website and the total number of web page links in the target website.
Preferably, acquiring the user access history data of the target website and obtaining from it the total number of accessed pages and the number of pages that had problems when accessed includes:
acquiring the user access history data of the target website, and taking each page whose refresh count within the same session meets a preset threshold as a page that had a problem when accessed;
and counting, from the user access history data, the total number of pages of the target website that were accessed and the number of pages that had problems when accessed.
The invention also provides a device for determining website availability, comprising:
a first acquisition module, configured to scan the web page links on each page of a target website with a crawler system, and to obtain the number of normally accessible web page links in the target website and the total number of web page links in the target website;
a first calculation module, configured to take the ratio of the number of normally accessible web page links in the target website to the total number of web page links in the target website as a first availability index of the target website;
a second acquisition module, configured to acquire user access history data of the target website, and to obtain from it the total number of accessed pages of the target website and the number of pages that had problems when accessed;
a second calculation module, configured to take the ratio of the difference between the total number of accessed pages and the number of pages that had problems when accessed to the total number of accessed pages as a second availability index of the target website;
and a determining module, configured to determine the website availability of the target website according to its first availability index and second availability index.
Preferably, the apparatus further comprises:
a third acquisition module, configured to acquire, from the user access history data, the high-quality access probability of each browser used to access the target website, and to acquire a preset high-quality access probability threshold for each browser, where a high-quality access is an access whose dwell time and number of browsed pages are each higher than a preset threshold;
a third calculation module, configured to determine each browser whose high-quality access probability is higher than its preset high-quality access probability threshold as a browser compatible with the target website, to calculate the compatibility probability of the target website with respect to browsers, and to take this compatibility probability as a third availability index of the target website;
correspondingly, the determining module is specifically configured to:
determine the website availability of the target website according to its first, second and third availability indexes.
Preferably, the device further comprises:
a fourth acquisition module, configured to acquire, from the user access history data, the high-quality access probability of accessing the target website with screens of each resolution, and to acquire a preset high-quality access probability threshold for screens of each resolution, where a high-quality access is an access whose dwell time and number of browsed pages are each higher than a preset threshold;
a fourth calculation module, configured to determine each screen resolution whose high-quality access probability is higher than its preset high-quality access probability threshold as a screen resolution compatible with the target website, to calculate the compatibility probability of the target website with respect to screen resolutions, and to take this compatibility probability as a fourth availability index of the target website;
correspondingly, the determining module is specifically configured to:
determine the website availability of the target website according to its first, second and fourth availability indexes;
or determine the website availability of the target website according to its first, second, third and fourth availability indexes.
Preferably, the determining module includes:
a setting submodule, configured to set a weight for each of the first availability index and the second availability index of the target website;
a first determining submodule, configured to determine the website availability of the target website from the first and second availability indexes combined with their weights.
Preferably, the first acquisition module includes:
a scanning submodule, configured to scan the web page links on each page of the target website with the crawler system;
a second determining submodule, configured to determine, when a web page link on any page of the target website is scanned, that the link is a normally accessible web page link if the status code returned for its HTTP request is 200;
a first statistics submodule, configured to count the number of normally accessible web page links in the target website and the total number of web page links in the target website.
Preferably, the second acquisition module includes:
an acquisition submodule, configured to acquire the user access history data of the target website, and to take each page whose refresh count within the same session meets a preset threshold as a page that had a problem when accessed;
and a second statistics submodule, configured to count, from the user access history data, the total number of pages of the target website that were accessed and the number of pages that had problems when accessed.
According to the above technical solution, the method for determining website availability first scans the web page links on each page of a target website with a crawler system to obtain the number of normally accessible web page links and the total number of web page links in the target website, and takes the ratio of the former to the latter as a first availability index of the target website. Second, it acquires user access history data of the target website, obtains from it the total number of accessed pages and the number of pages that had problems when accessed, and takes the ratio of the difference between these two numbers to the total number of accessed pages as a second availability index. Finally, it determines the website availability of the target website according to the first and second availability indexes. The invention can thus provide users with a website availability index as an effective reference when visiting a website.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and its above and other objects, features and advantages become more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating a method for determining website availability according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating another method for determining website availability provided by embodiments of the present invention;
FIG. 3 is a schematic structural diagram illustrating an apparatus for determining website availability according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram illustrating another apparatus for determining website availability according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Specific embodiments are described below.
An embodiment of the present invention provides a method for determining website availability. Referring to FIG. 1, which is a flowchart of this method, the method specifically includes the following steps:
S101: Scan the web page links on each page of the target website with a crawler system, and obtain the number of normally accessible web page links and the total number of web page links in the target website.
A crawler system is a program or script that automatically fetches information from the World Wide Web according to certain rules. When the embodiment of the invention calculates the website availability of the target website, the crawler system can be used to scan the web page links on each page of the target website. Specifically, when the crawler system scans a web page link on a page of the target website, if the HTTP status code returned by the web page corresponding to that link is 200, the link can be accessed normally; otherwise, it cannot. By scanning the web page links on every page of the target website, the crawler system obtains the number of normally accessible web page links and the total number of web page links in the target website.
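The link scan of S101, together with the ratio used in S102, could be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `fetch` callback, the `in_site` predicate, and the choice to follow only links inside the target site are all hypothetical. Injecting `fetch` keeps the sketch testable without network access; in a real crawler it would wrap an HTTP client. The text does not say how redirects (3xx) are treated, so the sketch simply uses whatever status `fetch` reports.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical fetcher interface: url -> (HTTP status code, links found on that page).
Fetch = Callable[[str], Tuple[int, List[str]]]


def scan_links(start_url: str, fetch: Fetch, in_site: Callable[[str], bool]) -> Tuple[int, int]:
    """Breadth-first scan of the target site.

    Returns (normally_accessible_links, total_links). Each distinct link found
    on a page of the target site is counted once; it is normally accessible
    when requesting it returns status 200.
    """
    cache: Dict[str, Tuple[int, List[str]]] = {}

    def get(url: str) -> Tuple[int, List[str]]:
        # Request each URL at most once.
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    all_links: Set[str] = set()
    queued = {start_url}
    queue = deque([start_url])
    while queue:
        status, links = get(queue.popleft())
        if status != 200:
            continue  # a page that cannot be loaded contributes no links
        for link in links:
            all_links.add(link)
            # Only pages inside the target site are scanned for further links.
            if in_site(link) and link not in queued:
                queued.add(link)
                queue.append(link)

    ok = sum(1 for link in all_links if get(link)[0] == 200)
    return ok, len(all_links)


def first_availability_index(ok: int, total: int) -> float:
    """Ratio of normally accessible links to all links (0.0 for an empty site)."""
    return ok / total if total else 0.0
```

For example, a site whose pages expose five distinct links, three of which return 200, gets a first availability index of 3/5 = 0.6.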
S102: Take the ratio of the number of normally accessible web page links in the target website to the total number of web page links in the target website as the first availability index of the target website.
Because whether users can normally access the web page links on each page of a website affects the website's availability, the probability that those links can be accessed normally can serve as one index when calculating website availability. In the embodiment of the invention, the ratio of the number of normally accessible web page links obtained by the crawler system to the total number of web page links in the target website is used as the first availability index for calculating the availability of the target website.
S103: Acquire user access history data of the target website, and obtain from it the total number of accessed pages of the target website and the number of pages that had problems when accessed.
In practice, a Tracker, a JS script that collects user access history data of the target website, may be deployed in advance on the target website's server. The user access history data is data generated when users access the target website and stored on the server. After the Tracker collects the user access history data, analysing the data yields the total number of accessed pages of the target website, that is, the total number of pages accessed by users in the collected data. The number of pages that had problems when accessed is also obtained: for example, each page whose refresh count within the same session meets a preset threshold (for example, 5 refreshes) is taken as a page that had a problem when accessed, and the pages meeting this condition are counted.
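A sketch of S103 and S104 follows. Several choices here are assumptions made for illustration, since the text does not define them: the record layout (session id, page URL), treating a consecutive repeat view of the same page in a session as one refresh, and counting each page once per session. The default threshold of 5 follows the example in the text, and "meets" is read as "is at least".

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def page_counts(views: Iterable[Tuple[str, str]], refresh_threshold: int = 5) -> Tuple[int, int]:
    """Count accessed pages and problem pages from page-view records.

    `views` holds (session_id, page_url) pairs in chronological order.
    Returns (total_pages, problem_pages), each page counted once per session.
    """
    refreshes: Counter = Counter()
    pages: Set[Tuple[str, str]] = set()
    last_page: Dict[str, str] = {}
    for session, page in views:
        key = (session, page)
        pages.add(key)
        # Assumed refresh rule: the same page viewed again immediately in the same session.
        if last_page.get(session) == page:
            refreshes[key] += 1
        last_page[session] = page
    problem = sum(1 for key in pages if refreshes[key] >= refresh_threshold)
    return len(pages), problem


def second_availability_index(total: int, problem: int) -> float:
    """(total - problem) / total, or 0.0 when no pages were accessed."""
    return (total - problem) / total if total else 0.0
```

With one page refreshed five times among four session pages, the second availability index is (4 - 1) / 4 = 0.75.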
S104: Take the ratio of the difference between the total number of accessed pages of the target website and the number of pages that had problems when accessed to the total number of accessed pages as the second availability index of the target website.
Because whether users can normally access each page of a website also affects its availability, the probability that pages of the website can be accessed normally can serve as another index when calculating website availability.
In the embodiment of the invention, the difference between the total number of accessed pages of the target website and the number of pages that had problems when accessed is calculated first; the ratio of this difference to the total number of accessed pages is then taken as the second availability index of the target website.
S105: Determine the website availability of the target website according to its first availability index and second availability index.
Since the first and second availability indexes calculated in the embodiment are both used to calculate the website availability of the target website, the website availability can be determined from them.
In one implementation, the two indexes are simply averaged, that is, website availability = (first availability index + second availability index) / 2.
In another implementation, weights are set for the first and second availability indexes according to actual application requirements, for example 30% and 70% respectively, and the website availability of the target website is calculated with these weights, that is, website availability = first availability index × 30% + second availability index × 70%.
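Both combination rules of S105 reduce to one weighted sum, sketched below. The function name is hypothetical, and the weights are assumed to sum to 1 so that the result stays between 0 and 1; the same function accepts the third and fourth availability indexes of the later embodiment.

```python
from typing import Optional, Sequence


def website_availability(indices: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of availability indexes; omitting weights gives the plain average."""
    if not indices:
        raise ValueError("at least one availability index is required")
    if weights is None:
        weights = [1.0 / len(indices)] * len(indices)
    if len(weights) != len(indices):
        raise ValueError("exactly one weight per index is required")
    return sum(w * x for w, x in zip(weights, indices))
```

With a first index of 0.9 and a second of 0.8, the plain average gives 0.85 and the 30%/70% weighting gives 0.9 × 0.3 + 0.8 × 0.7 = 0.83.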
In practical applications, the ways of calculating website availability are not limited to the above two and are not listed exhaustively here. In addition, the numbering S101 to S105 does not limit the order in which the steps of this embodiment are executed.
In the method for determining website availability provided by this embodiment, a crawler system first scans the web page links on each page of a target website to obtain the number of normally accessible web page links and the total number of web page links, and the ratio of the two is taken as the first availability index. Next, user access history data of the target website is acquired to obtain the total number of accessed pages and the number of pages that had problems when accessed, and the ratio of their difference to the total number of accessed pages is taken as the second availability index. Finally, the website availability of the target website is determined from the first and second availability indexes. The embodiment can thus provide users with a website availability index as an effective reference when visiting a website.
An embodiment of the present invention further provides another method for determining website availability. Referring to FIG. 2, which is a flowchart of this method, the method specifically includes the following steps:
S201: Scan the web page links on each page of the target website with a crawler system, and obtain the number of normally accessible web page links and the total number of web page links in the target website.
S202: Take the ratio of the number of normally accessible web page links in the target website to the total number of web page links in the target website as the first availability index of the target website.
S203: Acquire user access history data of the target website, and obtain from it the total number of accessed pages of the target website and the number of pages that had problems when accessed.
S204: Take the ratio of the difference between the total number of accessed pages of the target website and the number of pages that had problems when accessed to the total number of accessed pages as the second availability index of the target website.
For S201 to S204, refer to the explanations of S101 to S104 above.
S205: Acquire, from the user access history data, the high-quality access probability of each browser used to access the target website and the preset high-quality access probability threshold for each browser, where a high-quality access is an access whose dwell time and number of browsed pages are each higher than a preset threshold.
In practical applications, the obtained user access history data is analysed to find the probability of high-quality access when users access the target website with each type of browser, for example 5% for Chrome and 32% for IE. A high-quality access is one in which the dwell time within a session is higher than a certain threshold and the number of pages browsed in that session is also higher than a certain threshold.
The embodiment of the invention provides one way of identifying high-quality access: first, compute the average dwell time and the average number of browsed pages over all sessions in a period (for example, one quarter); then, determine each access whose dwell time is higher than the average dwell time and whose number of browsed pages is higher than the average number of browsed pages as a high-quality access.
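The averaging rule for identifying high-quality access might be sketched as follows. The (dwell time, pages viewed) record layout and the function name are assumptions, and "higher than the average" is read as strictly greater.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def high_quality_flags(sessions: Sequence[Tuple[float, int]]) -> List[bool]:
    """Flag high-quality sessions within one period.

    `sessions` holds (dwell_time_seconds, pages_viewed) pairs. A session is
    high quality when both values are strictly above the period averages.
    """
    if not sessions:
        return []
    avg_dwell = mean(dwell for dwell, _ in sessions)
    avg_pages = mean(pages for _, pages in sessions)
    return [dwell > avg_dwell and pages > avg_pages for dwell, pages in sessions]
```

For sessions (30 s, 1 page), (300 s, 6 pages), (120 s, 2 pages) and (600 s, 3 pages), the averages are 262.5 s and 3 pages, so only the second session is high quality; the 600 s session fails because its page count merely equals the average.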
Specifically, the high-quality access probability of a browser for the target website is the ratio, in the user access history data, of the number of people who accessed the target website with that browser with high quality to the total number of people who accessed the target website with that browser.
In addition, the preset high-quality access probability threshold for each browser may be any value determined with reference to the access history data of websites across the whole network. In a preferred embodiment, the preset threshold of a browser may be that browser's average high-quality access probability, namely the ratio, in the history data of users accessing various websites, of the number of people who accessed websites with that browser with high quality to the total number of people who accessed websites with that browser.
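Because the same ratio defines both the target site's per-browser probability and, in the preferred embodiment, the whole-network threshold, one function can compute both: apply it to the target site's visits for the probabilities and to visits across all websites for the thresholds. The record layout, the function name, and counting a person as a high-quality visitor if any of their visits was high quality are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (visitor_id, browser, was_high_quality)
Visit = Tuple[str, str, bool]


def high_quality_probability(visits: Iterable[Visit]) -> Dict[str, float]:
    """Per browser: people with at least one high-quality visit / all people using it."""
    visitors: Dict[str, Set[str]] = defaultdict(set)
    hq_visitors: Dict[str, Set[str]] = defaultdict(set)
    for visitor, browser, high_quality in visits:
        visitors[browser].add(visitor)
        if high_quality:
            hq_visitors[browser].add(visitor)
    return {browser: len(hq_visitors[browser]) / len(people) for browser, people in visitors.items()}
```

Counting distinct people rather than visits follows the text's wording ("number of people"); a visit-level ratio would be a different, equally simple variant.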
S206: Determine each browser whose high-quality access probability is higher than its preset high-quality access probability threshold as a browser compatible with the target website, calculate the compatibility probability of the target website with respect to browsers, and take this compatibility probability as the third availability index of the target website.
When users access a website, they may use various types of browsers, and a website's compatibility differs from browser to browser. For example, some websites are highly compatible with IE-kernel browsers such as IE, 360 and Sogou, but poorly compatible with non-IE-kernel browsers such as Firefox, Opera and Chrome. The embodiment of the invention therefore uses the website's compatibility probability with respect to browsers as an index for calculating website availability.
In one embodiment, the average high-quality access probability of each browser, for example 25% for Chrome and 30% for IE, is first computed from the website access history data of the whole network. Then the high-quality access probability of each browser in the access history data of the target website is computed, for example 5% for Chrome and 32% for IE. Next, the high-quality access probability of each browser on the target website is compared with that browser's average, and each browser whose probability is higher than the average is determined as a browser compatible with the target website; in this example, IE is compatible with the target website. Finally, the number of browsers compatible with the target website and the total number of browsers that accessed the target website are counted, and the proportion of compatible browsers in that total is taken as the compatibility probability of the target website with respect to browsers.
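Using the example figures above (whole-network averages of 25% for Chrome and 30% for IE; target-site values of 5% and 32%), the compatibility probability can be sketched as below. The text counts "browsers" without saying whether per browser type or per visitor; this sketch assumes per browser type, so IE alone being compatible gives 1/2. The function name is hypothetical, and because keys are plain strings the same function could be reused with screen resolutions (for example "1920x1080") for the fourth availability index.

```python
from typing import Dict, List, Tuple


def compatibility_probability(
    target: Dict[str, float], thresholds: Dict[str, float]
) -> Tuple[float, List[str]]:
    """Share of browser types seen on the target site that count as compatible.

    A browser is compatible when its high-quality access probability on the
    target site is strictly above its threshold. Every browser in `target`
    must have a threshold; a KeyError is raised otherwise.
    """
    if not target:
        return 0.0, []
    compatible = sorted(b for b, p in target.items() if p > thresholds[b])
    return len(compatible) / len(target), compatible
```

With the example figures, only IE (32% > 30%) is compatible, so the third availability index is 0.5.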
S207: acquiring, from the user access history data, the high-quality access probability of accessing the target website with screens of each resolution, and acquiring the preset high-quality access probability threshold for each screen resolution. A high-quality access is an access whose dwell time and number of pages viewed are each higher than a respective preset threshold.
For a screen of any given resolution, the high-quality access probability of the target website is the ratio, in the user access history data, of the number of people who made a high-quality access to the target website with a screen of that resolution to the total number of people who accessed the target website with a screen of that resolution.
The preset high-quality access probability threshold for each screen resolution may be any value determined with reference to the overall access history data of the websites in the system. In a preferred embodiment, the threshold for a given resolution is the average high-quality access probability for that resolution, namely the ratio, in the historical data of users accessing all kinds of websites, of the number of people who made a high-quality access with a screen of that resolution to the total number of people who accessed websites with a screen of that resolution.
S208: determining each screen resolution whose high-quality access probability is higher than the preset high-quality access probability threshold as a screen resolution compatible with the target website, calculating the compatibility probability of the target website with respect to screen resolutions, and taking that compatibility probability as a fourth availability index of the target website.
When users access a website, they may use terminals whose screens have various resolutions, and a website's compatibility differs from one resolution to another. The embodiment of the invention therefore uses the website's compatibility probability with respect to screen resolutions as one of the indices for calculating website availability.
In one embodiment, the average high-quality access probability of each screen resolution is first computed in advance from access history data for websites across the whole network. Second, the high-quality access probability of users accessing the target website with screens of each resolution is derived from the acquired user access history data. Third, each resolution's probability on the target website is compared with its average, and every resolution whose probability on the target website is higher than the average is determined to be compatible with the target website. Finally, the number of screen resolutions compatible with the target website and the total number of screen resolutions used to access the target website are counted, and the ratio of the former to the latter is taken as the compatibility probability of the target website with respect to screen resolutions.
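The four steps for screen resolutions can be sketched end to end in Python. This is an illustrative sketch, not the patented implementation; the `Visit` record layout, the function names, and the use of strict `>` comparisons for both the dwell-time and page-count thresholds are assumptions. Each record is taken to represent one visitor, matching the "number of people" wording above, and the network-wide average for each resolution serves as its threshold, as in the preferred embodiment.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Visit(NamedTuple):
    """One visitor's record in the access history (hypothetical schema)."""
    resolution: str       # screen resolution, e.g. "1920x1080"
    dwell_seconds: float  # access dwell time
    pages_viewed: int     # number of pages browsed


def high_quality_probability(visits: Iterable[Visit], min_dwell: float,
                             min_pages: int) -> Dict[str, float]:
    """Per resolution: high-quality visitors / all visitors."""
    total: Dict[str, int] = defaultdict(int)
    good: Dict[str, int] = defaultdict(int)
    for v in visits:
        total[v.resolution] += 1
        # High quality: dwell time AND page count both above their thresholds.
        if v.dwell_seconds > min_dwell and v.pages_viewed > min_pages:
            good[v.resolution] += 1
    return {res: good[res] / total[res] for res in total}


def resolution_compatibility(site_visits: Iterable[Visit],
                             network_visits: Iterable[Visit],
                             min_dwell: float, min_pages: int) -> float:
    """Fourth availability index: compatible resolutions / resolutions used."""
    site = high_quality_probability(site_visits, min_dwell, min_pages)
    # The network-wide average serves as the per-resolution threshold.
    baseline = high_quality_probability(network_visits, min_dwell, min_pages)
    if not site:
        raise ValueError("no visits to the target website")
    # Assumption: a resolution with no network data is never compatible.
    compatible = [r for r, p in site.items() if p > baseline.get(r, 1.0)]
    return len(compatible) / len(site)
```

For example, if half of a site's 1920x1080 visitors make high-quality accesses while the network-wide average for that resolution is 25%, 1920x1080 counts as a compatible resolution for that site.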
S209: determining website availability of the target website according to the first availability index, the second availability index, the third availability index and the fourth availability index of the target website.
In one implementation, the website availability of the target website is determined from its first, second, third, and fourth availability indices. Specifically, website availability can be calculated as a weighted combination of the indices, as their plain average, or in another way, which is not described further here.
In another implementation, the website availability of the target website may be determined directly from its first, second, and fourth availability indices, again by weighting, averaging, or another calculation.
In another implementation, the website availability of the target website may be determined directly from its first, second, and third availability indices, again by weighting, averaging, or another calculation.
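The weighting and averaging options above can be sketched with one Python function. This is a minimal sketch; the name `website_availability` and the normalisation of the weighted sum by the total weight are assumptions, since the text does not specify how the weights are chosen or scaled. Passing two, three, or four indices covers each of the implementations described.

```python
from typing import Optional, Sequence


def website_availability(indices: Sequence[float],
                         weights: Optional[Sequence[float]] = None) -> float:
    """Combine availability indices (each in [0, 1]) into one score.

    With weights=None this is the plain average; otherwise it is a weighted
    average normalised by the sum of the weights (an assumption).
    """
    if not indices:
        raise ValueError("at least one availability index is required")
    if weights is None:
        weights = [1.0] * len(indices)
    if len(weights) != len(indices):
        raise ValueError("one weight per index is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * x for w, x in zip(weights, indices)) / total_weight
```

For instance, `website_availability([0.9, 0.8, 0.5, 0.5], weights=[0.4, 0.3, 0.15, 0.15])` returns approximately 0.75. Normalising by the weight sum keeps the result in [0, 1] even if the chosen weights do not add up to 1.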
In the method for determining website availability provided by the embodiment of the invention, the website availability of the target website is calculated from its first, second, third, and fourth availability indices. Because this calculation takes the factors affecting website availability into account more comprehensively, the embodiment can give users a more accurate measure of website availability and a more reliable reference when visiting websites.
An embodiment of the present invention further provides a device for determining website availability. Fig. 3 is a schematic structural diagram of the device according to the embodiment of the present invention, which includes:
a first obtaining module 310, configured to scan, by using a crawler system, web page links on each page of a target website, and obtain the number of web page links that can be normally accessed in the target website and the total number of web page links in the target website;
a first calculating module 320, configured to use a ratio of the number of web page links in the target website that can be normally accessed to the total number of web page links in the target website as a first availability index of the target website;
a second obtaining module 330, configured to collect user access history data of the target website, and obtain from it the total number of pages of the target website that were visited and the number of pages that had problems when visited;
a second calculating module 340, configured to use, as a second availability index of the target website, the ratio of the difference between the total number of visited pages of the target website and the number of pages with problems when visited to the total number of visited pages of the target website;
a determining module 350, configured to determine website availability of the target website according to the first availability index and the second availability index of the target website.
To take the factors affecting website availability fully into account, an embodiment of the present invention further provides another apparatus for determining website availability. Fig. 4 is a schematic structural diagram of this apparatus, which may include, in addition to the modules in Fig. 3:
a third obtaining module 410, configured to obtain the high-quality access probability with which each browser accesses the target website, and obtain the preset high-quality access probability threshold for each browser, where a high-quality access is an access whose dwell time and number of pages viewed are each higher than a respective preset threshold;
a third calculating module 420, configured to determine each browser whose high-quality access probability is higher than the preset high-quality access probability threshold as a browser compatible with the target website, calculate the compatibility probability of the target website with respect to browsers, and use that compatibility probability as a third availability index of the target website;
correspondingly, the determining module 350 is specifically configured to:
determine the website availability of the target website according to the first availability index, the second availability index and the third availability index of the target website.
a fourth obtaining module 430, configured to obtain, from the user access history data, the high-quality access probability of accessing the target website with screens of each resolution, and obtain the preset high-quality access probability threshold for each screen resolution, where a high-quality access is an access whose dwell time and number of pages viewed are each higher than a respective preset threshold;
a fourth calculating module 440, configured to determine each screen resolution whose high-quality access probability is higher than the preset high-quality access probability threshold as a screen resolution compatible with the target website, calculate the compatibility probability of the target website with respect to screen resolutions, and use that compatibility probability as a fourth availability index of the target website;
correspondingly, the determining module 350 is specifically configured to:
determining website availability of the target website according to the first availability index, the second availability index and the fourth availability index of the target website;
or determining the website availability of the target website according to the first availability index, the second availability index, the third availability index and the fourth availability index of the target website.
In practical applications, the determining module may include:
the setting sub-module is used for setting a weight for each of the first availability index and the second availability index of the target website;
the first determining submodule is used for determining the website availability of the target website from the first availability index and the second availability index weighted by those weights.
The first obtaining module may include:
the scanning sub-module is used for scanning the webpage links on each page of the target website by using a crawler system;
the second determining submodule is used for determining, when a webpage link on any page of the target website is scanned, that the link can be normally accessed if the HTTP status code returned for the request is 200;
the first statistic submodule is used for counting the number of the webpage links which can be normally accessed in the target website and the total number of the webpage links in the target website.
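The first obtaining module's logic (a link is normally accessible when its HTTP status is 200; index = accessible links / all links) can be sketched in Python as below. This is a sketch under assumptions, not the patented crawler: link discovery is left out and the scanned links are simply passed in, the names `http_status` and `first_availability_index` are hypothetical, each distinct link is counted once, and a link that yields no response at all is counted as not accessible. The status lookup is injectable so the counting can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url, or 0 if no response arrives.

    Note: urlopen follows redirects, so a 301 that ends in 200 counts as 200.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except (urllib.error.URLError, OSError, ValueError):  # DNS, timeout, bad URL
        return 0


def first_availability_index(links: Iterable[str],
                             status_of: Callable[[str], int] = http_status) -> float:
    """Links answering with status 200 / all distinct links scanned."""
    unique = list(dict.fromkeys(links))  # count each link once (assumption)
    if not unique:
        raise ValueError("the crawler found no links")
    ok = sum(1 for url in unique if status_of(url) == 200)
    return ok / len(unique)
```

Whether redirects should be followed before checking for 200 is a design choice the text does not settle; the sketch follows them because `urlopen` does so by default.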
The second obtaining module may include:
the acquisition submodule is used for collecting user access history data of the target website, and taking from that data each page whose refresh count within the same session meets a preset threshold as a page with a problem when accessed;
and the second statistic submodule is used for counting, from the user access history data, the total number of pages of the target website that were accessed and the number of pages with problems when accessed.
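The second obtaining module and the second availability index can be sketched in Python as follows. The log format is an assumption: each `PageView` event carries a hypothetical `is_refresh` flag marking a reload of the same page. Further assumptions are that pages are counted as distinct URLs, that a page counts as a problem page if its refresh count in at least one session reaches the threshold, and that "meets a preset threshold" means greater than or equal to it. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class PageView(NamedTuple):
    """One page-view event from the access log (hypothetical schema)."""
    session_id: str
    page_url: str
    is_refresh: bool  # True if this view reloaded the same page


def second_availability_index(views: Iterable[PageView],
                              refresh_threshold: int) -> float:
    """(visited pages - problem pages) / visited pages."""
    visited: Set[str] = set()
    refreshes: Counter = Counter()
    for v in views:
        visited.add(v.page_url)
        if v.is_refresh:
            refreshes[(v.session_id, v.page_url)] += 1
    if not visited:
        raise ValueError("no page views in the access history")
    # A page is a problem page if, in at least one session, its refresh
    # count reaches the threshold.
    problem_pages = {page for (_, page), n in refreshes.items()
                     if n >= refresh_threshold}
    return (len(visited) - len(problem_pages)) / len(visited)
```

Counting refreshes per (session, page) pair rather than per page keeps a page that is reloaded once by each of many users from being flagged, which matches the text's "in the same session" condition.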
The device for determining the availability of the website comprises a processor and a memory, wherein the first acquiring module, the first calculating module, the second acquiring module, the second calculating module, the determining module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes one or more kernels, and a kernel calls the corresponding program unit from the memory. By adjusting kernel parameters, the device provides users with a website availability index that serves as an effective reference for visiting websites.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The device for determining website availability provided by the embodiment of the invention can implement the following functions: scanning the webpage links on each page of a target website with a crawler system, and obtaining the number of webpage links in the target website that can be normally accessed and the total number of webpage links in the target website; taking the ratio of the former to the latter as a first availability index of the target website; collecting user access history data of the target website, and obtaining from it the total number of pages of the target website that were accessed and the number of pages with problems when accessed; taking the ratio of the difference between the total number of accessed pages and the number of pages with problems to the total number of accessed pages as a second availability index of the target website; and determining the website availability of the target website according to its first and second availability indices. The embodiment of the invention can thus provide users with a website availability index that serves as an effective reference when visiting websites.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute program code that performs the following method steps:
scanning webpage links on each page of a target website by using a crawler system, and acquiring the number of the webpage links which can be normally accessed in the target website and the total number of the webpage links in the target website;
taking the ratio of the number of the normally visited webpage links in the target website to the total number of the webpage links in the target website as a first availability index of the target website;
acquiring user access history data of the target website, and acquiring from the user access history data the total number of pages of the target website that were accessed and the number of pages with problems when the target website is accessed;
taking the ratio of the difference between the total number of accessed pages of the target website and the number of pages with problems when accessed to the total number of accessed pages of the target website as a second availability index of the target website;
and determining the website availability of the target website according to the first availability index and the second availability index of the target website.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (12)

1. A method for determining website availability, the method comprising:
scanning webpage links on each page of a target website by using a crawler system, and acquiring the number of the webpage links which can be normally accessed in the target website and the total number of the webpage links in the target website;
taking the ratio of the number of the normally visited webpage links in the target website to the total number of the webpage links in the target website as a first availability index of the target website;
acquiring user access history data of the target website, and acquiring from the user access history data the total number of pages of the target website that were accessed and the number of pages with problems when the target website is accessed;
taking the ratio of the difference between the total number of accessed pages of the target website and the number of pages with problems when accessed to the total number of accessed pages of the target website as a second availability index of the target website;
and determining the website availability of the target website according to the first availability index and the second availability index of the target website.
2. The method of determining website availability of claim 1, further comprising:
acquiring, from the user access history data, the high-quality access probabilities of various browsers for accessing the target website, and acquiring preset high-quality access probability thresholds of the various browsers; the high-quality access comprises an access whose dwell time and number of browsed pages are each higher than a respective preset threshold;
determining the browser with the high-quality access probability higher than a preset high-quality access probability threshold value as a browser compatible with the target website, calculating the compatibility probability of the target website to the browser, and taking the compatibility probability of the target website to the browser as a third availability index of the target website;
correspondingly, the determining the website availability of the target website according to the first availability index and the second availability index of the target website specifically includes:
and determining the website availability of the target website according to the first availability index, the second availability index and the third availability index of the target website.
3. The method of determining website availability of claim 1, further comprising:
acquiring, from the user access history data, the high-quality access probability of accessing the target website with screens of various resolutions, and acquiring preset high-quality access probability thresholds of the screens of various resolutions; the high-quality access comprises an access whose dwell time and number of browsed pages are each higher than a respective preset threshold;
determining a screen resolution whose high-quality access probability is higher than a preset high-quality access probability threshold as a screen resolution compatible with the target website, calculating the compatibility probability of the target website with respect to screen resolutions, and taking that compatibility probability as a fourth availability index of the target website;
correspondingly, the determining the website availability of the target website according to the first availability index and the second availability index of the target website specifically includes:
and determining the website availability of the target website according to the first availability index, the second availability index and the fourth availability index of the target website.
4. The method of determining website availability of claim 2, further comprising:
acquiring, from the user access history data, the high-quality access probability of accessing the target website with screens of various resolutions, and acquiring preset high-quality access probability thresholds of the screens of various resolutions; the high-quality access comprises an access whose dwell time and number of browsed pages are each higher than a respective preset threshold;
determining a screen resolution whose high-quality access probability is higher than a preset high-quality access probability threshold as a screen resolution compatible with the target website, calculating the compatibility probability of the target website with respect to screen resolutions, and taking that compatibility probability as a fourth availability index of the target website;
correspondingly, the determining the website availability of the target website according to the first availability index and the second availability index of the target website specifically includes: determining the website availability of the target website according to the first availability index, the second availability index, the third availability index and the fourth availability index of the target website.
5. The method of claim 1, wherein determining the website availability of the target website according to the first availability index and the second availability index of the target website comprises:
respectively setting weight values for the first availability index and the second availability index of the target website;
and determining the website availability of the target website according to the weighted values of the first availability index and the second availability index.
6. The method for determining website availability according to claim 1, wherein the scanning, by using a crawler system, the web page links on each page of the target website to obtain the number of web page links that can be normally accessed in the target website and the total number of web page links in the target website includes:
scanning the webpage links on each page of the target website by using a crawler system;
when a webpage link on any page of the target website is scanned, if the HTTP status code returned for the request is 200, determining the webpage link as a webpage link capable of being normally accessed;
and counting the number of the webpage links which can be normally accessed in the target website and the total number of the webpage links in the target website.
7. The method for determining website availability according to claim 1, wherein the collecting user access history data of the target website and obtaining the total number of pages accessed by the target website and the number of pages with problems when the target website is accessed from the user access history data comprises:
acquiring user access historical data of the target website, and taking, from the user access historical data, a page whose refresh count within the same session meets a preset threshold as a page with a problem when being accessed;
and counting the total number of the pages accessed by the target website and the number of the pages with problems when the target website is accessed from the user access history data.
8. An apparatus for determining website availability, the apparatus comprising:
the first acquisition module is used for scanning webpage links on each page of a target website by using a crawler system, and acquiring the number of the webpage links which can be normally visited in the target website and the total number of the webpage links in the target website;
the first calculation module is used for taking the ratio of the number of the webpage links which can be normally accessed in the target website to the total number of the webpage links in the target website as a first availability index of the target website;
the second acquisition module is used for acquiring user access history data of the target website, and acquiring from the user access history data the total number of pages of the target website that were accessed and the number of pages with problems when the target website is accessed;
the second calculation module is used for taking the ratio of the difference between the total number of accessed pages of the target website and the number of pages with problems when accessed to the total number of accessed pages of the target website as a second availability index of the target website;
and the determining module is used for determining the website availability of the target website according to the first availability index and the second availability index of the target website.
9. The apparatus for determining website availability of claim 8, wherein the apparatus further comprises:
the third acquisition module is used for acquiring the high-quality access probabilities of various browsers for accessing the target website, and acquiring the preset high-quality access probability thresholds of the various browsers; the high-quality access comprises an access whose dwell time and number of browsed pages are each higher than a respective preset threshold;
the third calculation module is used for determining the browser with the high-quality access probability higher than a preset high-quality access probability threshold value as the browser compatible with the target website, calculating the compatibility probability of the target website to the browser, and taking the compatibility probability of the target website to the browser as a third availability index of the target website;
correspondingly, the determining module is specifically configured to:
and determining the website availability of the target website according to the first availability index, the second availability index and the third availability index of the target website.
10. The apparatus for determining website availability of claim 8, wherein the apparatus further comprises:
the fourth acquisition module is used for acquiring, from the user access historical data, the high-quality access probability of accessing the target website with screens of various resolutions, and acquiring preset high-quality access probability thresholds of the screens of various resolutions; the high-quality access comprises an access whose dwell time and number of browsed pages are each higher than a respective preset threshold;
the fourth calculation module is used for determining a screen resolution whose high-quality access probability is higher than a preset high-quality access probability threshold as a screen resolution compatible with the target website, calculating the compatibility probability of the target website with respect to screen resolutions, and taking that compatibility probability as a fourth availability index of the target website;
correspondingly, the determining module is specifically configured to:
and determining the website availability of the target website according to the first availability index, the second availability index and the fourth availability index of the target website.
11. The apparatus for determining website availability of claim 9, wherein the apparatus further comprises:
the fourth acquisition module is used for acquiring, from the user access historical data, the high-quality access probability of accessing the target website with screens of various resolutions, and acquiring preset high-quality access probability thresholds of the screens of various resolutions; the high-quality access comprises an access whose dwell time and number of browsed pages are each higher than a respective preset threshold;
the fourth calculation module is used for determining a screen resolution whose high-quality access probability is higher than a preset high-quality access probability threshold as a screen resolution compatible with the target website, calculating the compatibility probability of the target website with respect to screen resolutions, and taking that compatibility probability as a fourth availability index of the target website;
correspondingly, the determining module is specifically configured to determine the website availability of the target website according to the first availability index, the second availability index, the third availability index and the fourth availability index of the target website.
12. The apparatus for determining website availability according to claim 8, wherein the determining module comprises:
the setting submodule is configured to set a weight for each of the first availability index and the second availability index of the target website;
the first determining submodule is configured to determine the website availability of the target website according to the first availability index and the second availability index, each weighted by its respective weight.
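The weighting in claim 12 can be sketched together with the two ratios that the abstract defines for the first and second availability indices. This is a hedged sketch: the function names are hypothetical, returning 0.0 for an empty denominator is an assumption, and the claim does not specify how the weights are combined, so a weighted average normalized by the total weight is assumed here.

```python
from __future__ import annotations


def first_availability_index(accessible_links: int, total_links: int) -> float:
    # Links that can be accessed normally / all links found by the crawler.
    return accessible_links / total_links if total_links else 0.0


def second_availability_index(accessed_pages: int, problem_pages: int) -> float:
    # (Accessed pages - pages with problems when accessed) / accessed pages.
    return (accessed_pages - problem_pages) / accessed_pages if accessed_pages else 0.0


def website_availability(indices: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted combination of availability indices.

    ASSUMPTION: a weighted average, normalized by the weight total so the
    weights need not sum to 1.
    """
    if indices.keys() != weights.keys():
        raise ValueError("each availability index needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(indices[name] * weights[name] for name in indices) / total_weight
```

With 950 of 1,000 links accessible, 200 problem pages out of 2,000 accessed, and weights of 0.6 and 0.4, this sketch yields 0.6 × 0.95 + 0.4 × 0.9 = 0.93. The same helper would accept four entries for the combination recited in claim 11, although the claims shown here assign weights only to the first and second indices.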
CN201610466058.4A 2016-06-23 2016-06-23 Method and device for determining website availability Active CN107544968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610466058.4A CN107544968B (en) 2016-06-23 2016-06-23 Method and device for determining website availability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610466058.4A CN107544968B (en) 2016-06-23 2016-06-23 Method and device for determining website availability

Publications (2)

Publication Number Publication Date
CN107544968A (en) 2018-01-05
CN107544968B (en) 2019-12-24

Family

ID=60960473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610466058.4A Active CN107544968B (en) 2016-06-23 2016-06-23 Method and device for determining website availability

Country Status (1)

Country Link
CN (1) CN107544968B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656671B (en) * 2021-06-16 2024-05-24 北京百度网讯科技有限公司 Model training method, link scoring method, device, equipment, medium and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073960A (en) * 2010-09-15 2011-05-25 江苏仕德伟网络科技股份有限公司 Method for assessing operation effect in website marketing process
CN102855256A (en) * 2011-06-29 2013-01-02 北京百度网讯科技有限公司 Method, device and equipment for determining evaluation information of websites
CN103929330A (en) * 2014-04-22 2014-07-16 中国科学院计算技术研究所 Domain name service quality evaluation method and system
CN104765881A (en) * 2015-04-28 2015-07-08 携程计算机技术(上海)有限公司 Assessment method for availability of website

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
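Research on website usability indices and calculation methods based on website traffic statistics (基于网站流量统计的网站可用性指标及计算方法研究); Ke Qing et al.; Library and Information Service (图书情报工作); 2011-10-31; Vol. 55, No. 20; pp. 138-143 *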

Also Published As

Publication number Publication date
CN107544968A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
RU2628127C2 (en) Method and device for identification of user behavior
JP6422617B2 (en) Network access operation identification program, server, and storage medium
EP3379788A1 (en) Network attacks identifying method and device
JP2019537115A (en) Method, apparatus and system for detecting abnormal user behavior
CN109450879A (en) User access activity monitoring method, electronic device and computer readable storage medium
KR20180004749A (en) Service scenario matching method and system
CN109598526B (en) Method and device for analyzing media contribution
CN108366012B (en) Social relationship establishing method and device and electronic equipment
EP3293642A1 (en) Method and apparatus for recording and restoring click position in page
CN102831218A (en) Method and device for determining data in thermodynamic chart
CN110334013B (en) Decision engine testing method and device and electronic equipment
CN105138675A (en) Database auditing method and device
CN110955846A (en) Propagation path diagram generation method and device
CN107220260B (en) Page display method and device
CN106937173B (en) Video playing method and device
CN107544968B (en) Method and device for determining website availability
CN110717122A (en) Page performance acquisition method and device and electronic equipment
CN109600272A (en) The method and device of crawler detection
CN106611118B (en) Method and device for applying login credentials
CN112749352A (en) Webpage skipping method and device, electronic equipment and readable storage medium
CN110119334B (en) Page script monitoring method and device
CN108062338B (en) Method and device for evaluating navigation capability of function page
CN110020332B (en) Event generation method and device based on circled elements
CN109426540B (en) Element click condition detection method and device, storage medium and processor
CN109656805B (en) Method and device for generating code link for business analysis and business server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant