WO2017101652A1 - Method and apparatus for determining access paths between website pages - Google Patents

Method and apparatus for determining access paths between website pages

Info

Publication number
WO2017101652A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
access
original
pages
path
Prior art date
Application number
PCT/CN2016/107106
Other languages
English (en)
French (fr)
Inventor
李新国
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司
Publication of WO2017101652A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Definitions

  • the present application relates to the field of the Internet, and in particular, to a method and an apparatus for determining an access path between web pages.
  • the main purpose of the present application is to provide a method and a device for determining an access path between web pages, so as to solve the problem that the user cannot know the real access path between important pages on the website in the related art.
  • a method for determining an access path between web pages includes: obtaining an access log, wherein the access log is a log generated according to the access information of the target website; obtaining an original access path between the original pages of the website according to the access log; filtering the original access path between the original pages to obtain an original access path between the target pages; and removing the loop in the original access path between the target pages, and determining, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed.
  • removing the loop in the original access path between the target pages, and determining, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed, includes: traversing the original access path between the target pages in access order and splitting the loops in it to obtain a set of original access sub-paths between the target pages; deleting, from that set, the sub-paths that are contained in other sub-paths, to obtain the set of original access sub-paths after deletion; counting, according to the access log, the number of sessions for each original access sub-path in the set after deletion; sorting the sub-paths in the set after deletion according to the number of sessions; and determining the target access path between the target pages from the sorted sub-paths.
  • filtering the original access path between the original pages to obtain the original access path between the target pages includes: determining a preset target page; extracting, from the original access path between the original pages, the paths that access target pages consecutively, to obtain at least one path of consecutively accessed target pages; and using the at least one path of consecutively accessed target pages as the original access path between the target pages.
  • alternatively, filtering the original access path between the original pages to obtain the original access path between the target pages includes: determining a preset target page; filtering out the non-target pages in the original access path between the original pages according to the preset target page; and using the filtered original access path between the original pages as the original access path between the target pages.
  • before obtaining the access log, the method further includes: collecting access information for the target website according to the preset script code; sending the access information of the target website to the target address; and generating the access log at the target address according to the access information of the target website.
  • obtaining the original access path between the original pages of the website according to the access log includes: acquiring a preset target page; determining all sessions in the access log; filtering, from all the sessions in the access log, the sessions that have accessed the preset target page, to obtain the target sessions; and determining the access order of the accessed pages in each target session, to obtain the original access path between the original pages.
  • an apparatus for determining an access path between web pages includes: a first obtaining unit, configured to acquire an access log, where the access log is a log generated according to the access information of the target website; a second obtaining unit, configured to obtain an original access path between the original pages of the website according to the access log; a processing unit, configured to filter the original access path between the original pages to obtain an original access path between the target pages; and a determining unit, configured to remove the loop in the original access path between the target pages and determine, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed.
  • the determining unit includes: a segmentation module, configured to traverse the original access path between the target pages in access order and split the loops in it to obtain a set of original access sub-paths between the target pages; a deleting module, configured to delete, from that set, the sub-paths contained in other sub-paths, to obtain the set of original access sub-paths after deletion; a statistics module, configured to count, according to the access log, the number of sessions for each original access sub-path in the set after deletion; a first processing module, configured to sort the sub-paths in the set after deletion according to the number of sessions; and a first determining module, configured to determine the target access path between the target pages from the sorted sub-paths.
  • the processing unit includes: a second determining module, configured to determine a target page set in advance; and an extracting module, configured to extract a path of the continuous access target page from the original access path between the original pages, to obtain at least one consecutive access target page a path; and a third determining module, configured to use at least one path that continuously accesses the target page as the original access path between the target pages.
  • the processing unit includes: a fourth determining module, configured to determine a target page that is set in advance; a second processing module, configured to filter out the non-target pages in the original access path between the original pages according to the preset target page; and a fifth determining module, configured to use the filtered original access path between the original pages as the original access path between the target pages.
  • the following steps are taken: obtaining an access log, wherein the access log is a log generated according to the access information of the target website; obtaining an original access path between the original pages of the website according to the access log; filtering the original access path between the original pages to obtain the original access path between the target pages; and removing the loop in the original access path between the target pages, and determining, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed. This solves the problem in the related art that the user's real access path between important pages on the website cannot be known.
  • FIG. 1 is a flowchart of a method for determining an access path between web pages according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of an apparatus for determining an access path between web pages according to an embodiment of the present application.
  • a method for determining an access path between web pages is provided.
  • FIG. 1 is a flowchart of a method for determining an access path between web pages according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
  • Step S101 Acquire an access log, where the access log is a log generated according to the access information of the target website.
  • before acquiring the access log, the method further includes: collecting the access information for the target website according to the preset script code; sending the access information of the target website to the target address; and generating an access log at the target address according to the access information of the target website.
  • the Tracker (a JS script) is deployed on the target website. After the deployment is completed, all the access data of the users on the website is sent to the specified server, and an access log is generated on the specified server according to the access information of the target website. The access log for the target time period is then obtained, where the target time period is the period for which the user wishes to determine the access paths between the pages of the website.
  • Step S102 Obtain an original access path between the original pages of the website page according to the access log.
  • obtaining the original access path between the original pages of the website according to the access log includes: acquiring a preset target page; determining all sessions in the access log; filtering, from all the sessions in the access log, the sessions that have accessed the preset target page, to obtain the target sessions; and determining the access order of the accessed pages in each target session, to obtain the original access paths between the original pages.
  • the preset target pages are the important pages that the customer wants statistics on, such as p1, p2, p3, and p4. From all the sessions in the access log, the sessions that have accessed an important page are filtered out and used as the target sessions.
  • the access path of a target session is p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8, which is the original access path between the original pages of the target session.
  • Step S103 filtering the original access path between the original pages to obtain an original access path between the target pages.
  • filtering the original access path between the original pages to obtain the original access path between the target pages includes: determining a preset target page; extracting, from the original access path between the original pages, the paths that access target pages consecutively, to obtain at least one path of consecutively accessed target pages; and using the at least one path of consecutively accessed target pages as the original access path between the target pages.
  • the preset target pages are the important pages that the customer wants statistics on, such as the four target pages p1, p2, p3, and p4. If the user only counts paths that access target pages consecutively, then, according to p1, p2, p3, and p4, the paths of consecutively accessed target pages are extracted from p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8, which yields three such paths: p1-p3, p4-p1, and p3-p2. p1-p3, p4-p1, and p3-p2 are used as the original access paths between the target pages.
  • alternatively, filtering the original access path between the original pages to obtain the original access path between the target pages includes: determining a preset target page; filtering out the non-target pages in the original access path between the original pages according to the preset target page; and using the filtered original access path between the original pages as the original access path between the target pages.
  • the preset target pages are the important pages that the customer wants statistics on, such as p1, p2, p3, and p4. If the user does not require that only paths of consecutively accessed target pages be counted, the non-target pages in p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8 are filtered out according to p1, p2, p3, and p4, which yields p1-p3-p4-p1-p3-p2. p1-p3-p4-p1-p3-p2 is used as the original access path between the target pages.
  • Step S104 the loop in the original access path between the target pages is removed, and the target access path between the target pages is determined according to the access log in the original access path between the target pages after the loop is removed.
  • the loop in p1-p3-p4-p1-p3-p2 is removed, and the target access path between the target pages is determined according to the access log in the original access path between the target pages after the loop is removed.
  • removing the loop in the original access path between the target pages, and determining, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed, includes: traversing the original access path between the target pages in access order and splitting the loops in it to obtain a set of original access sub-paths between the target pages; deleting, from that set, the sub-paths contained in other sub-paths, to obtain the set of original access sub-paths after deletion; counting, according to the access log, the number of sessions for each original access sub-path in the set after deletion; sorting the sub-paths in the set after deletion according to the number of sessions; and determining the target access path between the target pages from the sorted sub-paths.
  • the extracted path p1-p3-p4-p1-p3-p2 is then split; the purpose of the splitting is to remove the loops from the path. Starting from each element of the path in turn, the longest acyclic path is sought. For example, for p1-p3-p4-p1-p3-p2, starting from the first element gives p1-p3-p4, starting from the second element gives p3-p4-p1, and starting from the third element gives p4-p1-p3-p2; the search continues until the end of the path is reached.
  • the obtained paths are then deduplicated and merged.
  • the above steps deploy the Tracker (preset script code) on the target website, collect the users' access information on the target website, and gather statistics on each user's access behavior in the website: the sessions that accessed the specified (important) pages are found, the non-important pages in those sessions are removed, the loops contained in the sessions are split, and finally the target access paths between the target pages are counted, so that the users' true access paths between the important pages of the website become known.
  • the method for determining the access path between website pages obtains the access log, wherein the access log is a log generated according to the access information of the target website; obtains the original access path between the original pages of the website according to the access log; filters the original access path between the original pages to obtain the original access path between the target pages; and removes the loop in the original access path between the target pages and determines, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed. This solves the problem in the related art that the user's real access path between important pages on the website cannot be known.
  • the embodiment of the present application further provides an apparatus for determining an access path between website pages. It should be noted that this apparatus may be used to execute the method for determining an access path between website pages provided by the embodiment of the present application. The apparatus is described below.
  • the apparatus includes: a first acquisition unit 10, a second acquisition unit 20, a processing unit 30, and a determination unit 40.
  • the first obtaining unit 10 is configured to acquire an access log, where the access log is a log generated according to the access information of the target website.
  • the second obtaining unit 20 is configured to obtain an original access path between the original pages of the website page according to the access log.
  • the processing unit 30 is configured to filter the original access path between the original pages to obtain an original access path between the target pages.
  • the determining unit 40 is configured to remove a loop in the original access path between the target pages, and determine a target access path between the target pages according to the access log in the original access path between the target pages after the loop is removed.
  • first obtaining unit 10, second obtaining unit 20, processing unit 30 and determining unit 40 may be operated in a computer terminal as part of the device, and the above module may be executed by a processor in the computer terminal.
  • the computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, and a mobile Internet device (MID), a PAD, and the like.
  • in the apparatus for determining the access path between website pages, the first obtaining unit 10 obtains the access log, wherein the access log is a log generated according to the access information of the target website; the second obtaining unit 20 obtains the original access path between the original pages of the website according to the access log; the processing unit 30 filters the original access path between the original pages to obtain the original access path between the target pages; and the determining unit 40 removes the loop in the original access path between the target pages and determines, according to the access log, the target access path between the target pages from the original access path between the target pages after the loop is removed. This solves the problem in the related art that the user's real access path between important pages on the website cannot be known.
  • the users' access information on the target website is collected, that is, statistics are gathered on each user's access behavior in the website: the sessions that accessed the specified pages are found, the non-important pages in those sessions are removed, the loops contained in the sessions are split, and finally the target access paths between the target pages are counted, so that the users' actual access paths between the important pages of the website become known.
  • the determining unit 40 includes: a segmentation module, configured to traverse the original access path between the target pages in access order and split the loops in it to obtain a set of original access sub-paths between the target pages; a deleting module, configured to delete, from that set, the sub-paths contained in other sub-paths, to obtain the set of original access sub-paths after deletion; a statistics module, configured to count, according to the access log, the number of sessions for each original access sub-path in the set after deletion; a first processing module, configured to sort the sub-paths in the set after deletion according to the number of sessions; and a first determining module, configured to determine the target access path between the target pages from the sorted sub-paths.
  • the above-mentioned splitting module, deleting module, statistics module, first processing module and first determining module may be run in a computer terminal as part of the device, and the above module may be executed by a processor in the computer terminal.
  • the computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, and a mobile Internet device (MID), a PAD, and the like.
  • the processing unit 30 includes: a second determining module, configured to determine a target page set in advance; an extracting module, configured to extract, from the original access path between the original pages, the paths that access target pages consecutively, to obtain at least one path of consecutively accessed target pages; and a third determining module, configured to use the at least one path of consecutively accessed target pages as the original access path between the target pages.
  • the foregoing second determining module, extracting module, and third determining module may be run in a computer terminal as part of the device, and the functions implemented by the foregoing modules may be performed by a processor in the computer terminal. The computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile Internet device (MID), a PAD, and the like.
  • the processing unit 30 includes: a fourth determining module, configured to determine a target page set in advance; a second processing module, configured to filter out the non-target pages in the original access path between the original pages according to the preset target page; and a fifth determining module, configured to use the filtered original access path between the original pages as the original access path between the target pages.
  • the fourth determining module, the second processing module, and the fifth determining module may be run in a computer terminal as part of the device, and the functions implemented by these modules may be executed by a processor in the computer terminal. The computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile Internet device (MID), a PAD, and the like.
  • the apparatus for determining the access path between website pages includes a processor and a memory; the first obtaining unit, the second obtaining unit, the processing unit, the determining unit, and the like are each stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
  • the first preset condition, the second preset condition, the preset splitting rule, the preset script code, and the like may be stored in the memory.
  • the processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory.
  • one or more kernels may be provided, and the access path between the pages of the website is determined by adjusting the kernel parameters.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory (flash RAM), the memory including at least one Memory chip.
  • RAM random access memory
  • ROM read only memory
  • flash RAM flash memory
  • the various functional modules provided by the embodiments of the present application may be run in a mobile terminal, a computer terminal, or the like, or may be stored as part of a storage medium.
  • embodiments of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals.
  • a computer terminal may also be replaced with a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device of the plurality of network devices of the computer network.
  • when executed on a data processing device, it is adapted to execute program code that initializes the following method steps: obtaining an access log, wherein the access log is a log generated according to access information of the target website; determining, from the access log, all sessions that have accessed the target page, to obtain at least one target session; determining the access order of the accessed pages in each target session, to obtain the original access path between the original pages; processing the original access path between the original pages according to the first preset condition, to obtain the original access path between the target pages; and determining the target access path between the target pages according to the original access path between the target pages.
  • the computer terminal can include: one or more processors, memory, and transmission means.
  • the memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining the access path between website pages in the embodiment of the present invention. The processor runs the software programs and modules stored in the memory to perform various functional applications and data processing, that is, to implement the above method for determining the access path between website pages.
  • the memory may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • the memory can further include memory remotely located relative to the processor, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the above transmission device is for receiving or transmitting data via a network.
  • specific examples of the above network may include wired and wireless networks.
  • the transmission device includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • the transmission device is a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • NIC Network Interface Controller
  • RF Radio Frequency
  • the memory is used to store the first preset condition, the second preset condition, the preset splitting rule, the preset script code, and the application.
  • the processor can call, through the transmission device, the information and the application stored in the memory, to execute the program code of the method steps of each of the alternative or preferred embodiments of the above method embodiments.
  • the computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, and a mobile Internet device (MID), a PAD, and the like.
  • MID mobile Internet device
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be used to save the program code executed by the method for determining the access path between the website pages provided by the foregoing method embodiment and the device embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
  • the storage medium is configured to store program code for performing the following steps: acquiring an access log, wherein the access log is a log generated according to access information of the target website; determining, from the access log, all sessions that have accessed the target page, to obtain at least one target session; determining the access order of the accessed pages in each target session, to obtain the original access path between the original pages; processing the original access path between the original pages according to the first preset condition, to obtain the original access path between the target pages; and determining the target access path between the target pages according to the original access path between the target pages.
  • the storage medium may also be configured as program code for storing various preferred or optional method steps provided by the determining method of the access path between the pages of the website.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • modules or steps of the present application can be implemented by a general computing device, and can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they may be implemented by program code executable by the computing device, so that they may be stored in a storage device and executed by the computing device, or they may each be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the application is not limited to any particular combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

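The present application discloses a method and apparatus for determining access paths between website pages. The method includes: acquiring an access log, where the access log is a log generated from access information of a target website; obtaining, from the access log, the original access path between original pages of the website; filtering the original access path between original pages to obtain the original access path between target pages; and removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal. The application solves the problem in the related art that the real access paths users take between important pages of a website cannot be learned.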

Description

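Method and apparatus for determining access paths between website pages
Technical Field
The present application relates to the Internet field and, in particular, to a method and apparatus for determining access paths between website pages.
Background
At present, analysis of website data usually requires knowing the access paths that users most often take between several designated important pages of a website. For example, suppose a website has four important pages A, B, C and D. Users are expected to visit them in the order A->B->C->D (ignoring any other pages visited in between), and the path A->B->C->D also matches the website's actual business process. However, the paths users really take between important pages are not necessarily the paths the website expects, and the related art cannot reveal the real access paths users take between important pages of a website.
No effective solution has yet been proposed for the problem that the related art cannot reveal the real access paths users take between important pages of a website.
Summary
The main purpose of the present application is to provide a method and apparatus for determining access paths between website pages, so as to solve the problem that the related art cannot reveal the real access paths users take between important pages of a website.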
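To achieve the above purpose, according to one aspect of the present application, a method for determining access paths between website pages is provided. The method includes: acquiring an access log, where the access log is a log generated from access information of a target website; obtaining, from the access log, the original access path between original pages of the website; filtering the original access path between original pages to obtain the original access path between target pages; and removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal.
Further, removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal includes: traversing the original access path between target pages in access order and splitting it at its loops, to obtain a set of original access sub-paths between target pages; deleting from that set every sub-path contained in another sub-path, to obtain a pruned set of original access sub-paths between target pages; counting, according to the access log, the number of sessions covered by each original access sub-path in the pruned set; ranking the sub-paths in the pruned set by session count; and determining the target access path between target pages from the ranked sub-paths.
Further, filtering the original access path between original pages to obtain the original access path between target pages includes: determining the preset target pages; extracting, from the original access path between original pages, the paths in which target pages are visited consecutively, to obtain at least one path of consecutively visited target pages; and using the at least one path of consecutively visited target pages as the original access path between target pages.
Further, filtering the original access path between original pages to obtain the original access path between target pages includes: determining the preset target pages; filtering out the non-target pages in the original access path between original pages according to the preset target pages; and using the filtered original access path between original pages as the original access path between target pages.
Further, before acquiring the access log, the method also includes: collecting access information for the target website by means of preset script code; sending the access information of the target website to a target address; and generating the access log at the target address from the access information of the target website.
Further, obtaining, from the access log, the original access path between original pages of the website includes: obtaining the preset target pages; determining all sessions in the access log; selecting, from all sessions in the access log, the sessions that visited the preset target pages, to obtain target sessions; and determining the order in which pages were visited in each target session, to obtain the original access path between original pages.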
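To achieve the above purpose, according to another aspect of the present application, an apparatus for determining access paths between website pages is provided. The apparatus includes: a first acquisition unit, configured to acquire an access log, where the access log is a log generated from access information of a target website; a second acquisition unit, configured to obtain, from the access log, the original access path between original pages of the website; a processing unit, configured to filter the original access path between original pages to obtain the original access path between target pages; and a determining unit, configured to remove loops from the original access path between target pages and, according to the access log, determine the target access path between target pages from the original access paths between target pages after loop removal.
Further, the determining unit includes: a splitting module, configured to traverse the original access path between target pages in access order and split it at its loops, to obtain a set of original access sub-paths between target pages; a deletion module, configured to delete from that set every sub-path contained in another sub-path, to obtain a pruned set of original access sub-paths between target pages; a counting module, configured to count, according to the access log, the number of sessions covered by each original access sub-path in the pruned set; a first processing module, configured to rank the sub-paths in the pruned set by session count; and a first determining module, configured to determine the target access path between target pages from the ranked sub-paths.
Further, the processing unit includes: a second determining module, configured to determine the preset target pages; an extraction module, configured to extract, from the original access path between original pages, the paths in which target pages are visited consecutively, to obtain at least one path of consecutively visited target pages; and a third determining module, configured to use the at least one path of consecutively visited target pages as the original access path between target pages.
Further, the processing unit includes: a fourth determining module, configured to determine the preset target pages; a second processing module, configured to filter out the non-target pages in the original access path between original pages according to the preset target pages; and a fifth determining module, configured to use the filtered original access path between original pages as the original access path between target pages.
The present application uses the following steps: acquiring an access log, where the access log is a log generated from access information of a target website; obtaining, from the access log, the original access path between original pages of the website; filtering the original access path between original pages to obtain the original access path between target pages; and removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal. This solves the problem that the related art cannot reveal the real access paths users take between important pages of a website. By collecting users' access information on the target website, finding the sessions that visit the designated pages, removing the unimportant pages from those sessions, splitting the loops contained in the sessions, and finally computing the target access path between target pages, the application makes it possible to learn the real access paths users take between important pages of the website.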
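Brief Description of the Drawings
The accompanying drawings, which form a part of the present application, provide a further understanding of the application. The illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a method for determining access paths between website pages according to an embodiment of the present application; and
FIG. 2 is a schematic diagram of an apparatus for determining access paths between website pages according to an embodiment of the present application.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second" and the like in the specification, claims and drawings of the present application are used to distinguish similar objects and not necessarily to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described here can be implemented. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.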
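According to an embodiment of the present application, a method for determining access paths between website pages is provided.
FIG. 1 is a flowchart of a method for determining access paths between website pages according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101: acquire an access log, where the access log is a log generated from access information of the target website.
Optionally, in the method for determining access paths between website pages provided by the embodiment of the present application, before acquiring the access log, the method also includes: collecting access information for the target website by means of preset script code; sending the access information of the target website to a target address; and generating the access log at the target address from the access information of the target website.
A Tracker (a JS script) is deployed on the target website. Once it is deployed, all of a user's access data on the website is sent to a designated server, where the access log is generated from the access information of the target website. The access log for a target time period is then obtained, where the target time period is the period over which the user wants to determine the access paths between website pages.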
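Step S102: obtain, from the access log, the original access path between original pages of the website.
Optionally, in the method for determining access paths between website pages provided by the embodiment of the present application, obtaining, from the access log, the original access path between original pages of the website includes: obtaining the preset target pages; determining all sessions in the access log; selecting, from all sessions in the access log, the sessions that visited the preset target pages, to obtain target sessions; and determining the order in which pages were visited in each target session, to obtain the original access path between original pages.
For example, the preset target pages are the important pages that the customer wants statistics on, such as the four pages p1, p2, p3 and p4. From all sessions in the access log, the sessions that visited the designated important pages are selected and used as target sessions.
The order in which pages were visited is determined for each of the at least one target session obtained above, giving the original access path between original pages. For example, if the access path of a target session is p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8, that path is the session's original access path between original pages.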
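Step S102 can be illustrated with a minimal Python sketch. The source does not specify the log format, so the `(session_id, timestamp, page)` record layout, the function name `original_paths`, and the sample data are assumptions made only for illustration.

```python
from collections import defaultdict

def original_paths(log_records, target_pages):
    """Return {session_id: [page, ...]} for sessions that visited a target page.

    log_records is assumed to be an iterable of (session_id, timestamp, page)
    tuples; this layout is hypothetical, not taken from the source.
    """
    by_session = defaultdict(list)
    for session_id, timestamp, page in log_records:
        by_session[session_id].append((timestamp, page))

    paths = {}
    for session_id, visits in by_session.items():
        visits.sort(key=lambda visit: visit[0])      # order visits by time
        pages = [page for _, page in visits]
        if any(page in target_pages for page in pages):
            paths[session_id] = pages                # keep target sessions only
    return paths

records = [
    ("s1", 1, "p5"), ("s1", 2, "p1"), ("s1", 3, "p3"),
    ("s2", 1, "p7"), ("s2", 2, "p8"),
]
targets = {"p1", "p2", "p3", "p4"}
result = original_paths(records, targets)
print(result)
```

Session `s2` never visits a target page, so only `s1` is kept as a target session.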
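Step S103: filter the original access path between original pages to obtain the original access path between target pages.
Optionally, in the method for determining access paths between website pages provided by the embodiment of the present application, filtering the original access path between original pages to obtain the original access path between target pages includes: determining the preset target pages; extracting, from the original access path between original pages, the paths in which target pages are visited consecutively, to obtain at least one path of consecutively visited target pages; and using the at least one path of consecutively visited target pages as the original access path between target pages.
For example, the preset target pages are the important pages the customer wants statistics on, such as the four target pages p1, p2, p3 and p4. If the user wants statistics only on paths in which target pages are visited consecutively, those paths are extracted from p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8 according to p1, p2, p3 and p4, giving three consecutively visited access paths: p1-p3, p4-p1 and p3-p2. These three paths are used as the original access paths between target pages.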
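The consecutive-visit variant of step S103 can be sketched in Python as below. The function name `consecutive_target_runs` and the `min_length` parameter are hypothetical; in particular, requiring at least two pages per run is an assumption, since the source does not say how an isolated target page should be handled.

```python
from itertools import groupby

def consecutive_target_runs(path, target_pages, min_length=2):
    """Extract maximal runs of consecutively visited target pages.

    min_length=2 is an assumption: a single isolated target page is not
    treated as a path here.
    """
    runs = []
    # groupby splits the path into alternating target / non-target stretches.
    for is_target, group in groupby(path, key=lambda page: page in target_pages):
        run = list(group)
        if is_target and len(run) >= min_length:
            runs.append(run)
    return runs

path = "p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8".split("-")
runs = consecutive_target_runs(path, {"p1", "p2", "p3", "p4"})
print(["-".join(run) for run in runs])  # ['p1-p3', 'p4-p1', 'p3-p2']
```

On the example path from the text, the sketch yields the same three paths that the text lists.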
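Optionally, in the method for determining access paths between website pages provided by the embodiment of the present application, filtering the original access path between original pages to obtain the original access path between target pages includes: determining the preset target pages; filtering out the non-target pages in the original access path between original pages according to the preset target pages; and using the filtered original access path between original pages as the original access path between target pages.
For example, the preset target pages are the important pages the customer wants statistics on, such as the four pages p1, p2, p3 and p4. If the user does not require statistics only on consecutively visited target pages, the non-target pages in p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8 are filtered out according to p1, p2, p3 and p4, which gives p1-p3-p4-p1-p3-p2. This path is used as the original access path between target pages.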
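The non-consecutive variant of step S103 amounts to dropping every page that is not a target page while preserving visit order. A minimal Python sketch, with the function name `keep_target_pages` chosen only for illustration:

```python
def keep_target_pages(path, target_pages):
    """Drop non-target pages from a session path, preserving visit order."""
    return [page for page in path if page in target_pages]

path = "p5-p1-p3-p7-p6-p4-p1-p9-p3-p2-p8".split("-")
filtered = keep_target_pages(path, {"p1", "p2", "p3", "p4"})
print("-".join(filtered))  # p1-p3-p4-p1-p3-p2
```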
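Through this step, depending on the user's needs, either only the consecutively visited access paths or all access paths that visit target pages can be used as the original access path between target pages.
Step S104: remove loops from the original access path between target pages and, according to the access log, determine the target access path between target pages from the original access paths between target pages after loop removal.
For example, the loops in p1-p3-p4-p1-p3-p2 are removed, and the target access path between target pages is determined, according to the access log, from the original access paths between target pages after loop removal.
Optionally, in the method for determining access paths between website pages provided by the embodiment of the present application, removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal includes: traversing the original access path between target pages in access order and splitting it at its loops, to obtain a set of original access sub-paths between target pages; deleting from that set every sub-path contained in another sub-path, to obtain a pruned set of original access sub-paths between target pages; counting, according to the access log, the number of sessions covered by each original access sub-path in the pruned set; ranking the sub-paths in the pruned set by session count; and determining the target access path between target pages from the ranked sub-paths.
Specifically, the path p1-p3-p4-p1-p3-p2 obtained above is split. The purpose of splitting is to remove loops from the path p1-p3-p4-p1-p3-p2: starting from the first element of the path, the longest loop-free path is found from each element in turn. For example, for p1-p3-p4-p1-p3-p2, starting from the first element gives p1-p3-p4, starting from the second element gives p3-p4-p1, starting from the third element gives p4-p1-p3-p2, and so on until the end of the path. Finally, the resulting paths are deduplicated and merged: if the resulting paths include both p4-p1-p3-p2 and p3-p2, the latter is discarded because the former contains it, and the two paths p1-p3-p4 and p4-p1-p3-p2 are finally returned. All access information in the access log for the target time period is then parsed to obtain all access paths in that period, the number of sessions covered by each path is counted, the paths are ranked by session count, and the target access path between target pages is obtained from the ranking.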
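The loop-splitting procedure can be sketched in Python under one stated assumption: a sub-path is "contained in" another when it appears there as a contiguous run. The function names are hypothetical. Read this literally, the sketch keeps p3-p4-p1 in addition to p1-p3-p4 and p4-p1-p3-p2, because p3-p4-p1 is not a contiguous part of either; the text's final result lists only the latter two and does not state the rule that would drop the third, so treat this as one possible reading rather than the patent's exact procedure.

```python
def longest_loop_free_from(path, start):
    """Longest run beginning at `start` that visits no page twice."""
    seen = set()
    end = start
    while end < len(path) and path[end] not in seen:
        seen.add(path[end])
        end += 1
    return tuple(path[start:end])

def is_contiguous_subpath(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def split_loops(path):
    # One candidate per start position, in traversal order, without duplicates.
    candidates = []
    for start in range(len(path)):
        sub = longest_loop_free_from(path, start)
        if sub not in candidates:
            candidates.append(sub)
    # Discard candidates contained in another candidate (assumed: contiguous).
    return [
        sub for sub in candidates
        if not any(sub != other and is_contiguous_subpath(sub, other)
                   for other in candidates)
    ]

sub_paths = split_loops(["p1", "p3", "p4", "p1", "p3", "p2"])
print(["-".join(sub) for sub in sub_paths])
# ['p1-p3-p4', 'p3-p4-p1', 'p4-p1-p3-p2']
```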
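In summary, the above steps add a Tracker (preset script code) to the target website, collect users' access information on the target website, compile statistics on each user's access behavior on the website, find the sessions that visit the designated (important) pages, remove the unimportant pages from those sessions, split the loops contained in the sessions, and finally compute the target access path between target pages, making it possible to learn the real access paths users take between important pages of the website.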
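The final counting and ranking step can be sketched as follows. The source does not define exactly when a session counts toward a sub-path; the sketch assumes a session counts when the sub-path occurs as a contiguous run in that session's filtered target-page path. The names `occurs_in` and `rank_sub_paths` and the sample sessions are hypothetical.

```python
def occurs_in(sub_path, path):
    """True if sub_path appears as a contiguous run in path (assumed rule)."""
    n = len(sub_path)
    return any(tuple(path[i:i + n]) == tuple(sub_path)
               for i in range(len(path) - n + 1))

def rank_sub_paths(sub_paths, session_paths):
    """Count sessions per sub-path and sort by count, highest first."""
    counts = {
        tuple(sub): sum(1 for path in session_paths.values()
                        if occurs_in(sub, path))
        for sub in sub_paths
    }
    # sorted() is stable, so ties keep the original sub-path order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical filtered session paths for the target time period.
session_paths = {
    "s1": ["p1", "p3", "p4", "p1", "p3", "p2"],
    "s2": ["p1", "p3", "p4"],
    "s3": ["p4", "p1", "p3", "p2"],
    "s4": ["p2", "p1", "p3", "p4"],
}
sub_paths = [("p1", "p3", "p4"), ("p4", "p1", "p3", "p2")]
ranking = rank_sub_paths(sub_paths, session_paths)
print(ranking)
```

With this sample data, p1-p3-p4 occurs in three sessions and p4-p1-p3-p2 in two, so the top-ranked entry would be reported as the target access path between target pages.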
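The method for determining access paths between website pages provided by the embodiment of the present application acquires an access log, where the access log is a log generated from access information of the target website; obtains, from the access log, the original access path between original pages of the website; filters the original access path between original pages to obtain the original access path between target pages; and removes loops from the original access path between target pages and, according to the access log, determines the target access path between target pages from the original access paths between target pages after loop removal. This solves the problem that the related art cannot reveal the real access paths users take between important pages of a website. By collecting users' access information on the target website, finding the sessions that visit the designated pages, removing the unimportant pages from those sessions, splitting the loops contained in the sessions, and finally computing the target access path between target pages, the method makes it possible to learn the real access paths users take between important pages of the website.
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.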
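An embodiment of the present application also provides an apparatus for determining access paths between website pages. It should be noted that this apparatus can be used to carry out the method for determining access paths between website pages provided by the embodiments of the application. The apparatus is described below.
FIG. 2 is a schematic diagram of an apparatus for determining access paths between website pages according to an embodiment of the present application. As shown in FIG. 2, the apparatus includes a first acquisition unit 10, a second acquisition unit 20, a processing unit 30 and a determining unit 40.
The first acquisition unit 10 is configured to acquire an access log, where the access log is a log generated from access information of the target website.
The second acquisition unit 20 is configured to obtain, from the access log, the original access path between original pages of the website.
The processing unit 30 is configured to filter the original access path between original pages to obtain the original access path between target pages.
The determining unit 40 is configured to remove loops from the original access path between target pages and, according to the access log, determine the target access path between target pages from the original access paths between target pages after loop removal.
It should be noted here that the first acquisition unit 10, the second acquisition unit 20, the processing unit 30 and the determining unit 40 may run in a computer terminal as part of the apparatus, and the functions they implement may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
In the apparatus for determining access paths between website pages provided by the embodiment of the present application, the first acquisition unit 10 acquires an access log, where the access log is a log generated from access information of the target website; the second acquisition unit 20 obtains, from the access log, the original access path between original pages of the website; the processing unit 30 filters the original access path between original pages to obtain the original access path between target pages; and the determining unit 40 removes loops from the original access path between target pages and, according to the access log, determines the target access path between target pages from the original access paths between target pages after loop removal. This solves the problem that the related art cannot reveal the real access paths users take between important pages of a website. By collecting users' access information on the target website (that is, compiling statistics on each user's access behavior on the website), finding the sessions that visit the designated pages, removing the unimportant pages from those sessions, splitting the loops contained in the sessions, and finally computing the target access path between target pages, the apparatus makes it possible to learn the real access paths users take between important pages of the website.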
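Optionally, in the apparatus for determining access paths between website pages provided by the embodiment of the present application, the determining unit 40 includes: a splitting module, configured to traverse the original access path between target pages in access order and split it at its loops, to obtain a set of original access sub-paths between target pages; a deletion module, configured to delete from that set every sub-path contained in another sub-path, to obtain a pruned set of original access sub-paths between target pages; a counting module, configured to count, according to the access log, the number of sessions covered by each original access sub-path in the pruned set; a first processing module, configured to rank the sub-paths in the pruned set by session count; and a first determining module, configured to determine the target access path between target pages from the ranked sub-paths.
It should be noted here that the splitting module, the deletion module, the counting module, the first processing module and the first determining module may run in a computer terminal as part of the apparatus, and the functions they implement may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
Optionally, in the apparatus for determining access paths between website pages provided by the embodiment of the present application, the processing unit 30 includes: a second determining module, configured to determine the preset target pages; an extraction module, configured to extract, from the original access path between original pages, the paths in which target pages are visited consecutively, to obtain at least one path of consecutively visited target pages; and a third determining module, configured to use the at least one path of consecutively visited target pages as the original access path between target pages.
It should be noted here that the second determining module, the extraction module and the third determining module may run in a computer terminal as part of the apparatus, and the functions they implement may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
Optionally, in the apparatus for determining access paths between website pages provided by the embodiment of the present application, the processing unit 30 includes: a fourth determining module, configured to determine the preset target pages; a second processing module, configured to filter out the non-target pages in the original access path between original pages according to the preset target pages; and a fifth determining module, configured to use the filtered original access path between original pages as the original access path between target pages.
It should be noted here that the fourth determining module, the second processing module and the fifth determining module may run in a computer terminal as part of the apparatus, and the functions they implement may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.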
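The apparatus for determining access paths between website pages includes a processor and a memory. The first acquisition unit, the second acquisition unit, the processing unit, the determining unit and so on are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions. The first preset condition, the second preset condition, the preset splitting rule, the preset script code and so on may all be stored in the memory.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the access paths between website pages are determined by adjusting kernel parameters.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The functional modules provided by the embodiments of the present application may run in a mobile terminal, a computer terminal or a similar computing device, and may also be stored as part of a storage medium.
Accordingly, an embodiment of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network devices of a computer network.
When executed on a data processing device, it is adapted to execute program code initialized with the following method steps: acquiring an access log, where the access log is a log generated from access information of the target website; determining, from the access log, all sessions that visited the target pages, to obtain at least one target session; determining the order in which pages were visited in each target session, to obtain the original access path between original pages; processing the original access path between original pages according to a first preset condition, to obtain the original access path between target pages; and determining the target access path between target pages from the original access path between target pages.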
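Optionally, the computer terminal may include one or more processors, a memory, and a transmission device.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining access paths between website pages in the embodiments of the present invention. By running the software programs and modules stored in the memory, the processor executes various functional applications and data processing, that is, implements the above method for determining access paths between website pages. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations of these.
The transmission device is used to receive or send data via a network. Specific examples of the network may include wired and wireless networks. In one example, the transmission device includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device is a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
Specifically, the memory is used to store the first preset condition, the second preset condition, the preset splitting rule, the preset script code, and application programs.
The processor may call the information and application programs stored in the memory through the transmission device, so as to execute the program code of the method steps of each optional or preferred embodiment among the above method embodiments.
A person of ordinary skill in the art will understand that the computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
A person of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be completed by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.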
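An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the method for determining access paths between website pages provided by the above method and apparatus embodiments.
Optionally, in this embodiment, the storage medium may be located in any computer terminal of a group of computer terminals in a computer network, or in any mobile terminal of a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an access log, where the access log is a log generated from access information of the target website; determining, from the access log, all sessions that visited the target pages, to obtain at least one target session; determining the order in which pages were visited in each target session, to obtain the original access path between original pages; processing the original access path between original pages according to a first preset condition, to obtain the original access path between target pages; and determining the target access path between target pages from the original access path between target pages.
Optionally, in this embodiment, the storage medium may also be configured to store program code for the various preferred or optional method steps provided by the method for determining access paths between website pages.
The method and apparatus for determining access paths between website pages according to the present invention have been described above by way of example with reference to the drawings. However, those skilled in the art should understand that various improvements can be made to the method and apparatus proposed by the invention without departing from its content. The scope of protection of the invention should therefore be determined by the appended claims.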
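It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions, but those skilled in the art should know that the present application is not limited by the order of actions described, because according to the application some steps may be performed in another order or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, see the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented by a general-purpose computing device, and that they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be fabricated as an individual integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. Thus, the application is not limited to any particular combination of hardware and software.
The above are only preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, the application may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the application shall fall within its scope of protection.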

Claims (12)

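  1. A method for determining access paths between website pages, characterized by comprising:
    acquiring an access log, where the access log is a log generated from access information of a target website;
    obtaining, from the access log, the original access path between original pages of the website;
    filtering the original access path between original pages to obtain the original access path between target pages; and
    removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal.
  2. The method according to claim 1, characterized in that removing loops from the original access path between target pages and, according to the access log, determining the target access path between target pages from the original access paths between target pages after loop removal comprises:
    traversing the original access path between target pages in access order and splitting it at its loops, to obtain a set of original access sub-paths between target pages;
    deleting, from the set of original access sub-paths between target pages, every sub-path contained in another sub-path, to obtain a pruned set of original access sub-paths between target pages;
    counting, according to the access log, the number of sessions covered by each original access sub-path between target pages in the pruned set;
    ranking the original access sub-paths between target pages in the pruned set by the session count; and
    determining the target access path between target pages from the ranked original access sub-paths between target pages.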
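  3. The method according to claim 1, characterized in that filtering the original access path between original pages to obtain the original access path between target pages comprises:
    determining the preset target pages;
    extracting, from the original access path between original pages, the paths in which target pages are visited consecutively, to obtain at least one path of consecutively visited target pages; and
    using the at least one path of consecutively visited target pages as the original access path between target pages.
  4. The method according to claim 1, characterized in that filtering the original access path between original pages to obtain the original access path between target pages comprises:
    determining the preset target pages;
    filtering out the non-target pages in the original access path between original pages according to the preset target pages; and
    using the filtered original access path between original pages as the original access path between target pages.
  5. The method according to claim 1, characterized in that, before acquiring the access log, the method further comprises:
    collecting access information for the target website by means of preset script code;
    sending the access information of the target website to a target address; and
    generating the access log at the target address from the access information of the target website.
  6. The method according to claim 1, characterized in that obtaining, from the access log, the original access path between original pages of the website comprises:
    obtaining the preset target pages;
    determining all sessions in the access log;
    selecting, from all sessions in the access log, the sessions that visited the preset target pages, to obtain target sessions; and
    determining the order in which pages were visited in each of the target sessions, to obtain the original access path between original pages.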
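  7. An apparatus for determining access paths between website pages, characterized by comprising:
    a first acquisition unit, configured to acquire an access log, where the access log is a log generated from access information of a target website;
    a second acquisition unit, configured to obtain, from the access log, the original access path between original pages of the website;
    a processing unit, configured to filter the original access path between original pages to obtain the original access path between target pages; and
    a determining unit, configured to remove loops from the original access path between target pages and, according to the access log, determine the target access path between target pages from the original access paths between target pages after loop removal.
  8. The apparatus according to claim 7, characterized in that the determining unit comprises:
    a splitting module, configured to traverse the original access path between target pages in access order and split it at its loops, to obtain a set of original access sub-paths between target pages;
    a deletion module, configured to delete, from the set of original access sub-paths between target pages, every sub-path contained in another sub-path, to obtain a pruned set of original access sub-paths between target pages;
    a counting module, configured to count, according to the access log, the number of sessions covered by each original access sub-path between target pages in the pruned set;
    a first processing module, configured to rank the original access sub-paths between target pages in the pruned set by the session count; and
    a first determining module, configured to determine the target access path between target pages from the ranked original access sub-paths between target pages.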
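  9. The apparatus according to claim 7, characterized in that the processing unit comprises:
    a second determining module, configured to determine the preset target pages;
    an extraction module, configured to extract, from the original access path between original pages, the paths in which target pages are visited consecutively, to obtain at least one path of consecutively visited target pages; and
    a third determining module, configured to use the at least one path of consecutively visited target pages as the original access path between target pages.
  10. The apparatus according to claim 7, characterized in that the processing unit comprises:
    a fourth determining module, configured to determine the preset target pages;
    a second processing module, configured to filter out the non-target pages in the original access path between original pages according to the preset target pages; and
    a fifth determining module, configured to use the filtered original access path between original pages as the original access path between target pages.
  11. A mobile terminal, configured to execute program code for the steps provided by the method for determining access paths between website pages according to any one of claims 1-6.
  12. A storage medium, configured to store the program code executed by the method for determining access paths between website pages according to any one of claims 1-6.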
PCT/CN2016/107106 2015-12-17 2016-11-24 Method and apparatus for determining access paths between website pages WO2017101652A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510955078.3A CN106897196B (zh) 2015-12-17 2015-12-17 Method and apparatus for determining access paths between website pages
CN201510955078.3 2015-12-17

Publications (1)

Publication Number Publication Date
WO2017101652A1 (zh) 2017-06-22

Family

ID=59055778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107106 WO2017101652A1 (zh) 2015-12-17 2016-11-24 Method and apparatus for determining access paths between website pages

Country Status (2)

Country Link
CN (1) CN106897196B (zh)
WO (1) WO2017101652A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
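CN111414243A (zh) * 2020-03-19 2020-07-14 北京明略软件系统有限公司 Method and apparatus for determining access paths, storage medium and electronic device
CN112328934A (zh) * 2020-10-16 2021-02-05 上海涛飞网络科技有限公司 Access behavior path analysis method, apparatus, device and storage medium
CN112632446A (zh) * 2020-12-30 2021-04-09 江苏苏宁云计算有限公司 Method and system for constructing page access paths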

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
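CN110020074B (zh) * 2017-10-13 2021-04-23 北京国双科技有限公司 Method and apparatus for determining web page churn rate
CN110020364B (zh) * 2017-11-27 2021-11-30 北京京东尚科信息技术有限公司 Method and apparatus for determining the traffic source of page visits
CN111131388A (zh) * 2019-11-25 2020-05-08 上海风秩科技有限公司 User behavior path analysis method and apparatus, electronic device and storage medium
CN113692014B (zh) * 2021-08-30 2023-10-27 中国平安人寿保险股份有限公司 App traffic analysis method and apparatus, computer device and storage medium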

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040133671A1 (en) * 2003-01-08 2004-07-08 David Taniguchi Click stream analysis
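CN102122291A (zh) * 2011-01-18 2011-07-13 浙江大学 Blog friend recommendation method based on tree-structured log pattern analysis
CN103605742A (zh) * 2013-11-20 2014-02-26 北京搜狗科技发展有限公司 Method and apparatus for identifying directory pages of network resource entities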

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112010004284T5 (de) * 2009-11-06 2013-01-24 International Business Machines Corporation Method and system for managing security objects
CN103631828B (zh) * 2012-08-28 2017-05-24 阿里巴巴集团控股有限公司 Method and apparatus for determining access paths, and method and system for determining page churn rate
CN103312716B (zh) * 2013-06-20 2016-08-10 北京蓝汛通信技术有限责任公司 Method and system for accessing Internet information

Also Published As

Publication number Publication date
CN106897196A (zh) 2017-06-27
CN106897196B (zh) 2019-10-25

Similar Documents

Publication Publication Date Title
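WO2017101652A1 (zh) Method and apparatus for determining access paths between website pages
CA2966757C (en) Method and device for social platform-based data mining
JP7013587B2 (ja) Multimedia resource matching method, apparatus, computer program and electronic device
CN109255346A (zh) Point-reading method, apparatus and electronic device
CN106156055B (zh) Method and apparatus for identifying and handling search engine crawlers
WO2017084579A1 (zh) Heat map generation method and apparatus
WO2017080454A1 (zh) Method and apparatus for aggregating website access paths
CN115422463B (zh) Big-data-based user analysis and push processing method and system
CN110472154A (zh) Resource push method, apparatus, electronic device and readable storage medium
CN113505272B (zh) Behavior-habit-based control method and apparatus, electronic device and storage medium
CN106933916B (zh) Method and apparatus for processing JSON strings
CN110807180A (zh) Method, apparatus and electronic device for security authentication and for training a security authentication model
CN110708360A (zh) Information processing method, system and electronic device
CN103745383A (zh) Method and system for implementing a redirection service based on operator data
CN106897297B (zh) Method and apparatus for determining access paths between website sections
EP3357261B1 (fr) Device for associating a mobile telephony identifier with a computer network identifier
CN106156210B (zh) Method and apparatus for determining an application identifier matching list
US9118563B2 (en) Methods and apparatus for detecting and filtering forced traffic data from network data
CN109429296B (zh) Method, apparatus and storage medium for associating a terminal with Internet access information
CN110232393B (zh) Data processing method and apparatus, storage medium and electronic device
CN110839167B (zh) Video recommendation method, apparatus and terminal device
CN109583453B (zh) Image recognition method and apparatus, data recognition method, and terminal
CN111340114A (zh) Image matching method and apparatus, storage medium and electronic device
CN108629610B (zh) Method and apparatus for determining the exposure of promotional information
JP2013174959A (ja) Application inspection system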

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16874709

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16874709

Country of ref document: EP

Kind code of ref document: A1