WO2020211249A1

WO2020211249A1 - Network shopping guiding method and apparatus based on data crawling

Info

Publication number: WO2020211249A1
Application number: PCT/CN2019/103201
Authority: WO
Inventors: 侯丽; 谈卓卓
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-04-19
Filing date: 2019-08-29
Publication date: 2020-10-22
Also published as: CN110189189A

Abstract

Disclosed are a network shopping guiding method and apparatus based on data crawling. The method comprises: in response to a price comparison operation regarding a commodity on a preset page, acquiring attribute information of the commodity (S21); searching a pre-constructed first database for commodity data matching the attribute information (S22); and extracting and displaying a shopping platform and the price of the commodity on the shopping platform in each piece of commodity data matching the attribute information (S23). The pre-construction process of the first database comprises: respectively using a pre-configured corresponding web open-source crawler to perform full crawling on a plurality of preset shopping platforms so as to obtain a plurality of pieces of commodity data, and storing the plurality of pieces of commodity data in the first database; and using an incremental web crawler to respectively perform incremental crawling on the plurality of shopping platforms so as to update the commodity data in the first database. The method can economize time and labor.

Description

Online shopping guidance method and apparatus based on data crawling

Technical field

This application relates to the technical field of big data processing, and in particular to an online shopping guidance method and apparatus based on data crawling.

Background

At present there are many online shopping platforms, and the price of the same product differs slightly from one platform to another. A user who wants to find a desired product at a lower price has to log in to many platforms to compare prices, which is time-consuming and laborious.

Summary of the invention

The embodiments of the present application provide an online shopping guidance method and apparatus based on data crawling, which can save time and labor and improve shopping efficiency.

An embodiment of the present application provides an online shopping guidance method based on data crawling, the method including:

in response to a price comparison operation on a commodity on a preset page, acquiring attribute information of the commodity;

searching a pre-built first database for commodity data matching the attribute information;

extracting and displaying, for each piece of commodity data matching the attribute information, the shopping platform and the price of the commodity on that shopping platform;

wherein the pre-construction process of the first database includes:

performing full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain a plurality of pieces of commodity data, and storing the plurality of pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is listed, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform; and performing incremental crawling on each of the plurality of shopping platforms with an incremental web crawler, so as to update the commodity data in the first database;

wherein the pre-configuration process of the open-source web crawler corresponding to each preset shopping platform includes: selecting the required code blocks from a pre-built second database according to the crawling requirements; sorting the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configuring the open-source web crawler corresponding to the shopping platform according to the code block sequence; wherein the second database includes a plurality of code blocks, and the pre-construction process of the second database includes: performing data crawling on each of the plurality of preset shopping platforms, and taking the computer code corresponding to each crawling step in the data crawling process as one code block.

In some embodiments, configuring the open-source web crawler corresponding to the shopping platform according to the code block sequence includes: determining a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.

In some embodiments, performing data crawling on each of the plurality of preset shopping platforms includes: writing corresponding computer code for each of the plurality of preset shopping platforms, and crawling data from each shopping platform with the computer code corresponding to that platform.

In some embodiments, performing incremental crawling on each of the plurality of shopping platforms with the incremental web crawler includes: incrementally crawling every page in a local page set at the same preset frequency; or incrementally crawling each page in the local page set according to that page's own change frequency; or incrementally crawling a first page subset at a preset first frequency and a second page subset at a preset second frequency, the first frequency being higher than the second frequency. The local page set is the set of pages that the open-source web crawlers have visited on the plurality of shopping platforms. The first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, and the change frequency of any page in the first page subset is higher than that of any page in the second page subset.
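The third strategy (two page subsets revisited at two different frequencies) can be sketched as follows. This is a minimal illustration only: the `Page` type, the `changes_per_day` estimate, the threshold, and the interval values are assumptions introduced here and are not specified in the application.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    changes_per_day: float  # assumed: an estimate of how often the page changes


def split_by_change_frequency(pages, threshold):
    """Divide the local page set into a fast-changing and a slow-changing subset."""
    fast = [p for p in pages if p.changes_per_day > threshold]
    slow = [p for p in pages if p.changes_per_day <= threshold]
    return fast, slow


def revisit_intervals(pages, threshold, fast_interval_h, slow_interval_h):
    """Map each URL to a revisit interval in hours.

    A shorter interval means a higher crawl frequency, so the first (fast)
    subset must get the shorter interval.
    """
    if fast_interval_h >= slow_interval_h:
        raise ValueError("the first subset must be crawled more often than the second")
    fast, slow = split_by_change_frequency(pages, threshold)
    plan = {p.url: fast_interval_h for p in fast}
    plan.update({p.url: slow_interval_h for p in slow})
    return plan
```

Because the split uses a single threshold, every page in the first subset changes more often than every page in the second, which is the property the embodiment requires. The first strategy corresponds to giving all pages the same interval, and the second to deriving each interval from that page's own `changes_per_day`.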

In some embodiments, displaying, for each piece of commodity data matching the attribute information, the shopping platform and the price of the commodity on that shopping platform includes: displaying the shopping platforms and prices in the matching commodity data in ascending order of price, and marking the shopping platform that offers the lowest price.
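The ordering and marking described in this embodiment might look like the sketch below. The record layout (a dict with `platform` and `price` keys) and the `lowest` flag are assumptions made for illustration; the application does not prescribe a data format.

```python
def rank_offers(offers):
    """Return the offers in ascending order of price, flagging the cheapest.

    If several platforms tie for the lowest price, all of them are flagged.
    """
    ranked = sorted(offers, key=lambda offer: offer["price"])
    return [dict(offer, lowest=(offer["price"] == ranked[0]["price"])) for offer in ranked]
```

A display layer could then render the list top to bottom and highlight the entries whose `lowest` flag is set.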

In some embodiments, the method further includes: providing purchase suggestion information according to the prices in the pieces of commodity data matching the attribute information.

In some embodiments, before extracting and displaying, for each piece of commodity data matching the attribute information, the shopping platform and the price of the commodity on that shopping platform, the method further includes: if the first database contains no commodity data matching the attribute information, crawling each preset shopping platform with its corresponding pre-configured open-source web crawler to obtain commodity data matching the attribute information.
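The lookup with on-demand fallback can be sketched as below. It assumes, purely for illustration, that the first database is a list of dicts with `platform`, `attributes`, and `price` keys, that "matching" means every requested attribute has an equal value, and that each platform's crawler is a callable taking the requested attributes; none of these details are fixed by the application.

```python
def matches(record, wanted):
    """A record matches when every requested attribute has the same value."""
    attributes = record["attributes"]
    return all(attributes.get(key) == value for key, value in wanted.items())


def find_offers(first_db, wanted, crawlers):
    """Search the first database; if nothing matches, crawl each platform on demand.

    `crawlers` maps a platform name to a callable that returns commodity records.
    """
    hits = [record for record in first_db if matches(record, wanted)]
    if hits:
        return hits
    fresh = []
    for crawl in crawlers.values():
        fresh.extend(record for record in crawl(wanted) if matches(record, wanted))
    return fresh
```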

An embodiment of the present application further provides an online shopping guidance apparatus based on data crawling, the apparatus including:

an attribute acquisition module, configured to acquire attribute information of a commodity in response to a price comparison operation on the commodity on a shopping page;

a data search module, configured to search a pre-built first database for commodity data matching the attribute information;

an extraction and display module, configured to extract and display, for each piece of commodity data matching the attribute information, the shopping platform and the price of the commodity on that shopping platform;

a first database construction module, configured to pre-build the first database;

wherein the first database construction module includes:

a first crawling unit, configured to perform full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain a plurality of pieces of commodity data, and to store the plurality of pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is listed, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform;

a second crawling unit, configured to perform incremental crawling on each of the plurality of shopping platforms with an incremental web crawler, so as to update the commodity data in the first database;

a crawler configuration unit, configured to pre-configure the open-source web crawler corresponding to each preset shopping platform, and specifically configured to: select the required code blocks from a pre-built second database according to the crawling requirements; sort the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configure the open-source web crawler corresponding to the shopping platform according to the code block sequence; wherein the second database includes a plurality of code blocks;

a second database construction unit, configured to pre-build the second database, and specifically configured to: perform data crawling on each of the plurality of preset shopping platforms, and take the computer code corresponding to each crawling step in the data crawling process as one code block.

An embodiment of the present application further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above online shopping guidance method based on data crawling.

With the online shopping guidance method and apparatus based on data crawling provided by the embodiments of the present application, attribute information is acquired first, commodity data matching the attribute information is then searched for in the pre-built first database, and the shopping platforms and prices in the matching commodity data are extracted and displayed. The embodiments can therefore aggregate the prices of the same commodity on multiple shopping platforms so that the user can choose where to buy, without the tedious work of searching and comparing prices on each platform separately, which saves considerable time and labor and improves shopping efficiency. Moreover, the first database is built by full crawling with open-source web crawlers and updated by incremental crawling with an incremental web crawler, so the commodity data in the first database is both comprehensive and up to date, which keeps the price comparison comprehensive and valid.

Description of the drawings

Figure 1 is a block diagram of the internal structure of a computer device in an embodiment;

Figure 2 is a flowchart of an online shopping guidance method based on data crawling in an embodiment;

Figure 3 is a structural block diagram of an online shopping guidance apparatus based on data crawling in an embodiment.

Detailed description

In order to make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and are not intended to limit it.

It can be understood that the terms "first", "second", and the like used in this application may be used herein to describe various elements, but these elements are not limited by these terms. The terms are only used to distinguish one element from another.

Figure 1 is a schematic structural diagram of a computer device in an embodiment of the application. As shown in Figure 1, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions. The database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement an online shopping guidance method based on data crawling. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may also store computer-readable instructions which, when executed by the processor, cause the processor to perform an online shopping guidance method based on data crawling. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art can understand that the structure shown in Figure 1 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. It can be understood that the database here is different from the first database and the second database described below.

An embodiment of the present application provides an online shopping guidance method based on data crawling. The method can be applied to the computer device shown in Figure 1, and the method includes:

S21: in response to a price comparison operation on a commodity on a preset page, acquiring attribute information of the commodity;

It can be understood that the preset page may be a shopping page on a shopping platform, a page on a shopping guidance platform that is independent of the individual shopping platforms, or some other page. Specific scenarios are illustrated below.

For example, the computer program corresponding to the shopping guidance method provided by the embodiments of the present application is embedded in each shopping platform. When a user browsing commodities on a shopping platform wants to compare prices for a commodity on the shopping page, the user clicks the price comparison button on the shopping page (for example, a PC user moves the cursor onto the price comparison button and clicks), or performs a designated gesture operation, which triggers the shopping guidance method provided by the embodiments of the present application. In this case the preset page is the shopping page on the shopping platform.

As another example, the computer program corresponding to the shopping guidance method provided by the embodiments of the present application is built as an independent shopping guidance platform, and the crawled commodity data is displayed on that platform. The user logs in to the shopping guidance platform, enters or finds the relevant information of the desired commodity on the platform's shopping guidance page, and clicks the price comparison button on that page, which triggers the shopping guidance method provided by the embodiments of the present application. In this case the preset page is a page on a shopping guidance platform that is independent of the individual shopping platforms.

It can be understood that the price comparison operation can also take various forms, for example, tapping the price comparison button on a mobile terminal, moving the cursor onto the price comparison button and clicking it on a PC, or performing a gesture operation.

The attribute information may include size, color, brand, capacity, model, name, material, and so on.

For example, a user who wants to buy a baby bottle enters, on the shopping guidance page of the shopping guidance platform, a wide-neck bottle of the Pigeon brand with a capacity of 240 ml and an orange color, and clicks the corresponding price comparison button. The attribute information of the bottle is then acquired, and may include: Pigeon, 240 ml, orange, wide-neck baby bottle. The embodiments of the present application thus aggregate shopping platforms and prices on the basis of the attribute information of the commodity to be compared, which reduces the risk of meaningless or ambiguous parameters misleading the user.

S22: searching the pre-built first database for commodity data matching the attribute information;

It can be understood that the first database is built in advance and is simply used each time shopping guidance is performed. Its pre-construction process is as follows:

S201: performing full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain a plurality of pieces of commodity data, and storing the plurality of pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is listed, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform;

It can be understood that the full crawling performed here with the open-source web crawlers is the initial stage of data crawling. Because the amount of data in the first crawl is large, the categories to be crawled can be configured: for example, only some categories are crawled at first and the remaining categories are crawled later. In other words, the crawl is split into several passes, each covering only part of the categories, which avoids problems such as network congestion caused by crawling too much data at once.
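Splitting the full crawl into passes over subsets of categories could be sketched as follows. The batch size and the `crawl_category` callable are hypothetical; the application only states that a subset of categories is crawled in each pass.

```python
def category_batches(categories, batch_size):
    """Yield the category list in consecutive batches of at most `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(categories), batch_size):
        yield categories[start:start + batch_size]


def full_crawl_in_batches(categories, batch_size, crawl_category):
    """Run the full crawl one batch of categories at a time.

    `crawl_category` is a stand-in for the platform's crawler; it takes a
    category name and returns a list of commodity records.
    """
    records = []
    for batch in category_batches(categories, batch_size):
        for category in batch:
            records.extend(crawl_category(category))
        # A real deployment would likely pause or reschedule here between
        # passes to limit the load placed on the network.
    return records
```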

It can be understood that the plurality of shopping platforms may be current mainstream shopping platforms, for example Taobao, JD.com, Vipshop, Pinduoduo, and Yihaodian. The URLs of these shopping platforms can be stored in advance and retrieved when the first database is built.

It can be understood that for each shopping platform, one corresponding open-source web crawler is built in advance, and that crawler is used to perform full crawling on the platform. The pre-configuration process of the open-source web crawler corresponding to each preset shopping platform is described below:

S2011: selecting the required code blocks from the pre-built second database according to the crawling requirements; and sorting the selected code blocks according to their execution order to obtain a corresponding code block sequence;

The second database includes a plurality of code blocks. The pre-construction process of the second database includes: performing data crawling on each of the plurality of preset shopping platforms, and taking the computer code corresponding to each crawling step in the data crawling process as one code block. It can be understood that this computer code is the code corresponding to a crawling step, and may be referred to as crawling code for short.

It can be understood that the code blocks in the second database are not limited to code blocks for crawling commodity data from different shopping platforms; they may also include code blocks with other functions, for example a code block that crawls news content in video format from a news website. The second database can therefore be used not only to configure crawlers for shopping websites but also to configure crawlers for other types of websites and thus carry out other crawling tasks.

In practical applications, the crawling requirements include which shopping platform to crawl and what content to crawl. In addition, different shopping platforms may classify commodities in different ways and with different category hierarchies; that is, the platforms have different classification characteristics and therefore impose different requirements on data crawling. For example, on some platforms baby bottles belong to the tableware category under the infants and children category, while on other platforms they belong to the feeding supplies category under the mother and baby category, so the two classification schemes differ. As another example, the major categories on shopping platforms A and B generally include food, fresh produce, digital products, mother and baby, and so on. On platform A, the major category mother and baby contains the intermediate categories baby products and maternity products, and the intermediate category baby products in turn contains small categories such as baby feeding supplies, baby toiletries, toys, and diapers and tissues. On platform B, the major category mother and baby directly contains small categories such as baby feeding supplies, baby toiletries, and maternity products. The two platforms thus have different category hierarchies. Since the category hierarchy used for crawling should be consistent with that of the shopping platform, the hierarchies used for crawling also differ; that is, the crawling requirements of the two platforms are different. The crawling requirements therefore also include the classification characteristics of the shopping platform to be crawled.

It can be understood that, when the second database is built, the code corresponding to each crawling step is taken as one code block. A code block may also be called a component; that is, one step corresponds to one code block, or one component. Examples of such steps are the login step, the step of entering a list, the page-turning step, and the scroll-down step performed when crawling a web page. Saving the computer code of each step in the second database as a code block is therefore equivalent to saving each step as a separate component.

In practical applications, the process of performing data crawling on each of the plurality of preset shopping platforms may include: writing corresponding computer code for each of the plurality of preset shopping platforms, and crawling data from each platform with the computer code corresponding to that platform.

That is, computer code is first written for each preset shopping platform, which yields a crawler suited to that platform. The computer code corresponding to each preset shopping platform (that is, the crawler corresponding to that platform) is then used to crawl data, and the code corresponding to each step of the crawling process is saved in the second database as a code block (which may also be called a component). Writing computer code specifically for each preset shopping platform yields a crawler that is well suited to that platform, so that each step of the data crawling process can do its work effectively.

For example, the steps corresponding to the code blocks in the second database built through the above process may include: (1) log in and record the cookie; (2) enter the list page and crawl the URLs; (3) enter the article page and crawl the user ID; (4) click "next" to go to the next page and continue; (5) enter the article page and crawl the article content; (6) pull down the scroll bar to load the next page of content; (7) enter content in the search box and search; (8) enter the article page and crawl the classification information of the major categories; (9) enter the article page and crawl the classification information of the intermediate categories; (10) enter the article page and crawl the commodity data in the small categories.

It can be understood that the embodiments of the present application select the required code blocks from the second database according to the data crawling requirements. Different code blocks correspond to different steps, so the execution order of the code blocks corresponds to the execution order of the steps. The code blocks therefore need to be sorted, which is equivalent to sorting the steps by execution order.

For example, a user wants to crawl the content of shopping platform A. From the category hierarchy of platform A, the crawling requirement is known to be a layer-by-layer crawl in the order major category, intermediate category, small category. The specific crawling steps are: log in, search for hot words, crawl user IDs, crawl the classification information of the major categories, crawl the classification information of the intermediate categories, crawl the commodity data in the small categories, and turn the page. In terms of the example above, the step order is roughly (1)-(7)-(3)-(8)-(9)-(10)-(4). The code blocks corresponding to steps (1), (3), (4), (7), (8), (9), and (10) are therefore selected from the second database, and these seven code blocks are sorted into the execution order (1)-(7)-(3)-(8)-(9)-(10)-(4) to obtain the corresponding code block sequence.
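The selection and ordering of code blocks for platform A can be illustrated with the sketch below. The second database is modeled, as an assumption, by a dict from step number to a callable, and each "code block" is a placeholder that only records its own name; real code blocks would contain the crawling code for that step.

```python
def make_block(name):
    """Build a placeholder code block that records its name when executed."""
    def block(trace):
        trace.append(name)
        return trace
    block.__name__ = name
    return block


# Hypothetical second database: step number -> code block (component).
SECOND_DB = {
    1: make_block("login"),
    3: make_block("crawl_user_id"),
    4: make_block("next_page"),
    7: make_block("search"),
    8: make_block("crawl_major_categories"),
    9: make_block("crawl_intermediate_categories"),
    10: make_block("crawl_products"),
}

# Execution order for platform A, taken from the example in the text.
PLATFORM_A_ORDER = [1, 7, 3, 8, 9, 10, 4]


def build_sequence(execution_order, second_db=SECOND_DB):
    """Select the required code blocks and arrange them in execution order."""
    missing = [step for step in execution_order if step not in second_db]
    if missing:
        raise KeyError(f"no code block stored for steps {missing}")
    return [second_db[step] for step in execution_order]


def run(sequence):
    """Execute the code block sequence and return the order in which blocks ran."""
    trace = []
    for block in sequence:
        block(trace)
    return trace
```

Calling `run(build_sequence(PLATFORM_A_ORDER))` executes the seven placeholder blocks in the order login, search, user ID, major categories, intermediate categories, products, next page.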

S2012. Configure the open-source web crawler corresponding to the shopping platform according to the code block sequence;

It can be understood that configuring the required crawler is in practice a matter of generating a configuration file; once the configuration file is obtained, the configuration of the required crawler is complete. The specific process of step S2012 may therefore include: determining the configuration file of the required crawler according to the code block sequence and a preset description document. The description document may store explanatory information that helps the user generate the configuration file, for example the procedural steps for generating the configuration file and which information is needed at each step.

In practical applications, the configuration may be expressed in Extensible Markup Language (XML); that is, the content of the configuration file may be written in XML, which can improve the versatility of the required crawler.

It can be understood that the data crawling requirements cover not only which website is crawled and what content is crawled, but may also cover whether the crawl is full or incremental, whether JavaScript or non-JavaScript page content is crawled, from which page level content capture starts, whether the page-turning mode is scroll-down, what attributes the fields to be captured have, and so on. In this case, a code fragment in the generated configuration file includes the following:

It can be understood that the general idea of the above code fragment is:

The fragment starts with seed (the seed, that is, the starting point from which the crawl fans out to capture content) -> url (the address of the seed, for example http://www.chinanews.com/business/gd.shtml) -> fully (whether the crawl is a full crawl; 1 for yes, 0 for no) -> javascript (whether the page is a JavaScript page; 1 for yes, 0 for no) -> keyword (a keyword; none is set in the above fragment) -> seedArea (the region of the page where the seed links are located; if left empty, every URL on the page is taken; in the above fragment the seed area is <![CDATA[#content_right>div.content_list]]>) -> start (the page level from which content capture starts; the above fragment starts from level-2 pages) -> turning (the page-turning mode; slider means scroll-down) -> meta (the attributes of the fields to be captured, for example field, site, tag, index and pic, denoting domain, address, label, index and picture respectively).
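To make the field walkthrough concrete, the following Python sketch parses a hypothetical configuration of that shape with the standard-library `xml.etree.ElementTree` module. The element names, the seed URL, the seed area selector, the start level of 2 and the `slider` turning mode come from the description above; the nesting of the elements and the values chosen for `fully` and `javascript` are assumptions made purely for illustration, and `load_seed_config` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element names follow the description, but the nesting
# and the values of <fully> and <javascript> are placeholders.
CONFIG_XML = """
<seed>
  <url>http://www.chinanews.com/business/gd.shtml</url>
  <fully>1</fully>
  <javascript>0</javascript>
  <keyword></keyword>
  <seedArea><![CDATA[#content_right>div.content_list]]></seedArea>
  <start>2</start>
  <turning>slider</turning>
  <meta>
    <field/>
    <site/>
    <tag/>
    <index/>
    <pic/>
  </meta>
</seed>
"""


def load_seed_config(xml_text):
    """Parse one <seed> element into a plain dictionary of crawl settings."""
    root = ET.fromstring(xml_text.strip())
    return {
        "url": root.findtext("url"),
        "full_crawl": root.findtext("fully") == "1",
        "javascript": root.findtext("javascript") == "1",
        # An empty <keyword> or <seedArea> is treated as "not set".
        "keyword": root.findtext("keyword") or None,
        "seed_area": root.findtext("seedArea") or None,
        "start_level": int(root.findtext("start")),
        "turning": root.findtext("turning"),
        "meta_fields": [child.tag for child in root.find("meta")],
    }


config = load_seed_config(CONFIG_XML)
```

The CDATA wrapper lets the CSS selector contain `>` without escaping; the parser returns its content as ordinary text.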

As the above fragment shows, either a JavaScript page or a non-JavaScript page can be selected, so both JavaScript page crawling and non-JavaScript page crawling are supported. When a JavaScript page is selected, the JavaScript code is interpreted and the result is converted into ordinary tagged HTML. It can be understood that a JavaScript page is a dynamically generated page and a non-JavaScript page is a statically generated page.

Because the embodiment of the present application can combine and sort different code blocks according to the data crawling requirements (that is, the steps can be combined and configured arbitrarily) and configures the crawler according to the resulting code block sequence, the configured crawler can download complete pages or capture content precisely, for example capturing only pictures. In addition, by setting the data crawling requirements accordingly, distributed cluster crawling can be implemented to increase crawling speed.

It can be understood that the open-source web crawler is a crawler whose source is openly available on the web; for example, it may be a focused web crawler. A focused crawler, also called a topical crawler, is a web crawler that selectively crawls pages related to predefined topics. Compared with a general-purpose web crawler (that is, a whole-web crawler), a focused crawler only needs to crawl topic-related pages, which greatly saves hardware and network resources, and because fewer pages are saved they can be refreshed more quickly.

In practical applications, when a focused web crawler is used for full crawling, the crawling process may include the following steps:

a1. Log in to each of the multiple shopping platforms with the focused web crawler, using the platform's URL;

Here the focused web crawler performs the platform login. During login the crawler may run into the anti-crawling mechanism of the shopping platform: when one IP address accesses a website too frequently, the website restricts access from that IP address. Login can then be carried out through a proxy address. That is, the process of logging in to a shopping platform may include: sending a platform login request carrying a proxy address to the server of each shopping platform through the focused web crawler, and changing the proxy address through the focused web crawler either periodically or when restricted access or an access error is encountered. Once the proxy address has been changed, the request is no longer blocked by the restriction placed on the previous address. For example, the focused web crawler changes the proxy address every half hour and stores the new proxy address, which is then retrieved whenever the shopping platform needs to be accessed. As another example, when the focused web crawler sends a platform login request to the server of a shopping platform and receives feedback indicating restricted access or an access error, it changes the proxy address in the login request and resends the request with the new proxy address, so that the login can succeed.
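The two proxy-switching triggers described above (a fixed period, and a restricted-access or error response) can be sketched as follows. This is a minimal illustration, not the application's implementation: the class `ProxyRotator`, the function `login_with_retry`, the proxy pool and the `send_login` callable are all hypothetical, and no real network request is made.

```python
import itertools
import time


class ProxyRotator:
    """Cycle through a proxy pool, switching periodically or on demand."""

    def __init__(self, proxies, period_seconds=1800, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy address is required")
        self._cycle = itertools.cycle(proxies)
        self._period = period_seconds  # 1800 s matches the half-hour example
        self._clock = clock
        self.current = next(self._cycle)
        self._last_switch = clock()

    def rotate(self):
        """Switch to the next proxy immediately."""
        self.current = next(self._cycle)
        self._last_switch = self._clock()
        return self.current

    def get(self):
        """Return the proxy to use now, rotating first if the period elapsed."""
        if self._clock() - self._last_switch >= self._period:
            self.rotate()
        return self.current


def login_with_retry(send_login, rotator, max_attempts=3):
    """Try to log in; switch proxy after each restricted or failed attempt.

    send_login(proxy) is a caller-supplied function that returns True on
    success and False when access is restricted or an error occurs.
    """
    for _ in range(max_attempts):
        if send_login(rotator.get()):
            return True
        rotator.rotate()
    return False
```

Injecting the clock and the login function keeps the rotation logic independent of any particular HTTP library.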

a2. After logging in to each shopping platform, crawl layer by layer with the focused web crawler, from larger categories to smaller ones, to obtain multiple pieces of product data.

For example, product categories have several levels, such as large categories, intermediate categories and small categories. Crawling from large to small means first crawling the large product categories, then the intermediate categories, then the small categories, and finally the specific products. Crawling layer by layer in this way is also a process of peeling away one layer at a time: the focused web crawler simulates a user's clicks so that it can enter the page that each clicked link leads to.

It can be understood that each piece of product data includes at least the shopping platform on which the corresponding product is listed, the attribute information of the product, and the price of the product on that shopping platform; it may also include the category to which the product belongs. Category, shopping platform, price and attributes are all information related to the product. Multiple pieces of product data about baby bottles and nipples, crawled from large categories to small ones, are shown in Table 1 below:

Table 1: Product data for baby bottles and nipples

It can be understood that the multi-level categories above are stored as the crawler encountered them; depending on how products are classified, the names of the category levels may differ.

It can be understood that the first-level, second-level and third-level categories above are the categories to which a product belongs, while the fourth-level to seventh-level categories are product attributes. A piece of product data can thus include the product attributes, category, shopping platform and price.

Take the product data with serial number 1 above. Its crawl path is: mother and baby - feeding supplies - bottles and nipples - Pigeon - wide-neck bottle - 160ml - green, which leads to the Pigeon 160ml green wide-neck bottle. The crawl path thus contains both the category of the product and its attribute information.
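The layer-by-layer descent and the resulting crawl path can be sketched as a recursive walk over a category tree. In the sketch below the nested dictionary `SITE_TREE` is a hypothetical in-memory stand-in for the pages a crawler would reach by simulated clicks; it contains only the one path quoted above, and the function names are illustrative.

```python
def walk_categories(tree, path=()):
    """Yield one crawl path per leaf, descending from large to small levels."""
    for name, child in tree.items():
        current = path + (name,)
        if isinstance(child, dict) and child:
            # Descend one layer, as a simulated click on the link would.
            yield from walk_categories(child, current)
        else:
            yield current


def to_record(path, category_depth=3):
    """Split a crawl path into category levels 1-3 and attribute levels 4+."""
    return {
        "categories": path[:category_depth],
        "attributes": path[category_depth:],
    }


# Hypothetical tree holding only the path from the example above.
SITE_TREE = {
    "Mother and baby": {
        "Feeding supplies": {
            "Bottles and nipples": {
                "Pigeon": {"Wide-neck bottle": {"160ml": {"Green": None}}},
            },
        },
    },
}

paths = list(walk_categories(SITE_TREE))
record = to_record(paths[0])
```

Splitting the path at a fixed depth mirrors the statement that the first three levels are categories and the remaining levels are attributes.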

It can be understood that searching in step S22 for product data matching the attribute information of the product means searching the product data stored in the first database for attribute information identical to that of the product. For example, Table 1 above is searched for entries whose fourth-level to seventh-level categories are identical to the attribute information of the product to be compared.

It can be understood that price comparison here means comparing prices of the same product, that is, of products with identical attribute information; otherwise the comparison is meaningless.
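A minimal Python sketch of such an exact-match lookup is given below. It assumes the first database can be modeled as a list of records and indexes them by their attribute tuple; the record layout, the platform names, the prices and the lower-casing normalization are illustrative assumptions, not details taken from the application.

```python
from collections import defaultdict


def normalize(attributes):
    """Trim and lower-case attributes so that '240ML' and '240ml' compare equal."""
    return tuple(a.strip().lower() for a in attributes)


def build_index(records):
    """Group product records by their normalized attribute tuple."""
    index = defaultdict(list)
    for record in records:
        index[normalize(record["attributes"])].append(record)
    return index


def find_matches(index, attributes):
    """Return every stored record whose attributes equal the query exactly."""
    return index.get(normalize(attributes), [])


# Illustrative rows only; platforms and prices are placeholders.
RECORDS = [
    {"platform": "Platform X", "price": 99.0,
     "attributes": ["Pigeon", "Wide-neck bottle", "160ml", "Green"]},
    {"platform": "Platform Y", "price": 142.0,
     "attributes": ["Pigeon", "Wide-neck bottle", "240ml", "Orange"]},
    {"platform": "Platform Z", "price": 150.0,
     "attributes": ["Pigeon", "Wide-neck bottle", "240ML", "Orange"]},
]

index = build_index(RECORDS)
matches = find_matches(index, ["Pigeon", "Wide-neck bottle", "240ml", "Orange"])
```

Because the whole attribute tuple is the key, a product that differs in any one attribute (for example capacity) is never mixed into the comparison.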

S202. Incrementally crawl each of the multiple shopping platforms with an incremental web crawler to update the product data in the first database.

It can be understood that an incremental web crawler is a crawler that updates already downloaded pages incrementally and crawls only pages that are newly created or have changed, which keeps the crawled pages as fresh as possible. Compared with a web crawler that periodically re-crawls and refreshes all pages, an incremental crawler crawls new or updated pages only when needed and does not re-download unchanged pages; this reduces the amount of data downloaded, keeps crawled pages up to date, and lowers time and storage costs. In short, the goal of an incremental web crawler is to keep the pages stored in the local page set current. To achieve this, the incremental web crawler revisits the pages in the local page set to update their content, and thereby updates the product data stored in the first database.
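One simple way to detect which revisited pages are new or changed is to compare a content fingerprint against the one stored on the previous visit. The sketch below illustrates that idea only; it is a swapped-in technique, not something the application specifies. Note that this variant still fetches each page in order to fingerprint it, so it saves re-parsing and database writes rather than downloads. The names `incremental_update`, `fetch` and `fingerprints` are hypothetical.

```python
import hashlib


def incremental_update(urls, fetch, fingerprints):
    """Return {url: body} for pages that are new or whose content changed.

    fetch(url) is a caller-supplied function returning the page text.
    fingerprints maps url -> SHA-256 digest from the previous visit and is
    updated in place.
    """
    changed = {}
    for url in urls:
        body = fetch(url)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if fingerprints.get(url) != digest:
            fingerprints[url] = digest
            changed[url] = body
    return changed
```

Only the pages returned by `incremental_update` would then need to be parsed and written back to the first database.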

In practical applications, incrementally crawling the multiple shopping platforms with an incremental web crawler may specifically include: (1) incrementally crawling every page in the local page set at the same preset frequency; or (2) incrementally crawling each page in the local page set according to that page's own change frequency; or (3) incrementally crawling a first page subset at a preset first frequency and a second page subset at a preset second frequency. Here the first frequency is higher than the second frequency; the local page set is the set of pages that the open-source web crawlers have visited on the multiple shopping platforms; and the first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, the change frequency of any page in the first page subset being higher than that of any page in the second page subset.

It can be understood that the above gives three specific ways of performing incremental crawling with an incremental web crawler; other ways may also be used. Approach (1) crawls all pages at one frequency; it is simple and easy to implement but ignores the differences in change frequency between pages. Approach (2) crawls each page according to its own change frequency; it accounts for those differences but is more complex, and the complexity grows considerably when there are many pages. Approach (3) balances the strengths and weaknesses of the other two by dividing the local page set into two parts: the first page subset, made up of pages that change more frequently, and the second page subset, made up of pages that change less frequently. The first page subset is crawled at the higher first frequency and the second page subset at the lower second frequency. Approach (3) thus accounts for differences in change frequency to some extent while keeping the incremental crawling process fairly simple.
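Approach (3) can be sketched as a threshold split followed by a per-subset revisit interval. The sketch assumes that a change frequency is already known for each page and that a single threshold separates the two subsets; the threshold, the intervals and all names below are illustrative assumptions.

```python
def split_by_change_frequency(change_freq, threshold):
    """Split pages into a fast-changing and a slow-changing subset.

    Every page in the first subset changes more often than every page in
    the second subset, because the split uses a single threshold.
    """
    fast = {url for url, freq in change_freq.items() if freq > threshold}
    slow = set(change_freq) - fast
    return fast, slow


def due_pages(now, last_crawled, fast, slow, fast_interval, slow_interval):
    """Return the pages whose revisit interval has elapsed at time `now`."""
    due = []
    for url in sorted(fast | slow):
        interval = fast_interval if url in fast else slow_interval
        if now - last_crawled.get(url, float("-inf")) >= interval:
            due.append(url)
    return due


# Hypothetical change frequencies (changes per day) and intervals (minutes).
change_freq = {"a": 10, "b": 1, "c": 5}
fast, slow = split_by_change_frequency(change_freq, threshold=3)
last_crawled = {"a": 0, "b": 0, "c": 0}
due_after_1h = due_pages(60, last_crawled, fast, slow, fast_interval=60, slow_interval=1440)
```

A shorter interval corresponds to the higher first frequency, so fast-changing pages come due more often than slow-changing ones.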

After the incremental web crawler has performed incremental crawling, the product data in the first database can be updated with the newly crawled data.

As can be seen, the embodiment of the present application constructs the first database through full crawling by open-source web crawlers, so that the product data in the first database is relatively comprehensive, thereby ensuring the comprehensiveness of the price comparison. The first database is updated through incremental crawling by an incremental web crawler, so that the product data in the first database is current, thereby ensuring the validity of the price comparison.

S23. Extract and display, from each piece of product data matching the attribute information, the shopping platform and the price of the product on that shopping platform;

It can be understood that step S23 first extracts the shopping platform and price from the pieces of product data matching the attribute information, and then displays the extracted shopping platforms and prices.

For example, a user who wants to buy a baby bottle browses the shopping guidance platform, finds a wide-neck bottle of the Pigeon brand with a capacity of 240ml in orange, and clicks the corresponding price comparison button. The shopping guidance platform determines the attribute information of the bottle and looks up matching product data in Table 1 above. The fourth-level to seventh-level categories in the product data with serial numbers 2 and 3 are identical to the attribute information of the product to be compared, so the shopping platform and price in those two entries are extracted and displayed to the user, for example Taobao - 142 yuan and JD - 150 yuan. The attribute information of the product may also be attached when displaying, for example Taobao - Pigeon - wide-neck bottle - 240ml - orange - 142 yuan and JD - Pigeon - wide-neck bottle - 240ml - orange - 150 yuan.

In practical applications, when several pieces of product data match the attribute information of the product to be compared, the information to be displayed may also be presented on the page as a table. Whatever the form of presentation, the entries to be displayed can be sorted and shown in a defined order. For example, the shopping platform and price in each matching piece of product data are displayed in order of price from low to high, and the shopping platform with the lowest price is marked. Both the low-to-high ordering and the marking of the lowest-priced platform serve to let the user see that entry at first glance.

In practical applications, besides displaying information such as shopping platforms and prices, purchase suggestions may also be provided based on the prices in the pieces of product data matching the attribute information.

For example, if the product to be compared has the lowest price on shopping platform B, a suggestion is provided that the user buy it on shopping platform B.
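The low-to-high ordering, the marking of the lowest price and the purchase suggestion can be sketched together as follows. The record layout and the wording of the suggestion are illustrative assumptions; the two sample offers reuse the Taobao and JD prices from the earlier example.

```python
def rank_offers(matches):
    """Sort offers by ascending price and flag the lowest-priced ones."""
    ranked = sorted(matches, key=lambda m: m["price"])
    lowest = ranked[0]["price"] if ranked else None
    return [
        {"platform": m["platform"], "price": m["price"],
         "lowest": m["price"] == lowest}
        for m in ranked
    ]


def purchase_suggestion(offers):
    """Build a one-line suggestion naming the platform(s) with the lowest price."""
    best = [o["platform"] for o in offers if o["lowest"]]
    if not best:
        return "No matching offers found."
    return "Lowest price is on " + " / ".join(best) + "."


offers = rank_offers([
    {"platform": "JD", "price": 150},
    {"platform": "Taobao", "price": 142},
])
suggestion = purchase_suggestion(offers)
```

Flagging by price rather than by position means that platforms tied for the lowest price are all marked.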

In some embodiments, before step S23, if no product data matching the attribute information exists in the first database, each preset shopping platform is crawled with its corresponding pre-configured open-source web crawler to obtain product data matching the attribute information.

That is, if the pre-constructed first database contains no product data matching the attribute information of the product to be compared, it is assumed that information about that product has not been crawled before. The open-source web crawlers are then used to crawl again, after which information such as the shopping platform and price is extracted and displayed.
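This fallback amounts to a lookup that crawls on demand when nothing is stored. The sketch below models the first database as a dictionary keyed by attribute tuple and each platform's crawler as a caller-supplied function returning a list of prices; both are simplifying assumptions, and the name `lookup_or_crawl` is hypothetical.

```python
def lookup_or_crawl(attributes, database, crawlers):
    """Return stored matches, crawling every platform only when none exist.

    database maps an attribute tuple to a list of offer records.
    crawlers maps a platform name to a function(attributes) -> list of prices.
    """
    key = tuple(attributes)
    if database.get(key):
        return database[key]
    fresh = []
    for platform, crawl in crawlers.items():
        for price in crawl(key):
            fresh.append({"platform": platform, "price": price})
    if fresh:
        # Store the new records so the next lookup is served from the database.
        database[key] = fresh
    return fresh
```

Writing the freshly crawled records back means a repeated comparison of the same product does not trigger another crawl.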

Various open-source web crawlers may be used, for example WebMagic.

The online shopping guidance method based on data crawling provided by the embodiment of the present application first obtains the attribute information, then searches the pre-constructed first database for product data matching the attribute information, and then extracts and displays the shopping platform and price from the matching product data. The embodiment can thus bring together the prices of the same product on multiple shopping platforms so that the user can choose where to buy, without the tedious work of searching and comparing prices on each shopping platform separately, which greatly saves time and effort and improves shopping efficiency. Moreover, the first database is constructed through full crawling by open-source web crawlers and updated through incremental crawling by an incremental web crawler, so that the product data in the first database is relatively comprehensive and current, which ensures that the price comparison is comprehensive and valid.

In addition, because the embodiment of the present application gathers shopping platforms and prices based on the attribute information of the product to be compared, it can reduce the extent to which meaningless or ambiguous parameters mislead the user.

As shown in FIG. 3, in one embodiment an online shopping guidance apparatus 30 based on data crawling is provided. The apparatus 30 can be understood as the shopping guidance platform described above and can be integrated into the computer device described above. Specifically, it may include:

an attribute acquisition module 32, configured to acquire attribute information of a product in response to a price comparison operation on the product on a shopping page;

a data search module 33, configured to search the pre-constructed first database for product data matching the attribute information;

an extraction and display module 34, configured to extract and display, from each piece of product data matching the attribute information, the shopping platform and the price of the product on that shopping platform;

a first database construction module 31, configured to construct the first database in advance; the first database construction module includes:

a second crawling unit, configured to incrementally crawl each of the multiple shopping platforms with an incremental web crawler to update the product data in the first database;

In some embodiments, the process by which the crawler configuration unit configures the open-source web crawler corresponding to the shopping platform includes: determining the configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document, where the description document stores explanatory information used for generating the configuration file.

In some embodiments, the process by which the second database construction unit crawls data from each of the preset multiple shopping platforms includes: writing corresponding computer code for each of the preset multiple shopping platforms, and crawling data from each shopping platform with the computer code corresponding to that platform.

In some embodiments, the second crawling unit is specifically configured to: incrementally crawl every page in the local page set with the incremental web crawler at the same preset frequency; or incrementally crawl each page in the local page set with the incremental web crawler according to that page's own change frequency; or incrementally crawl a first page subset at a preset first frequency and a second page subset at a preset second frequency with the incremental web crawler; where the first frequency is higher than the second frequency; the local page set is the set of pages that the open-source web crawlers have visited on the multiple shopping platforms; and the first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, the change frequency of any page in the first page subset being higher than that of any page in the second page subset.

In some embodiments, the extraction and display module is specifically configured to: display, in order of price from low to high, the shopping platform and the price of the product on that shopping platform from each piece of product data matching the attribute information, and mark the shopping platform corresponding to the lowest price.

In some embodiments, the apparatus further includes:

a suggestion providing module, configured to provide purchase suggestion information based on the prices in the pieces of product data matching the attribute information;

a data crawling module, configured to, before the extraction and display module extracts and displays the shopping platform and the price of the product on that shopping platform from each piece of product data matching the attribute information, and if no product data matching the attribute information exists in the first database, crawl each preset shopping platform with its corresponding pre-configured open-source web crawler to obtain product data matching the attribute information.

In the online shopping guidance apparatus based on data crawling provided by the present application, the attribute acquisition module first acquires the attribute information, the data search module then searches the pre-constructed first database for product data matching the attribute information, and the extraction and display module then extracts and displays the shopping platform and price from the matching product data. The embodiment can thus bring together the prices of the same product on multiple shopping platforms so that the user can choose where to buy, without the tedious work of searching and comparing prices on each shopping platform separately, which greatly saves time and effort and improves shopping efficiency. Moreover, the first database is constructed through full crawling by open-source web crawlers and updated through incremental crawling by an incremental web crawler, so that the product data in the first database is relatively comprehensive and current, which ensures that the price comparison is comprehensive and valid.

In some embodiments, a computer device is provided. The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the following steps:

where the pre-construction process of the first database includes: performing full crawling on each of a plurality of preset shopping platforms with a corresponding pre-configured open-source web crawler to obtain multiple pieces of product data, and storing the multiple pieces of product data in the first database, where the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of product data includes at least the shopping platform on which the corresponding product is listed, the attribute information of the product, and the price of the product on that shopping platform; and incrementally crawling each of the multiple shopping platforms with an incremental web crawler to update the product data in the first database; where the pre-configuration process of the open-source web crawler corresponding to each preset shopping platform includes: selecting the required code blocks from a pre-constructed second database according to the crawling requirements; sorting the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configuring the open-source web crawler corresponding to the shopping platform according to the code block sequence; where the second database includes a plurality of code blocks, and the pre-construction process of the second database includes: crawling data from each of the preset multiple shopping platforms, and taking the computer code corresponding to each crawling step in the data crawling process as one code block.

In some embodiments, the configuring, performed by the processor, of the open-source web crawler corresponding to the shopping platform according to the code block sequence includes: determining a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.

In some embodiments, the data crawling performed by the processor on each of the plurality of preset shopping platforms includes: writing the corresponding computer code for each of the plurality of preset shopping platforms, and using the computer code corresponding to each shopping platform to perform data crawling on that platform's website.

In some embodiments, the incremental crawling performed by the processor on each of the plurality of shopping platforms using the incremental web crawler includes: using the incremental web crawler to incrementally crawl every page in a local page set at the same preset frequency; or, using the incremental web crawler to incrementally crawl each page in the local page set according to that page's own change frequency; or, using the incremental web crawler to incrementally crawl a first page subset at a preset first frequency and a second page subset at a preset second frequency; wherein the first frequency is higher than the second frequency; the local page set is the set of pages that the open-source web crawlers have visited on the plurality of shopping platforms; and the first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, the change frequency of any page in the first page subset being higher than the change frequency of any page in the second page subset.
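
The third option above (two page subsets crawled at two frequencies) can be sketched as follows. The threshold-based split, the tick-based scheduler, and all names and sample values are assumptions chosen for illustration; the application does not specify how the subsets are formed or how the crawl frequencies are enforced.

```python
from typing import Dict, List, Tuple


def split_by_change_frequency(change_freq: Dict[str, float],
                              threshold: float) -> Tuple[List[str], List[str]]:
    """Divide the local page set into a fast-changing and a slow-changing subset.

    Every page in the first subset changes more often than the threshold and
    every page in the second does not, so any page in the first subset has a
    higher change frequency than any page in the second.
    """
    first = sorted(url for url, f in change_freq.items() if f > threshold)
    second = sorted(url for url, f in change_freq.items() if f <= threshold)
    return first, second


def pages_due(first: List[str], second: List[str], tick: int,
              first_every: int, second_every: int) -> List[str]:
    """Return the pages to re-crawl at a given scheduler tick.

    A smaller interval means a higher crawl frequency, so first_every must be
    smaller than second_every.
    """
    if first_every >= second_every:
        raise ValueError("the first subset must be crawled more often")
    due: List[str] = []
    if tick % first_every == 0:
        due.extend(first)
    if tick % second_every == 0:
        due.extend(second)
    return due


# Hypothetical observed change frequencies, in changes per day.
change_freq = {"a/item/1": 12.0, "a/item/2": 0.5, "b/item/9": 6.0}
first, second = split_by_change_frequency(change_freq, threshold=1.0)

# With hourly ticks: the first subset every hour, the second every 24 hours.
due_at_1 = pages_due(first, second, tick=1, first_every=1, second_every=24)
due_at_24 = pages_due(first, second, tick=24, first_every=1, second_every=24)
```

The first option (one frequency for all pages) is the degenerate case in which the whole local page set is re-crawled on the same interval.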

In some embodiments, the displaying, performed by the processor, of the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform includes: displaying, in order of price from low to high, the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, and marking the shopping platform corresponding to the lowest price.
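
As a minimal sketch of this display step, the matching records can be sorted by price and the cheapest one flagged. The record layout (`platform` and `price` keys) and the `lowest` flag are assumptions made for illustration; in this sketch, platforms tied at the lowest price are all marked.

```python
from typing import Dict, List


def rank_offers(offers: List[Dict]) -> List[Dict]:
    """Sort matching records from low to high price and mark the lowest."""
    ranked = sorted(offers, key=lambda offer: offer["price"])
    lowest = ranked[0]["price"] if ranked else None
    # Copy each record and add a flag; ties at the lowest price are all marked.
    return [{**offer, "lowest": offer["price"] == lowest} for offer in ranked]


# Hypothetical matching records for one commodity.
offers = [
    {"platform": "platform_a", "price": 5099.0},
    {"platform": "platform_b", "price": 4899.0},
    {"platform": "platform_c", "price": 4999.0},
]
ranked = rank_offers(offers)
```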

In some embodiments, the processor further implements the following step when executing the computer program: providing purchase suggestion information according to the prices in the pieces of commodity data matching the attribute information.

In some embodiments, before acquiring and displaying the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, the processor further implements the following step: if no commodity data matching the attribute information exists in the first database, crawling each preset shopping platform with its corresponding pre-configured open-source web crawler to obtain commodity data matching the attribute information.
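
The fallback above can be sketched as a lookup that crawls on demand only when the first database has no match. The list-based database, the exact-match comparison on an attribute string, the stub crawlers, and the decision to store freshly crawled records back into the database are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Crawler = Callable[[str], List[Record]]


def lookup_with_fallback(attrs: str, database: List[Record],
                         crawlers: Dict[str, Crawler]) -> List[Record]:
    """Search the first database; if nothing matches, crawl every platform."""
    matches = [rec for rec in database if rec["attrs"] == attrs]
    if matches:
        return matches
    for platform, crawl in crawlers.items():
        for rec in crawl(attrs):
            rec = {**rec, "platform": platform}
            database.append(rec)  # assumption: keep the result for later lookups
            matches.append(rec)
    return matches


# Toy first database and stub crawlers (no network access).
database: List[Record] = [
    {"platform": "platform_a", "attrs": "phone 128GB", "price": 4999.0},
]
crawlers: Dict[str, Crawler] = {
    "platform_a": lambda attrs: [{"attrs": attrs, "price": 89.0}],
    "platform_b": lambda attrs: [],
}

hit = lookup_with_fallback("phone 128GB", database, crawlers)   # served from the database
miss = lookup_with_fallback("kettle 1.5L", database, crawlers)  # triggers a crawl
```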

The beneficial effects of the computer device provided in this application are the same as those of the online shopping guidance method and device based on data crawling described above, and are not repeated here.

In one embodiment, a storage medium storing computer-readable instructions is provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:

In some embodiments, the configuring, performed by the one or more processors, of the open-source web crawler corresponding to the shopping platform according to the code block sequence includes: determining a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.

In some embodiments, the data crawling performed by the one or more processors on each of the plurality of preset shopping platforms includes: writing the corresponding computer code for each of the plurality of preset shopping platforms, and using the computer code corresponding to each shopping platform to perform data crawling on that platform's website.

In some embodiments, the incremental crawling performed by the one or more processors on each of the plurality of shopping platforms using the incremental web crawler includes: using the incremental web crawler to incrementally crawl every page in a local page set at the same preset frequency; or, using the incremental web crawler to incrementally crawl each page in the local page set according to that page's own change frequency; or, using the incremental web crawler to incrementally crawl a first page subset at a preset first frequency and a second page subset at a preset second frequency; wherein the first frequency is higher than the second frequency; the local page set is the set of pages that the open-source web crawlers have visited on the plurality of shopping platforms; and the first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, the change frequency of any page in the first page subset being higher than the change frequency of any page in the second page subset.

In some embodiments, the displaying, performed by the one or more processors, of the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform includes: displaying, in order of price from low to high, the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, and marking the shopping platform corresponding to the lowest price.

In some embodiments, the one or more processors further implement the following step when executing the computer program: providing purchase suggestion information according to the prices in the pieces of commodity data matching the attribute information.

In some embodiments, before acquiring and displaying the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, the one or more processors further implement the following step: if no commodity data matching the attribute information exists in the first database, crawling each preset shopping platform with its corresponding pre-configured open-source web crawler to obtain commodity data matching the attribute information.

The beneficial effects of the storage medium provided in this application are the same as those of the online shopping guidance method and device based on data crawling, and are not repeated here.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may be a random access memory (RAM) or the like.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

An online shopping guidance method based on data crawling, the method comprising:

in response to a price comparison operation on a commodity on a preset page, acquiring attribute information of the commodity;

searching a pre-constructed first database for commodity data matching the attribute information;

extracting and displaying the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform;

wherein the pre-construction process of the first database includes:

performing full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain multiple pieces of commodity data, and storing the multiple pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is located, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform; and using an incremental web crawler to perform incremental crawling on each of the plurality of shopping platforms, so as to update the commodity data in the first database;

wherein the pre-configuration process of the open-source web crawler corresponding to each preset shopping platform includes: selecting the required code blocks from a pre-constructed second database according to the crawling requirements; sorting the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configuring the open-source web crawler corresponding to the shopping platform according to the code block sequence; wherein the second database includes a plurality of code blocks; and the pre-construction process of the second database includes: performing data crawling on each of the plurality of preset shopping platforms, and taking the computer code corresponding to each crawling step in the data crawling process as one code block.
The method according to claim 1, wherein the configuring of the open-source web crawler corresponding to the shopping platform according to the code block sequence includes: determining a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.
The method according to claim 1, wherein the performing of data crawling on each of the plurality of preset shopping platforms includes: writing the corresponding computer code for each of the plurality of preset shopping platforms, and using the computer code corresponding to each shopping platform to perform data crawling on that platform's website.
The method according to claim 1, wherein the using of an incremental web crawler to perform incremental crawling on each of the plurality of shopping platforms includes:

using the incremental web crawler to incrementally crawl every page in a local page set at the same preset frequency; or, using the incremental web crawler to incrementally crawl each page in the local page set according to that page's own change frequency; or, using the incremental web crawler to incrementally crawl a first page subset at a preset first frequency and a second page subset at a preset second frequency; wherein the first frequency is higher than the second frequency; the local page set is the set of pages that the open-source web crawlers have visited on the plurality of shopping platforms; and the first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, the change frequency of any page in the first page subset being higher than the change frequency of any page in the second page subset.
The method according to any one of claims 1 to 4, wherein the displaying of the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform includes:

displaying, in order of price from low to high, the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, and marking the shopping platform corresponding to the lowest price.
The method according to any one of claims 1 to 4, further comprising:

providing purchase suggestion information according to the prices in the pieces of commodity data matching the attribute information.
The method according to any one of claims 1 to 4, wherein, before the acquiring and displaying of the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, the method further comprises:

if no commodity data matching the attribute information exists in the first database, crawling each preset shopping platform with its corresponding pre-configured open-source web crawler to obtain commodity data matching the attribute information.
An online shopping guidance device based on data crawling, the device comprising:

an attribute acquisition module, configured to acquire attribute information of a commodity in response to a price comparison operation on the commodity on a shopping page;

a data search module, configured to search a pre-constructed first database for commodity data matching the attribute information;

an extraction and display module, configured to extract and display the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform;

a first database construction module, configured to construct the first database in advance;

wherein the first database construction module includes:

a first crawling unit, configured to perform full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain multiple pieces of commodity data, and to store the multiple pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is located, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform;

a second crawling unit, configured to use an incremental web crawler to perform incremental crawling on each of the plurality of shopping platforms, so as to update the commodity data in the first database;

a crawler configuration unit, configured to pre-configure the open-source web crawler corresponding to each preset shopping platform, and specifically configured to: select the required code blocks from a pre-constructed second database according to the crawling requirements; sort the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configure the open-source web crawler corresponding to the shopping platform according to the code block sequence; wherein the second database includes a plurality of code blocks;

a second database construction unit, configured to construct the second database in advance, and specifically configured to: perform data crawling on each of the plurality of preset shopping platforms, and take the computer code corresponding to each crawling step in the data crawling process as one code block.
The device according to claim 8, wherein the crawler configuration unit is further configured to: determine a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.
The device according to claim 8, wherein the second database construction unit is further configured to: write the corresponding computer code for each of the plurality of preset shopping platforms, and use the computer code corresponding to each shopping platform to perform data crawling on that platform's website.
The device according to claim 8, wherein the second crawling unit is further configured to:

use the incremental web crawler to incrementally crawl every page in a local page set at the same preset frequency; or, use the incremental web crawler to incrementally crawl each page in the local page set according to that page's own change frequency; or, use the incremental web crawler to incrementally crawl a first page subset at a preset first frequency and a second page subset at a preset second frequency; wherein the first frequency is higher than the second frequency; the local page set is the set of pages that the open-source web crawlers have visited on the plurality of shopping platforms; and the first page subset and the second page subset are two subsets obtained by dividing the local page set according to page change frequency, the change frequency of any page in the first page subset being higher than the change frequency of any page in the second page subset.
The device according to any one of claims 8 to 11, wherein the extraction and display module is further configured to:

display, in order of price from low to high, the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, and mark the shopping platform corresponding to the lowest price.
The device according to any one of claims 8 to 11, wherein the device is further configured to:

provide purchase suggestion information according to the prices in the pieces of commodity data matching the attribute information.
The device according to any one of claims 8 to 11, wherein the device is further configured to:

before the extraction and display module acquires and displays the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform, if no commodity data matching the attribute information exists in the first database, crawl each preset shopping platform with its corresponding pre-configured open-source web crawler to obtain commodity data matching the attribute information.
A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of an online shopping guidance method based on data crawling, including: in response to a price comparison operation on a commodity on a preset page, acquiring attribute information of the commodity;

searching a pre-constructed first database for commodity data matching the attribute information;

extracting and displaying the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform;

wherein the pre-construction process of the first database includes:

performing full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain multiple pieces of commodity data, and storing the multiple pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is located, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform; and using an incremental web crawler to perform incremental crawling on each of the plurality of shopping platforms, so as to update the commodity data in the first database;

wherein the pre-configuration process of the open-source web crawler corresponding to each preset shopping platform includes:

selecting the required code blocks from a pre-constructed second database according to the crawling requirements; sorting the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configuring the open-source web crawler corresponding to the shopping platform according to the code block sequence; wherein the second database includes a plurality of code blocks; and the pre-construction process of the second database includes: performing data crawling on each of the plurality of preset shopping platforms, and taking the computer code corresponding to each crawling step in the data crawling process as one code block.
The computer device according to claim 15, wherein the configuring of the open-source web crawler corresponding to the shopping platform according to the code block sequence includes: determining a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.
The computer device according to claim 15, wherein the performing of data crawling on each of the plurality of preset shopping platforms includes: writing the corresponding computer code for each of the plurality of preset shopping platforms, and using the computer code corresponding to each shopping platform to perform data crawling on that platform's website.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of an online shopping guidance method based on data crawling, including: in response to a price comparison operation on a commodity on a preset page, acquiring attribute information of the commodity;

searching a pre-constructed first database for commodity data matching the attribute information;

extracting and displaying the shopping platform in each piece of commodity data matching the attribute information and the price of the commodity on that shopping platform;

wherein the pre-construction process of the first database includes:

performing full crawling on a plurality of preset shopping platforms, each with its corresponding pre-configured open-source web crawler, to obtain multiple pieces of commodity data, and storing the multiple pieces of commodity data in the first database; wherein the open-source web crawlers correspond one-to-one to the shopping platforms, and each piece of commodity data includes at least the shopping platform on which the corresponding commodity is located, the attribute information of the corresponding commodity, and the price of the corresponding commodity on that shopping platform; and using an incremental web crawler to perform incremental crawling on each of the plurality of shopping platforms, so as to update the commodity data in the first database;

wherein the pre-configuration process of the open-source web crawler corresponding to each preset shopping platform includes:

selecting the required code blocks from a pre-constructed second database according to the crawling requirements; sorting the selected code blocks according to their execution order to obtain a corresponding code block sequence; and configuring the open-source web crawler corresponding to the shopping platform according to the code block sequence; wherein the second database includes a plurality of code blocks; and the pre-construction process of the second database includes: performing data crawling on each of the plurality of preset shopping platforms, and taking the computer code corresponding to each crawling step in the data crawling process as one code block.
The storage medium according to claim 18, wherein the configuring of the open-source web crawler corresponding to the shopping platform according to the code block sequence includes: determining a configuration file of the open-source web crawler corresponding to the shopping platform according to the code block sequence and a preset description document; wherein the description document stores description information used for generating the configuration file.
The storage medium according to claim 18, wherein the performing of data crawling on each of the plurality of preset shopping platforms includes: writing the corresponding computer code for each of the plurality of preset shopping platforms, and using the computer code corresponding to each shopping platform to perform data crawling on that platform's website.