CA3154438A1 - Commodity content data processing method, platform and system

Publication number
CA3154438A1
CA3154438A1 CA3154438A CA3154438A CA3154438A1 CA 3154438 A1 CA3154438 A1 CA 3154438A1 CA 3154438 A CA3154438 A CA 3154438A CA 3154438 A CA3154438 A CA 3154438A CA 3154438 A1 CA3154438 A1 CA 3154438A1
Authority
CA
Canada
Prior art keywords
data
storing
commodity
index
relational database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3154438A
Other languages
French (fr)
Inventor
Pengcheng Wan
Yong LV
Chunsheng Li
Hongyuan JIA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data processing method, platform and system. The method comprises: storing original commodity content data in a first relational database by cluster, by library and by table (S51); establishing index data according to the original commodity content data and storing the index data in an index database (S52), the index data comprising keyword fields and query dimension identification data corresponding to each keyword field; and computing the original commodity content data by means of a computing program to obtain computing result data, and associatively storing the computing result data and the query dimension identification data in the first relational database (S53). The computing efficiency is improved, and an index database is established according to query dimensions to perform indexing in advance during a subsequent query, which inevitably improves the querying efficiency.

Description

DATA PROCESSING METHOD, PLATFORM AND SYSTEM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to the field of business data calculation and enquiry, and more particularly to a data processing method, and corresponding platform and system.
Description of Related Art
[0002] Merchants selling commodities frequently need analytical data as a guiding basis for operations. Such analytical data is mostly obtained by platforms through analysis of great quantities of commodity content data. For instance, commodity content quality scores, which characterize the quality of commodity descriptive information, can provide operational guidance for merchants selling material commodities. Such data is obtained by platforms that perform summarized analytical calculations on a great deal of commodity content data of a great many merchants.
At present, the summarized analytical calculations on a great deal of commodity content data are mostly realized via the mode of Java and the relational database Mysql. When a merchant needs to enquire the calculation result data, Mysql is enquired directly.
[0003] However, in an era of rapidly developing e-commerce, a colossal volume of commodity content data is generated, especially during large-scale platform sales promotions such as "Double 11", "618", "818" and "Double 12", when the data volume increases even further. The mode of Java and the relational database Mysql is relatively inefficient in computing such data, and when merchants enquire the calculation result data, the same mode also yields low enquiring efficiency. In particular, when complicated enquiring conditions are encountered, enquiring times are essentially in the order of seconds.

SUMMARY OF THE INVENTION
[0004] The present application provides a data processing method, and corresponding platform and system, so as to solve the prior-art problem of low efficiency in calculating and enquiring commodity content data.
[0005] The present application sets forth the following solutions.
[0006] According to one aspect, there is provided a data processing method that comprises:
[0007] storing primary commodity content data in a first relational database through clustering and sharding;
[0008] creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
[0009] invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
[0010] Preferably, the method further comprises:
[0011] receiving an enquiring request of a user;
[0012] parsing the enquiring request to obtain a keyword to be enquired;
[0013] enquiring in the index database to obtain query dimension identification data corresponding to the keyword to be enquired to serve as a target identification; and
[0014] enquiring in the first relational database to obtain calculation result data corresponding to the target identification.
[0015] Preferably, the method further comprises:
[0016] storing at least partial data of the calculation result data in association with the query dimension identification data in the index database.
[0017] Preferably, the step of invoking a calculation program to calculate the primary commodity content data to obtain calculation result data includes:
[0018] invoking the calculation program to calculate, from the primary commodity content data, dimension content quality scores of each commodity in at least two content dimensions, and calculating a content quality total score of each commodity according to the various dimension scores;
[0019] the step of storing the calculation result data in association with the query dimension identification data in the first relational database includes:
[0020] storing the various dimension content quality scores of each commodity and the content quality total score of each commodity in association with the query dimension identification data in the first relational database; and
[0021] the step of storing at least partial data of the calculation result data in association with the query dimension identification data in the index database includes:
[0022] storing the quality total score of each commodity in association with the query dimension identification data in the index database.
[0023] Preferably, the query dimension identification data is a commodity code and/or a merchant code.
[0024] Preferably, the method further comprises:
[0025] receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding; and
[0026] synchronizing the primary commodity content data in the second relational database to the first relational database.
[0027] Preferably, the step of receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding includes:
[0028] receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding according to commodity codes.
[0029] Preferably, the first relational database is Hbase, the second relational database is Mysql, the calculation program is Spark, and the index database is Elasticsearch.
[0030] According to another aspect, the present application further provides a data processing platform, and the platform comprises a data storage layer and a data calculation layer, of which
[0031] the data storage layer is employed for storing primary commodity content data in a first relational database through clustering and sharding, and creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
[0032] the data calculation layer is employed for invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
[0033] According to still another aspect, the present application further provides a computer system that comprises:
[0034] one or more processor(s); and
[0035] a memory, associated with the one or more processor(s) for storing a program instruction that executes the following operations when read and executed by the one or more processor(s):
[0036] storing primary commodity content data in a first relational database through clustering and sharding;
[0037] creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
[0038] invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
[0039] According to the specific embodiments provided by the present application, the present application has disclosed the following technical effects:
[0040] the technical solution of the present application enhances computing efficiency by storing the primary commodity content data in a relational database through clustering and sharding and invoking a calculation program to perform calculation. It also creates an index database according to query dimensions, so that a subsequent enquiry is first resolved against the index, which enhances the enquiring efficiency. In comparison with the state of the art, such a solution can quickly provide multi-dimensional queries of the calculation result data, and avoids the problem of low efficiency caused by direct enquiry in the relational database.
[0041] Of course, not any product that implements the present application is necessarily required to achieve all of the aforementioned advantages simultaneously.
BRIEF DESCRIPTION OF THE DRAWINGS
[0042] To more clearly describe the technical solutions in the embodiments of the present application or the state of the art, drawings required to be used in the description of the embodiments will be briefly introduced below. Apparently, the drawings introduced below are merely directed to some embodiments of the present invention, while it is possible for persons ordinarily skilled in the art to acquire other drawings based on these drawings without spending creative effort in the process.
[0043] Fig. 1 is a view illustrating the structure of the data processing platform provided by an embodiment of the present application;
[0044] Fig. 2 is a view schematically illustrating clustering and sharding provided by an embodiment of the present application;
[0045] Fig. 3 is a flowchart illustrating synchronization of primary commodity content data provided by an embodiment of the present application;
[0046] Fig. 4 is a flowchart illustrating enquiry of commodity content quality scores provided by an embodiment of the present application;
[0047] Fig. 5 is a flowchart illustrating the data processing method provided by an embodiment of the present application; and
[0048] Fig. 6 is a view illustrating the architecture of the computer system provided by an embodiment of the present application.
DETAILED DESCRIPTION OF THE INVENTION
[0049] The technical solutions in the embodiments of the present application will be more clearly and comprehensively described below with reference to the accompanying drawings in the embodiments of the present application. Apparently, the embodiments as described are merely partial, rather than the entire, embodiments of the present application. All other embodiments obtainable by persons ordinarily skilled in the art on the basis of the embodiments in the present application without spending creative effort shall all fall within the protection scope of the present application.
[0050] The present application aims to provide a method of processing commodity content data, whereby after primary commodity content data has been stored in a relational database by clustering and sharding, a calculation program is invoked to perform calculation on the sub-libraries in parallel to enhance computing efficiency, and index data is created according to the query dimensions to be enquired. During subsequent enquiring, identification data can thus first be matched in the index database, and the enquiry is thereafter performed in the relational database. Such a solution can quickly provide multi-dimensional queries of the calculation result data, and enhances enquiring efficiency.
[0051] As shown in Fig. 1, which is a view illustrating the structure of the data processing platform in one of the embodiments of the present application, included are a Mysql database, an Hbase database, a Spark calculation program for calculation, a search engine Elasticsearch, a remote service framework RSF, and an enquiring merchant.
[0052] The Mysql database serves as the database that receives the primary commodity content data, and stores the colossal volume of primary commodity content data through the mode of clustering and sharding. Specifically, clustering and sharding can be completed according to commodity codes, and the specific operation thereof will be described later in detail.
[0053] The Hbase database is used to perform synchronization according to the data in the Mysql database. It can specifically complete the synchronization through data replication and a data exchange platform. Having synchronized, the Hbase database stores the primary commodity content data according to the mode of clustering and sharding.
[0054] In other embodiments of the present application, the primary commodity content data can be directly stored in the Hbase database, without having to pass through the Mysql database. However, the mode of passing through the Mysql database on the one hand takes into consideration the stability of data backup, and on the other hand takes into consideration that other business processes should rely on the Mysql database for operations.
[0055] The result data to be subsequently enquired is calculated, an index is created, and an association relation is established between the index and the result data obtained by calculation, so that the result data can be enquired according to the index data:
[0056] In the index database Elasticsearch are stored such keyword fields for enquiry as commodity brands, and such identification data to which the keyword fields correspond as commodity codes. Based on such index, the query keyword input by a user (merchant) can be matched with the corresponding commodity code.
[0057] The Spark calculation program is used to perform, on the basis of the number segments of commodity codes, MapReduce (a programming model for parallel operations on large-scale datasets, typically greater than 1TB) on the primary commodity content data of each cluster according to an expression rule, so as to obtain a calculation result, for instance commodity content quality scores. After the calculation result has been obtained, the calculation result and such identification data as commodity codes are stored in the Hbase database.
[0058] Through the foregoing steps is created an association between the index data in Elasticsearch and the calculation result data in the Hbase database through the identification data.
[0059] When the user inputs a query keyword, the RSF firstly enquires in the index to determine matched identification data, such as commodity codes, and hence determines the calculation result data in the Hbase database according to the commodity codes.
[0060] The aforementioned creation of the index can be independent of the calculation process, and it is of course also possible in the present application to store at least a part of the calculation result in the index database. When this part of the result is enquired, the enquiry can be completed merely through Elasticsearch, while it is not required to further enquire in the Hbase database.
[0061] As should be noted, the aforementioned Mysql database, Hbase database, Spark calculation program, and search engine Elasticsearch can all be replaced with modules of similar functions, and Fig. 1 merely illustrates a specific system structure of the present application.
[0062] Taking for example the system and the calculation of commodity content quality scores illustrated in Fig. 1, the process of storing primary commodity content data through clustering and sharding, the process of synchronizing commodity content data, the process of calculating commodity content quality scores, the process of synchronizing commodity content quality scores, the process of creating index, and the process of enquiring commodity content quality scores are described in detail below:
[0063] the primary commodity content data is stored through clustering and sharding:
[0064] the primary commodity content data is stored in 4 clusters of Mysql according to number segments of commodity codes. Within each cluster, the data is assigned to one of 10 sub-libraries according to the result of a modulo-10 operation on the last two digits of the commodity code, and within each sub-library to one of 10 sub-tables according to the remainder of the last digit of the commodity code divided by 10. In this way, more than one billion pieces of commodity content data are dispersed across several hundred sub-tables. Fig. 2 is a view schematically illustrating clustering and sharding.
[0065] For instance, the number segments of commodity codes stored in each cluster are defined as follows: commodity data from number segment 000000000000000000 to number segment 000000000500000000 are stored in cluster 1; commodity data from number segment 000000000500000001 to number segment 000000001000000000 are stored in cluster 2; commodity data from number segment 000000001000000001 to number segment 000000001500000000 are stored in cluster 3; and commodity data from number segment 000000001500000001 to number segment 000000002000000000 are stored in cluster 4.
[0066] The sub-library of a cluster to which each commodity belongs is defined: a corresponding sub-library is designated according to the result of a modulo operation on the last two digits of the commodity code with 10.
[0067] The sub-table of a sub-library of a cluster to which each commodity belongs is defined: a corresponding sub-table is designated according to the result of a remainder operation on the last digit of the commodity code with 10.
[0068] For instance, commodity code 000000001500000023 belongs to sub-table 4, sub-library 3, cluster 4.
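By way of illustration only, the routing rule of paragraphs [0064] to [0068] may be sketched in Python as follows. The sketch assumes that the four number segments are contiguous blocks of 500,000,000 codes. The wording of the sub-library and sub-table rules is ambiguous, so the sketch further assumes the one reading that reproduces the example of paragraph [0068]: the tens digit selects the sub-library, the units digit selects the sub-table, and both are numbered from 1. The names `locate_shard` and `ShardLocation` are hypothetical and do not appear in the application.

```python
from typing import NamedTuple

# Inclusive upper bound of the number segment of each cluster (clusters 1 to 4).
# ASSUMPTION: contiguous segments of 500,000,000 codes each.
CLUSTER_UPPER_BOUNDS = [500_000_000, 1_000_000_000, 1_500_000_000, 2_000_000_000]


class ShardLocation(NamedTuple):
    cluster: int      # 1..4
    sub_library: int  # 1..10
    sub_table: int    # 1..10


def locate_shard(commodity_code: str) -> ShardLocation:
    """Map a zero-padded commodity code to (cluster, sub-library, sub-table)."""
    value = int(commodity_code)
    for index, upper in enumerate(CLUSTER_UPPER_BOUNDS, start=1):
        if value <= upper:
            cluster = index
            break
    else:
        raise ValueError(f"commodity code {commodity_code} is outside all number segments")
    # ASSUMPTION: tens digit -> sub-library, units digit -> sub-table, both 1-based,
    # chosen because it matches the worked example in the text.
    sub_library = (value % 100) // 10 + 1
    sub_table = value % 10 + 1
    return ShardLocation(cluster, sub_library, sub_table)
```

Under these assumptions, `locate_shard("000000001500000023")` yields cluster 4, sub-library 3 and sub-table 4, in agreement with the example given above.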
[0069] Synchronization of the primary commodity content data:
[0070] synchronization of the primary commodity content data is classified into three types: quasi real-time incremental update, daily incremental update, and weekly total update, of which both the daily incremental update and the weekly total update serve the purpose of fault tolerance.
[0071] As shown in Fig. 3, specifically, a real-time data replication system (RDRS) platform can be defined to synchronize Mysql data to HBase in quasi real time, and a data exchange platform IDE can be defined to synchronize Mysql data to HBase incrementally every day and in full every week:
[0072] The RDRS platform synchronizes commodity content data to HBase by parsing binlog information of the Mysql database cluster in quasi real time.
[0073] The data exchange platform synchronizes commodity content information incremental data to HBase daily, and makes comparison and correction with the quasi real-time HBase commodity content data.
[0074] The data exchange platform synchronizes total commodity data to HBase weekly, and makes comparison and correction with the current HBase commodity content data.
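The "comparison and correction" performed by the daily and weekly synchronization may, purely as a sketch, be pictured as a reconciliation of two key-value stores. The Python below uses plain dictionaries keyed by commodity code as stand-ins for the Mysql source and the HBase target. The function name `reconcile`, the rule that the source always wins, and the removal of rows absent from the source during a total update are all assumptions made for illustration and are not stated in the application.

```python
from typing import Dict, List

Row = Dict[str, str]


def reconcile(target: Dict[str, Row], source_rows: Dict[str, Row], full: bool = False) -> List[str]:
    """Correct `target` in place so that it agrees with `source_rows`.

    Returns the sorted commodity codes whose rows had to be corrected.
    """
    corrected = []
    for code, row in source_rows.items():
        # A missing or differing row in the target is overwritten by the source row.
        if target.get(code) != row:
            target[code] = dict(row)
            corrected.append(code)
    if full:
        # ASSUMPTION: in a total update, rows absent from the source are stale.
        for code in [c for c in target if c not in source_rows]:
            del target[code]
            corrected.append(code)
    return sorted(corrected)
```

A daily run would pass only the rows changed that day with `full=False`, while a weekly run would pass the complete data set with `full=True`.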
[0075] Calculation of commodity content quality scores:
[0076] the commodity content quality is mainly affected by 7 content dimensions, namely basic information, parameter information, category information, master map information, title information, selling point information, and detailed information. The Spark program performs parallel calculation on each sub-library according to the expression rule, calculates the scores of the basic information, parameter information, category information, master map information, title information, selling point information, and detailed information of all commodities in the sub-library, and finally summarizes and writes all dimension scores into Hive (a data warehouse tool of Hadoop), specifically:
[0077] the scores of the basic information, parameter information, category information, master map information, title information, selling point information, and detailed information of all sub-libraries are first calculated by means of MapReduce, sub-library by sub-library. Calculation by sub-library mainly aims to reduce excessive data skew and hence to enhance computing efficiency.
[0078] The scores of the basic information, parameter information, category information, master map information, title information, selling point information, and detailed information are merged together to obtain a total score.
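As a minimal sketch of the two steps above, the Python below scores each sub-library independently and then merges the seven dimension scores of every commodity into a total score. A thread pool stands in for the Spark program, and the merge is assumed to be an unweighted sum, since the application does not state the merging formula. The dimension keys and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical keys for the seven content dimensions named in the text.
DIMENSIONS = (
    "basic", "parameter", "category", "master_map",
    "title", "selling_point", "detail",
)

Scores = Dict[str, float]


def total_score(dimension_scores: Scores) -> float:
    """Merge the seven dimension scores (ASSUMPTION: unweighted sum)."""
    missing = [d for d in DIMENSIONS if d not in dimension_scores]
    if missing:
        raise KeyError(f"missing dimension scores: {missing}")
    return sum(dimension_scores[d] for d in DIMENSIONS)


def score_sub_library(sub_library: Dict[str, Scores]) -> Dict[str, float]:
    """Compute the total score of every commodity in one sub-library."""
    return {code: total_score(scores) for code, scores in sub_library.items()}


def score_all(sub_libraries: List[Dict[str, Scores]]) -> Dict[str, float]:
    """Score the sub-libraries in parallel and merge the partial results."""
    merged: Dict[str, float] = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(score_sub_library, sub_libraries):
            merged.update(partial)
    return merged
```

Because each sub-library is scored on its own, no single worker has to process a disproportionate share of the commodities, which is the data-skew concern mentioned in paragraph [0077].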
[0079] The following is directed to tests of computing efficiencies of the present application and the prior-art technology:
[0080] One million pieces of data to be calculated, ten million pieces of data to be calculated, and one hundred million pieces of data to be calculated are inserted into a calculation table of commodity quality estimation. Calculations are subsequently performed on the basis of Java+Mysql and Spark+HBase, respectively. The test results are recorded in Table 1.
[0081] Table 1. Comparison of Spark+HBase and Java+Mysql Computing Efficiencies
Pieces of Data Recorded    Spark+HBase    Java+Mysql
1,000,000                  30 minutes     8 hours
10,000,000                 2 hours        3 days
100,000,000                5 hours        30 days
[0082] As can be seen from the test results, the calculation based on the combination of Spark+HBase greatly enhances computing efficiency, and the computing efficiency remains excellent even when the number of pieces of data increases many times over.
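The gain reported in Table 1 can be made explicit by converting every duration to minutes and dividing. The short Python sketch below does only this arithmetic on the figures of Table 1; the names `TABLE_1` and `speedups` are introduced here for illustration.

```python
HOUR = 60          # minutes
DAY = 24 * HOUR    # minutes

# Pieces of data -> (Spark+HBase minutes, Java+Mysql minutes), as listed in Table 1.
TABLE_1 = {
    1_000_000: (30, 8 * HOUR),
    10_000_000: (2 * HOUR, 3 * DAY),
    100_000_000: (5 * HOUR, 30 * DAY),
}


def speedups(table):
    """Return, per data volume, how many times faster Spark+HBase finished."""
    return {pieces: slow / fast for pieces, (fast, slow) in table.items()}
```

With the Table 1 figures, the ratio is 16 at one million pieces, 36 at ten million pieces, and 144 at one hundred million pieces, so the advantage widens as the data volume grows.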
[0083] Synchronization of commodity content quality scores:
[0084] the various scores are summarized and calculated according to set query dimensions, such as commodities and merchants, to obtain the corresponding total score, for example, the total score of a certain commodity or the total score of a certain merchant. Of course, other dimensions can also be utilized. Thereafter, such data as the commodity content quality scores of various dimensions, the commodity content quality total score, and the scores summarized according to set query dimensions are synchronized to HBase.
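Summarizing by a query dimension may be sketched as a simple group-by. The Python below groups commodity total scores by merchant code. The application does not specify the summarizing rule, so a plain sum is assumed here, and the field names `merchant_code` and `total_score` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_by_merchant(rows: Iterable[Mapping]) -> Dict[str, float]:
    """Sum commodity total scores per merchant code (ASSUMPTION: sum, not average)."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["merchant_code"]] += row["total_score"]
    return dict(totals)
```

Any other query dimension, such as a brand or a category, could be summarized in the same way by changing the grouping key.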
[0085] Creation of query dimension index:
[0086] index data is created according to such query dimensions as commodity codes and merchant codes, and the index data includes keyword fields and corresponding query dimension identification data, such as commodity brands and the corresponding commodity codes.
[0087] The creation of such an index can be based on the process of synchronizing the commodity content quality score data. When the commodity content quality score data has been calculated and synchronized to HBase, correspondence relations between the keyword fields in the primary commodity data and the query dimension identification data are created, and the total score data summarized according to such query dimensions as commodity codes and merchant codes is synchronized to the index data.
[0088] The relevant calculation result data of Elasticsearch and HBase, such as the commodity content quality score data, are all incrementally updated.
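The shape of the index data described in paragraphs [0086] and [0087] may be sketched as follows. An in-memory dictionary stands in for Elasticsearch, the commodity brand is used as the keyword field, and each index entry carries the commodity code, the merchant code and the total score. The function name `build_index` and all field names are hypothetical, and nothing here reflects the actual Elasticsearch mapping used by the applicant.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_index(commodities: Iterable[Mapping], total_scores: Mapping[str, float]) -> Dict[str, List[dict]]:
    """Map each keyword field value (here: brand) to its query dimension identification data."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for item in commodities:
        code = item["commodity_code"]
        index[item["brand"]].append({
            "commodity_code": code,
            "merchant_code": item["merchant_code"],
            # Total score stored alongside the identifiers; None if not yet calculated.
            "total_score": total_scores.get(code),
        })
    return dict(index)
```

Keeping the total score in the index entry means that a query needing only that score can be answered from the index alone, without a further lookup in HBase.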
[0089] Enquiry of commodity content quality scores:
[0090] for the differently typed enquiring conditions required by the user, corresponding enquiring interfaces and request parameters are provided. The corresponding commodity codes and merchant codes are first obtained from Elasticsearch according to the enquiring conditions, the required data is thereafter enquired from HBase according to the obtained commodity codes and merchant codes, and the data that conforms to the conditions is finally returned to the user after integration and filtration, specifically:
[0091] the remote service framework (RSF) is firstly defined to provide a remote query service to the enquirer component, and the query service is defined for processing queries of merchants. The RSF service is invoked to perform various types of iterative queries according to the enquiring condition input by a merchant, and the intersection of the results of the various sub-enquiring conditions is then taken, wherein the sub-queries are executed concurrently.
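The concurrent sub-queries and the intersection of their results may be sketched in Python as below. Each sub-query is modelled as a zero-argument callable returning commodity codes, and a thread pool stands in for the RSF service. The function name `run_sub_queries` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Sequence, Set


def run_sub_queries(sub_queries: Sequence[Callable[[], Iterable[str]]]) -> Set[str]:
    """Run every sub-query concurrently and keep only the codes returned by all of them."""
    if not sub_queries:
        return set()
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        result_sets = list(pool.map(lambda query: set(query()), sub_queries))
    return set.intersection(*result_sets)
```

A commodity is returned only if it satisfies every sub-enquiring condition, which is why the partial results are intersected rather than merged.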
[0092] Fig. 4 is a flowchart illustrating the process of enquiring commodity content quality scores, and the process includes the following steps:
[0093] a client end sends out a query service request of commodity quality scores;
[0094] an enquiring server expression-parses the query service request of commodity quality scores sent by the client end;
[0095] the enquiring server submits the parsed query request to an Elasticsearch cluster (a cluster is set up for Elasticsearch in this embodiment to avoid single-point failure of the machine);
[0096] the Elasticsearch cluster returns a query result (commodity codes + merchant codes) to the enquiring server;
[0097] the enquiring server submits the query request to an HBase cluster according to the query result returned from the Elasticsearch cluster;
[0098] the HBase cluster returns a final query result corresponding to the commodity codes and the merchant codes to the enquiring server; and
[0099] the enquiring server returns the final query result to the client end.
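The two-stage flow of Fig. 4 may be sketched as follows, with two dictionaries standing in for the Elasticsearch cluster and the HBase cluster. The index maps a keyword to entries holding a commodity code and a merchant code, and the result store is keyed by that pair. The function name `query_scores`, the field names and the row-key layout are assumptions made for illustration only.

```python
from typing import Dict, List, Mapping, Tuple


def query_scores(
    keyword: str,
    index: Mapping[str, List[Mapping[str, str]]],
    result_store: Mapping[Tuple[str, str], Mapping[str, float]],
) -> List[dict]:
    """Resolve a keyword to identifiers in the index, then fetch the score rows by identifier."""
    results = []
    # Stage 1: the index returns commodity codes + merchant codes for the keyword.
    for hit in index.get(keyword, []):
        row_key = (hit["commodity_code"], hit["merchant_code"])
        # Stage 2: the result store is read directly by row key, never scanned.
        row = result_store.get(row_key)
        if row is not None:
            results.append({**hit, **row})
    return results
```

The result store is only ever accessed by exact key, so the cost of the second stage depends on the number of index hits rather than on the total number of stored rows.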
[0100] The following is directed to tests of enquiring efficiencies of the present application and the prior-art technology:
[0101] One million pieces of data to be calculated, ten million pieces of data to be calculated, and one hundred million pieces of data to be calculated are inserted into a calculation table of commodity quality estimation. Calculations are subsequently performed on the basis of Java+Mysql and Spark+HBase, respectively.
[0102] As can be seen from the test results, the calculation based on the combination of Spark+HBase greatly enhances computing efficiency, and the computing efficiency remains excellent even when the number of pieces of data increases many times over.
[0103] One million pieces of data, ten million pieces of data, one hundred million pieces of data, and one billion pieces of data are respectively inserted into different tables of Elasticsearch and HBase, and 15 fields are recorded for each piece. Enquiry is subsequently performed on the basis of Java+Mysql and Elasticsearch+HBase, respectively.
[0104] The test results are recorded in Table 2.
[0105] Table 2. Comparison of Elasticsearch+HBase and Java+Mysql Enquiring Efficiencies
Pieces of Data Recorded    Elasticsearch+HBase    Java+Mysql
1,000,000                  125ms                  0.564s
10,000,000                 140ms                  2.543s
100,000,000                162ms                  timeout exception
1,000,000,000              190ms                  timeout exception
[0106] As can be seen from the test results, the enquiry based on the combination of Elasticsearch+HBase greatly enhances enquiring efficiency, and the enquiring efficiency remains excellent even when the number of pieces of data increases many times over.
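The figures of Table 2 can be compared on a common unit by expressing every latency in milliseconds (0.564s is 564ms and 2.543s is 2543ms). The Python sketch below computes the latency ratio wherever Java+Mysql completed, and the growth of the Elasticsearch+HBase latency across the whole range. The names `TABLE_2`, `latency_ratios` and `growth` are introduced here for illustration.

```python
# Pieces of data -> (Elasticsearch+HBase ms, Java+Mysql ms); None marks a timeout exception.
TABLE_2 = {
    1_000_000: (125, 564),
    10_000_000: (140, 2543),
    100_000_000: (162, None),
    1_000_000_000: (190, None),
}


def latency_ratios(table):
    """Java+Mysql latency divided by Elasticsearch+HBase latency, where both completed."""
    return {
        pieces: round(slow / fast, 2)
        for pieces, (fast, slow) in table.items()
        if slow is not None
    }


def growth(table):
    """Return (data growth factor, Elasticsearch+HBase latency growth factor)."""
    smallest, largest = min(table), max(table)
    return largest / smallest, round(table[largest][0] / table[smallest][0], 2)
```

On the Table 2 figures, Java+Mysql is about 4.51 times slower at one million pieces and about 18.16 times slower at ten million pieces, while the Elasticsearch+HBase latency grows by a factor of only 1.52 as the data volume grows by a factor of 1000.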
[0107] Embodiment 1
[0108] As previously mentioned, the aforementioned databases and the calculation program Spark can be replaced with modules of similar functions, and the calculation results can also be set as data other than commodity content quality scores according to requirements of users. On this basis, Embodiment 1 of the present application provides a data processing method. As shown in Fig. 5, the method comprises the following steps:
[0109] S51 - storing primary commodity content data in a first relational database through clustering and sharding;
[0110] S52 - creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
[0111] S53 - invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
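Steps S51 to S53 can be sketched as follows. This is a minimal illustration only, not the implementation of the present application: plain Python dictionaries stand in for the first relational database and the index database, and the names `DataPlatform`, `shard_of`, `NUM_SHARDS`, the `title` field, and the word-count scoring function are all hypothetical.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count; the source does not specify one


def shard_of(commodity_code: str) -> int:
    """Map a commodity code to a shard with a simple, reproducible hash."""
    return sum(commodity_code.encode("utf-8")) % NUM_SHARDS


class DataPlatform:
    """In-memory stand-ins for the first relational database and the index database."""

    def __init__(self) -> None:
        # One dict per shard, keyed by commodity code (assumed identification data).
        self.primary_shards = [dict() for _ in range(NUM_SHARDS)]
        # Keyword field -> set of query dimension identification data.
        self.index = defaultdict(set)
        # Calculation result data keyed by the same identification data.
        self.results = {}

    def store_primary(self, record: dict) -> None:
        """S51: store primary commodity content data in its shard."""
        code = record["commodity_code"]
        self.primary_shards[shard_of(code)][code] = record

    def build_index(self) -> None:
        """S52: create index data from the primary commodity content data."""
        for shard in self.primary_shards:
            for code, record in shard.items():
                for keyword in record["title"].lower().split():
                    self.index[keyword].add(code)

    def calculate(self, calculation_program) -> None:
        """S53: run the calculation program and store results by identification data."""
        for shard in self.primary_shards:
            for code, record in shard.items():
                self.results[code] = calculation_program(record)


def title_length_score(record: dict) -> int:
    """Hypothetical calculation: the number of words in the commodity title."""
    return len(record["title"].split())


platform = DataPlatform()
platform.store_primary({"commodity_code": "C001", "title": "Red cotton shirt"})
platform.store_primary({"commodity_code": "C002", "title": "Blue cotton scarf"})
platform.build_index()
platform.calculate(title_length_score)
```

Because the index and the results share the same identification data as their key, a keyword can later be resolved to results without scanning the primary data.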
[0112] Preferably, the method further comprises:
[0113] receiving an enquiring request of a user;
[0114] parsing the enquiring request to obtain a keyword to be enquired;
[0115] enquiring in the index database to obtain query dimension identification data corresponding to the keyword to be enquired to serve as a target identification; and
[0116] enquiring in the first relational database to obtain calculation result data corresponding to the target identification.
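The enquiry flow above can be sketched in the same spirit. The code below is an assumption-laden illustration: dictionaries stand in for the index database and the first relational database, `parse_request` and `enquire` are hypothetical names, and treating a multi-word request as a union of keyword matches is a choice made here, not something the source specifies.

```python
def parse_request(request: str) -> list:
    """Parse the enquiring request into keywords (assumed: whitespace-separated)."""
    return request.lower().split()


def enquire(request: str, index_db: dict, first_db: dict) -> dict:
    """Resolve keywords to target identifications, then fetch their results."""
    target_ids = set()
    for keyword in parse_request(request):
        # Index lookup: keyword -> query dimension identification data.
        target_ids |= index_db.get(keyword, set())
    # Result lookup in the first relational database by target identification.
    return {tid: first_db[tid] for tid in sorted(target_ids) if tid in first_db}


# Hypothetical sample data: keyword -> commodity codes, commodity code -> score.
index_db = {"cotton": {"C001", "C002"}, "red": {"C001"}}
first_db = {"C001": 87.5, "C002": 62.0}

red_results = enquire("Red", index_db, first_db)
cotton_results = enquire("cotton", index_db, first_db)
```

The two-stage lookup keeps the keyword search in the index database and limits access to the first relational database to direct reads by identification.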
[0117] Further, the method can further comprise:
[0118] storing at least partial data of the calculation result data in association with the query dimension identification data in the index database.
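One way to read this step is that the full calculation result stays in the first relational database while a summary value is copied into the index database, so that simple enquiries can be answered from the index alone. The sketch below assumes this reading. The two content dimensions, the scoring rules, the unweighted mean used for the total, and the names `content_quality` and `store_results` are all hypothetical.

```python
def content_quality(record: dict):
    """Hypothetical scoring in two content dimensions, each capped at 100."""
    dimension_scores = {
        "title": min(100, 10 * len(record["title"].split())),
        "images": min(100, 25 * record["image_count"]),
    }
    # Assumed aggregation: the total is the unweighted mean of the dimensions.
    total = sum(dimension_scores.values()) / len(dimension_scores)
    return dimension_scores, total


def store_results(records, first_db: dict, index_db: dict) -> None:
    """Store full results in the first database and only the total in the index."""
    for record in records:
        code = record["commodity_code"]
        dimension_scores, total = content_quality(record)
        first_db[code] = {"dimensions": dimension_scores, "total": total}
        index_db[code] = {"title": record["title"], "total": total}


first_db, index_db = {}, {}
store_results(
    [{"commodity_code": "C001", "title": "Red cotton shirt", "image_count": 2}],
    first_db,
    index_db,
)
```

Only the total is duplicated, which keeps the index entries small while the per-dimension detail remains available by identification in the first relational database.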
[0119] In another preferred embodiment, the method further comprises:
receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding (specifically, clustering and sharding can be performed according to commodity codes); and
[0120] synchronizing the primary commodity content data in the second relational database to the first relational database.
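Sharding by commodity code and the subsequent synchronization can be sketched as follows. This is an illustration under stated assumptions rather than the mechanism of the present application: lists of dictionaries stand in for the shards of the second relational database, a single dictionary stands in for the first relational database, and `shard_for`, `ingest`, `synchronize`, and the CRC32-modulo routing are hypothetical choices.

```python
import zlib


def shard_for(commodity_code: str, num_shards: int) -> int:
    """Route a commodity code to a shard using a stable CRC32 hash."""
    return zlib.crc32(commodity_code.encode("utf-8")) % num_shards


def ingest(records, second_db_shards: list) -> None:
    """Receive primary commodity content data and store it by commodity code."""
    for record in records:
        code = record["commodity_code"]
        second_db_shards[shard_for(code, len(second_db_shards))][code] = record


def synchronize(second_db_shards: list, first_db: dict) -> int:
    """Copy new or changed records to the first database; return how many were copied."""
    copied = 0
    for shard in second_db_shards:
        for code, record in shard.items():
            if first_db.get(code) != record:
                first_db[code] = dict(record)
                copied += 1
    return copied


second_db_shards = [dict() for _ in range(3)]  # assumed shard count
first_db = {}
ingest(
    [
        {"commodity_code": "C001", "title": "Red cotton shirt"},
        {"commodity_code": "C002", "title": "Blue cotton scarf"},
    ],
    second_db_shards,
)
copied_first = synchronize(second_db_shards, first_db)
copied_again = synchronize(second_db_shards, first_db)
```

Routing on the commodity code means every record for a given commodity lands in the same shard, and the comparison in `synchronize` makes a repeated run a no-op when nothing has changed.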
[0121] Embodiment 2
[0122] Corresponding to the aforementioned method, the present application further provides a data processing platform, and the platform comprises a data storage layer and a data calculation layer, of which
[0123] the data storage layer is employed for storing primary commodity content data in a first relational database through clustering and sharding, and creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
[0124] the data calculation layer is employed for invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
[0125] In a preferred embodiment, the data processing platform further comprises a data application layer for receiving an enquiring request of a user for parsing to obtain a keyword to be enquired, enquiring in the index database to obtain query dimension identification data corresponding to the keyword to be enquired to serve as a target identification, and enquiring in the first relational database to obtain calculation result data corresponding to the target identification, so as to return the result data to the user.
[0126] In a preferred embodiment, the storage layer is further employed for storing at least partial data of the calculation result data in association with the query dimension identification data in the index database.
[0127] In a preferred embodiment, the storage layer is further employed for receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding, and synchronizing the primary commodity content data in the second relational database to the first relational database.
[0128] Embodiment 3
[0129] Corresponding to the aforementioned method and platform, Embodiment 3 of the present application further provides a computer system that comprises:
[0130] one or more processor(s); and
[0131] a memory, associated with the one or more processor(s) for storing a program instruction that executes the following operations when read and executed by the one or more processor(s):
[0132] storing primary commodity content data in a first relational database through clustering and sharding;
[0133] creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
[0134] calculating the primary commodity content data through a calculation program to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
[0135] Fig. 6 exemplarily illustrates the framework of a computer system that can specifically include a processor 1510, a video display adapter 1511, a magnetic disk driver 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 can be communicably connected with one another via a communication bus 1530.
[0136] The processor 1510 can be embodied as a general CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuit(s) for executing relevant program(s) to realize the technical solutions provided by the present application.
[0137] The memory 1520 can be embodied in such a form as a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, or a dynamic storage device. The memory 1520 can store an operating system 1521 for controlling the running of a computer system 1500, and a basic input/output system (BIOS) for controlling lower-level operations of the computer system 1500. In addition, the memory 1520 can also store a web browser 1523, a data storage administration system 1524, and an icon font processing system 1525, etc. The icon font processing system 1525 can be an application program that specifically realizes the aforementioned various step operations in the embodiments of the present application. To sum it up, when the technical solutions provided by the present application are to be realized via software or firmware, the relevant program codes are stored in the memory 1520, and invoked and executed by the processor 1510.
[0138] The input/output interface 1513 is employed to connect with an input/output module to realize input and output of information. The input/output module can be equipped in the device as a component part (not shown in the drawings), and can also be externally connected with the device to provide corresponding functions. The input means can include a keyboard, a mouse, a touch screen, a microphone, and various sensors etc., and the output means can include a display screen, a loudspeaker, a vibrator, an indicator light etc.
[0139] The network interface 1514 is employed to connect to a communication module (not shown in the drawings) to realize intercommunication between the current device and other devices. The communication module can realize communication in a wired mode (via USB, network cable, for example) or in a wireless mode (via mobile network, WIFI, Bluetooth, etc.).
[0140] The bus 1530 includes a passageway transmitting information between various component parts of the device (such as the processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
[0141] Additionally, the computer system 1500 may further obtain information of specific collection conditions from a virtual resource object collection condition information database 1541 for judgment on conditions, and so on.
[0142] As should be noted, although merely the processor 1510, the video display adapter 1511, the magnetic disk driver 1512, the input/output interface 1513, the network interface 1514, the memory 1520, and the bus 1530 are illustrated for the aforementioned device, the device may further include other component parts prerequisite for realizing normal running during specific implementation. In addition, as can be understood by persons skilled in the art, the aforementioned device may as well only include component parts necessary for realizing the solutions of the present application, without including the entire component parts as illustrated.
[0143] From the description of the aforementioned embodiments, persons skilled in the art can clearly understand that the present application can be realized through software plus a general hardware platform. Based on such understanding, the technical solutions of the present application, or the contributions made thereby over the state of the art, can be essentially embodied in the form of a software product. Such a computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk etc., and includes plural instructions enabling a piece of computer equipment (such as a personal computer, a server, or a network device etc.) to execute the methods described in various embodiments or some sections of the embodiments of the present application.
[0144] The various embodiments are described progressively in the Description; identical or similar sections among the various embodiments can be inferred from one another, and each embodiment stresses what is different from other embodiments.
Particularly, with respect to the system or system embodiment, since it is essentially similar to the method embodiment, its description is relatively simple, and the relevant sections thereof can be inferred from the corresponding sections of the method embodiment. The system or system embodiment as described above is merely exemplary in nature: units described therein as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is to say, they can be located in a single site or distributed over a plurality of network units. Some or all of the modules can be selected according to practical requirements to realize the objectives of the embodied solutions. Persons ordinarily skilled in the art can understand and implement this without creative effort.
[0145] The data processing method and corresponding platform and system provided by the present application are described in detail above, specific examples are used in this paper to enunciate the principles and modes of execution of the present application, and descriptions of the aforementioned embodiments are merely meant to help understand the method and kernel conception of the present application; at the same time, to persons ordinarily skilled in the art, there may be variations in both the specific modes of execution and the range of application based on the conception of the present application.

To sum it up, the contents of the current Description shall not be understood to restrict the present application.

Claims (10)

What is claimed is:
1. A data processing method, characterized in that the method comprises:
storing primary commodity content data in a first relational database through clustering and sharding;
creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
calculating the primary commodity content data through a calculation program to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
2. The data processing method according to Claim 1, characterized in further comprising:
receiving an enquiring request of a user;
parsing the enquiring request to obtain a keyword to be enquired;
enquiring in the index database to obtain query dimension identification data corresponding to the keyword to be enquired to serve as a target identification; and
enquiring in the first relational database to obtain calculation result data corresponding to the target identification.
3. The data processing method according to Claim 1, characterized in further comprising:
storing at least partial data of the calculation result data in association with the query dimension identification data in the index database.
4. The data processing method according to Claim 3, characterized in that the step of calculating the primary commodity content data through a calculation program to obtain calculation result data includes:

invoking the calculation program to calculate various dimension content quality scores of each commodity in at least two content dimensions at the primary commodity content data, and calculating a content quality total score of each commodity according to the various dimension content quality scores;
the step of storing the calculation result data in association with the query dimension identification data in the first relational database includes:
storing the various dimension content quality scores of each commodity and the content quality total score of each commodity in association with the identification data in the first relational database;
the step of storing at least partial data of the calculation result data in association with the query dimension identification data in the index database includes:
storing the content quality total score of each commodity in association with the query dimension identification data in the index database.
5. The data processing method according to any of Claims 1 to 4, characterized in that the identification data is a commodity code and/or a merchant code.
6. The data processing method according to any of Claims 1 to 4, characterized in further comprising:
receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding; and
synchronizing the primary commodity content data in the second relational database to the first relational database.
7. The data processing method according to Claim 6, characterized in that the step of receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding includes:
receiving the primary commodity content data and storing the same in a second relational database through clustering and sharding according to commodity codes.
8. The data processing method according to Claim 6, characterized in that the first relational database is HBase, the second relational database is Mysql, the calculation program is Spark, and the index database is Elasticsearch.
9. A data processing platform, characterized in that the platform comprises a data storage layer and a data calculation layer, wherein
the data storage layer is employed for storing primary commodity content data in a first relational database through clustering and sharding, and creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and query dimension identification data corresponding to each keyword field; and
the data calculation layer is employed for invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
10. A computer system, characterized in comprising:
one or more processor(s); and
a memory, associated with the one or more processor(s) for storing a program instruction that executes the following operations when read and executed by the one or more processor(s):
storing primary commodity content data in a first relational database through clustering and sharding;
creating index data according to the primary commodity content data and storing the index data in an index database, wherein the index data includes keyword fields and identification data corresponding to each keyword field; and
invoking a calculation program to calculate the primary commodity content data to obtain calculation result data, and storing the calculation result data in association with the identification data in the first relational database.
CA3154438A 2019-10-10 2020-06-19 Commodity content data processing method,platform and system Pending CA3154438A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910959014.9A CN110837520A (en) 2019-10-10 2019-10-10 Data processing method, platform and system
CN201910959014.9 2019-10-10
PCT/CN2020/096999 WO2021068549A1 (en) 2019-10-10 2020-06-19 Data processing method, platform and system

Publications (1)

Publication Number Publication Date
CA3154438A1 (en) 2021-04-15

Family

ID=69575186

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3154438A Pending CA3154438A1 (en) 2019-10-10 2020-06-19 Commodity content data processing method,platform and system

Country Status (3)

Country Link
CN (1) CN110837520A (en)
CA (1) CA3154438A1 (en)
WO (1) WO2021068549A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837520A (en) * 2019-10-10 2020-02-25 苏宁云计算有限公司 Data processing method, platform and system
CN111459985B (en) * 2020-03-31 2023-10-27 美的集团股份有限公司 Identification information processing method and device
CN111651479A (en) * 2020-04-15 2020-09-11 山东中创软件工程股份有限公司 Article evaluation method, device and related equipment
CN112069021B (en) * 2020-08-21 2024-02-20 北京五八信息技术有限公司 Flow data storage method and device, electronic equipment and storage medium
CN112380276B (en) * 2021-01-15 2021-09-07 四川新网银行股份有限公司 Method for querying data by non-fragment key fields after database division and table division of distributed system
CN113961580A (en) * 2021-12-22 2022-01-21 联通智网科技股份有限公司 Data query method, service system and electronic equipment
CN115455149B (en) * 2022-09-20 2023-05-30 城云科技(中国)有限公司 Database construction method based on coding query mode and application thereof
CN117407445B (en) * 2023-10-27 2024-06-04 上海势航网络科技有限公司 Data storage method, system and storage medium for Internet of vehicles data platform
CN118012952A (en) * 2024-02-02 2024-05-10 广州今之港教育咨询有限公司 Data processing method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930054A (en) * 2012-11-19 2013-02-13 北京奇虎科技有限公司 Data search method and data search system
CN105719105A (en) * 2014-12-03 2016-06-29 镇江雅迅软件有限责任公司 Inventory quick lookup method based on keywords
CN106156088B (en) * 2015-04-01 2020-02-04 阿里巴巴集团控股有限公司 Index data processing method, data query method and device
US9760637B2 (en) * 2015-09-11 2017-09-12 Skyhigh Networks, Inc. Wildcard search in encrypted text using order preserving encryption
CN108874971B (en) * 2018-06-07 2021-09-24 北京赛思信安技术股份有限公司 Tool and method applied to mass tagged entity data storage
CN110837520A (en) * 2019-10-10 2020-02-25 苏宁云计算有限公司 Data processing method, platform and system

Also Published As

Publication number Publication date
WO2021068549A1 (en) 2021-04-15
CN110837520A (en) 2020-02-25


Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220916
