CN110851587B

CN110851587B - Commodity coding prediction model generation and commodity coding determination method, device and equipment

Info

Publication number: CN110851587B
Application number: CN201810825197.0A
Authority: CN
Inventors: 夏超
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2024-04-05
Anticipated expiration: 2038-07-25
Also published as: CN110851587A

Abstract

The application discloses a commodity code prediction model generation method, comprising: determining a first commodity description information sample set and a second commodity description information sample set; training, according to the first commodity description information sample set, a first commodity code prediction model for predicting the chapter code corresponding to a commodity name; and training, according to the second commodity description information sample set, a second commodity code prediction model for predicting commodity codes. The chapter code is a chapter code in a standard commodity code table that represents the correspondence between commodity names and commodity codes. The method meets the need to quickly determine the correct commodity code from a commodity name.

Description

Commodity coding prediction model generation and commodity coding determination method, device and equipment

Technical Field

The application relates to the field of artificial intelligence, and in particular to a commodity code prediction model generation method and device, an electronic device, and a storage device. The application also relates to a method and device for determining a commodity code, an electronic device, and a storage device, and further relates to another commodity code prediction model generation method and device, an electronic device, and a storage device.

Background

Currently, in many fields merchants or staff are required to fill in a commodity name together with the commodity code corresponding to that name.

However, merchants usually fill in the commodity code corresponding to a commodity name from experience, so filling errors occur frequently, and an error is likely to cause unnecessary losses. For example, in February 2016 the national tax authority piloted tax classification codes for goods and services in Beijing, Shanghai, Guangdong, and Jiangsu; in January 2018 the commodity codes were rolled out nationwide, and the abbreviated name of the commodity code must be shown on every invoice issued. An invoice with an incorrect commodity code is a non-compliant invoice, which can lead to a fine or, in serious cases, to the invoice being treated as falsely issued. There are more than 4,000 tax commodity codes, so it is not easy for a taxpayer to select the right one, and the tax bureau in turn needs to judge whether the commodity code selected by the taxpayer is accurate.

Therefore, how to quickly determine the correct commodity code corresponding to the commodity name according to the commodity name is a problem to be solved.

Disclosure of Invention

The application provides a commodity code prediction model generation method and device, an electronic device, and a storage device; a method and device for determining a commodity code, an electronic device, and a storage device; and another commodity code prediction model generation method and device, an electronic device, and a storage device.

The application provides a commodity coding prediction model generation method, which comprises the following steps:

determining a first commodity description information sample set and a second commodity description information sample set;

training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.

Optionally, the method comprises the following steps:

the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.

Optionally, the method further comprises: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;

The first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names;

the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.

Optionally, the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, including:

calculating commodity coding entropy values of the same commodity name aiming at commodity original description information, wherein the commodity coding entropy values are used for representing the discrete degree of commodity codes corresponding to the same commodity name;

and deleting commodity original description information with commodity coding entropy value larger than entropy value threshold, and taking the reserved commodity original description information as denoised commodity description information.

when the same submitter submits the commodity original description information for a plurality of times according to the same commodity name, the commodity original description information submitted for a plurality of times is subjected to weight reduction processing, and the commodity original description information subjected to weight reduction processing is used as the de-noised commodity description information.

Optionally, the disambiguating the commodity original description information including the commodity name and the commodity code includes:

when the commodity codes submitted by the same submitter aiming at the same commodity name are multiple, the commodity code submitted by the submitter aiming at the same commodity name for the last time is used as the commodity code corresponding to the same commodity name.

Optionally, the first commodity coding prediction model and the second commodity coding prediction model are fasttext models.

Optionally, the method further comprises:

increasing data information corresponding to commodity names in commodity description information in an n-gram mode to obtain commodity names of the increased data information;

the commodity description information sample is commodity description information of the added data information.

Optionally, the commodity name and the commodity code in the commodity description information include: commodity names and commodity codes on commodity invoices made by tax payers.

Optionally, the commodity name and the commodity code source in the commodity description information further include:

and determining commodity names and commodity codes according to a standard commodity code table representing the corresponding relation between the commodity names and commodity codes.

The application also provides a commodity coding prediction model generation method, which comprises the following steps:

determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;

training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.

and the commodity description information sample is the de-noised and/or disambiguated commodity description information.

The application also provides a method of determining a commodity code, comprising:

determining commodity names of commodity codes to be determined;

predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;

predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;

and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.

Optionally, the determining, according to the predicted chapter code and the commodity code set corresponding to the commodity name, the commodity code corresponding to the commodity name includes:

And judging whether the chapter code contained in the commodity code in the predicted commodity code set is consistent with the predicted chapter code, and if so, taking the predicted commodity code as the commodity code corresponding to the commodity name.
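The consistency check above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the chapter code is the leading 7 digits of the 19-digit commodity code (consistent with the tax classification example given later in the description) and that the second model returns candidate codes ordered by confidence. The function name `select_commodity_code` and the `None` fallback are hypothetical.

```python
from typing import Optional, Sequence

CHAPTER_LEN = 7  # assumption: chapter code = leading 7 digits of the commodity code


def select_commodity_code(predicted_chapter: str,
                          candidate_codes: Sequence[str]) -> Optional[str]:
    """Return the first candidate whose chapter prefix matches the predicted chapter."""
    for code in candidate_codes:  # candidates assumed ordered by model confidence
        if code[:CHAPTER_LEN] == predicted_chapter:
            return code
    return None  # no consistent candidate; how to handle this case is left open


# Candidate set from the (hypothetical) second model, chapter from the first model
candidates = ["1080417000000000000", "1070601000000000000"]
chosen = select_commodity_code("1070601", candidates)
```

Here the first candidate is rejected because its chapter prefix differs from the predicted chapter, and the second candidate is accepted.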

The application also provides a commodity coding prediction model generation device, which comprises:

the sample set determining unit is used for determining a first commodity description information sample set and a second commodity description information sample set;

the model training unit is used for training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.

The application also provides an electronic device comprising:

a processor; and

a memory for storing a program of a method of generating a commodity code prediction model, the apparatus being powered on and executing the program of the method of generating a commodity code prediction model by the processor, and executing the steps of:

The present application also provides a memory device that,

a program for storing a method for generating a commodity code prediction model, the program being executed by a processor and executing the steps of:

The application also provides an apparatus for determining a commodity code, comprising:

a commodity name determining unit for determining the commodity name of the commodity code to be determined;

the chapter code prediction unit is used for predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;

the commodity code set prediction unit is used for predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;

and the commodity code prediction unit is used for determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.

The application also provides an electronic device comprising:

a processor; and

a memory for storing a program for determining a commodity code, the apparatus being powered on and executing the program for determining a commodity code by the processor, and performing the steps of:

determining commodity names of commodity codes to be determined;

The present application also provides a memory device that,

a program for determining a method of encoding a commodity, the program being executable by a processor to perform the steps of:

determining commodity names of commodity codes to be determined;

the sample set determining unit is used for determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;

The model training unit is used for training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.

The application also provides an electronic device comprising:

a processor; and

The present application also provides a memory device that,

Compared with the prior art, the application has the following advantages:

The application provides a commodity code prediction model generation method, which trains, according to the first commodity description information sample set, a first commodity code prediction model for predicting the chapter code corresponding to a commodity name, and trains, according to the second commodity description information sample set, a second commodity code prediction model for predicting commodity codes; using the trained first and second commodity code prediction models meets the need to quickly determine the correct commodity code from a commodity name.

The application provides another commodity coding prediction model generation method, which comprises the steps of training a commodity coding prediction model according to a commodity description information sample set comprising commodity names and commodity codes, and adopting the trained commodity coding prediction model to meet the requirement of quickly determining the corresponding correct commodity codes according to the commodity names.

The application provides a method for determining a commodity code, in which the commodity code corresponding to a commodity name is determined according to pre-trained commodity code prediction models; the correct commodity code can be determined quickly from the commodity name, which solves the problem of quickly determining the correct commodity code corresponding to a commodity name.

Drawings

Fig. 1 is a flowchart of a method for generating a commodity code prediction model according to a first embodiment of the present application.

Fig. 2 is a flowchart of an example of a method for generating a commodity code prediction model according to the first embodiment of the present application.

Fig. 3 is a flowchart of a method for generating a commodity coding prediction model according to a second embodiment of the present application.

Fig. 4 is a flowchart of a method for determining commodity codes according to a third embodiment of the present application.

Fig. 5 is a schematic diagram of a generating device of a commodity code prediction model according to a fourth embodiment of the present application.

Fig. 6 is a schematic diagram of an electronic device according to a fifth embodiment of the present application.

Fig. 7 is a schematic diagram of an apparatus for determining commodity codes according to a seventh embodiment of the present application.

Fig. 8 is a schematic diagram of an electronic device according to an eighth embodiment of the present application.

Fig. 9 is a schematic diagram of a generating device of a commodity code prediction model according to a tenth embodiment of the present application.

Fig. 10 is a schematic diagram of an electronic device according to an eleventh embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present invention may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present invention is not limited to the specific embodiments disclosed below.

The first embodiment of the application provides a method for generating a commodity coding prediction model. The following describes in detail with reference to fig. 1 and 2.

As shown in fig. 1, in step S101, a first commodity description information sample set and a second commodity description information sample set are determined.

The first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name; the commodity name can comprise the name of the commodity, and can also comprise information such as the brand of the commodity, the specification of the commodity, the weight of the commodity and the like. And the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.

For example, when the source of the commodity description information samples is commodity invoices issued by taxpayers, the standard commodity code table representing the correspondence between commodity names and commodity codes may be the "commodity and service tax classification code table" issued by the tax bureau, in which the chapter code has 7 digits and the commodity code (also called the detail code) has 19 digits. If the chapter code corresponding to "wheat" is "1010101", the commodity name "wheat" and its chapter code "1010101" can be used as a first commodity description information sample; if the commodity code corresponding to "wheat" is "1010101020000000000", the commodity name "wheat" and its commodity code "1010101020000000000" can be used as a second commodity description information sample.

The first and second commodity description information samples may come from commodity invoices issued by taxpayers, in which case the commodity names and commodity codes are those on the invoices. For example, if the commodity name on the commodity invoice is "YT461 friend thickened L46 clothes hanger" and the commodity code is "1070601000000000000", the commodity description information sample may include the commodity name "YT461 friend thickened L46 clothes hanger" and the commodity code "1070601000000000000". The first and second commodity description information samples may also come from other settings in which a commodity code needs to be determined from a commodity name.

Preferably, denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information; the first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names; the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.

Because not every commodity original description information comprising commodity names and commodity codes is suitable for generating a first commodity description information sample or a second commodity description information sample, denoising and/or disambiguating the commodity original description information comprising the commodity names and the commodity codes can be performed first to obtain the denoised and/or disambiguated commodity description information; taking the denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names as a first commodity description information sample; and taking the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names as a second commodity description information sample.

For example, when the source of the second commodity description information samples is commodity invoices issued by taxpayers, the invoices are filled in by merchants, so not all information on every commodity invoice is suitable as a second commodity description information sample; the commodity description information can therefore be denoised or disambiguated first.

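Disambiguating the commodity original description information comprising commodity names and commodity codes includes: when the same submitter has submitted multiple commodity codes for the same commodity name, taking the commodity code most recently submitted by that submitter for the commodity name as the commodity code corresponding to that commodity name.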

For example, in 2017 a taxpayer selected the commodity code "1080417000000000000" for the commodity name "YT461 friend thickened L46 clothes hanger"; the name of that code is "furniture, metal fittings for buildings, and stands; metal architectural decorations and parts thereof". By 2018 the same taxpayer filled in the commodity code "1070601000000000000", whose name is "plastic products". Clothes hangers made of different materials do fall into different categories: a plastic-coated hanger belongs to plastic products; an aluminum alloy hanger belongs to furniture, metal fittings for buildings, and stands; and a wooden hanger belongs to wooden tableware and related wood products. The "YT461 friend thickened L46 clothes hanger" is in fact a plastic hanger, and the taxpayer gradually corrected the earlier misclassification as knowledge of the commodity and service tax classification accumulated. Therefore, the most recent commodity code filled in by the same taxpayer for the same commodity name, that is, the last submitted code "1070601000000000000", can be used as the commodity code of the "YT461 friend thickened L46 clothes hanger" in this example. In other words, every commodity code corresponding to the commodity name "YT461 friend thickened L46 clothes hanger" in the commodity original description information is changed to "1070601000000000000".
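The keep-the-latest rule in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the `Record` fields (`submitter`, `year`) and the function name `disambiguate` are hypothetical, and submission time is reduced to a year for brevity.

```python
from typing import Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    submitter: str  # hypothetical taxpayer identifier
    name: str       # commodity name
    code: str       # commodity code
    year: int       # hypothetical submission time


def disambiguate(records: List[Record]) -> List[Record]:
    """Rewrite every record to the code its submitter used most recently for that name."""
    latest: Dict[Tuple[str, str], Record] = {}
    for rec in records:
        key = (rec.submitter, rec.name)
        # Later (or equally recent) submissions override earlier ones
        if key not in latest or rec.year >= latest[key].year:
            latest[key] = rec
    return [rec._replace(code=latest[(rec.submitter, rec.name)].code) for rec in records]


hanger = "YT461 friend thickened L46 clothes hanger"
raw = [
    Record("taxpayer_a", hanger, "1080417000000000000", 2017),
    Record("taxpayer_a", hanger, "1070601000000000000", 2018),
]
cleaned = disambiguate(raw)
```

Both records are kept, but the 2017 record now carries the code submitted in 2018.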

Denoising the commodity original description information comprising commodity names and commodity codes to obtain denoised commodity description information includes:

calculating, for the commodity original description information, the commodity code entropy of each commodity name, where the entropy represents the degree of dispersion of the commodity codes corresponding to the same commodity name; and deleting the commodity original description information whose commodity code entropy is greater than an entropy threshold, and taking the retained commodity original description information as the denoised commodity description information. Note that data whose commodity code entropy exceeds the entropy threshold is chaotic and unusable, whereas data whose commodity code entropy is less than or equal to the entropy threshold is relatively clear and usable.

For example, for the commodity name "Changkang capsicum oil", 82% of invoices fill in the commodity code "1030206040000000000", which is the code for "composite flavoring", and 18% of invoices fill in the commodity code "1030105010400000000", which is the code for "other edible vegetable oil". With an entropy threshold of 0.5, the entropy in this example is -0.82 × log(0.82) - 0.18 × log(0.18) ≈ 0.47 (using the natural logarithm). Because this is below 0.5, the data is retained and the commodity code of "Changkang capsicum oil" is uniformly set to "1030206040000000000".
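The entropy calculation in this example can be reproduced with a short sketch. It assumes the natural logarithm (which yields the 0.47 quoted in the text) and assumes that, when the data is retained, the most frequent code is the one that is kept; the function names `code_entropy` and `denoise` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple


def code_entropy(codes: Iterable[str]) -> float:
    """Shannon entropy (natural log) of the commodity codes seen for one commodity name."""
    counts = Counter(codes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def denoise(codes: Iterable[str], threshold: float = 0.5) -> Tuple[float, Optional[str]]:
    """Return (entropy, kept code); the kept code is None when entropy exceeds the threshold."""
    codes = list(codes)
    h = code_entropy(codes)
    if h > threshold:
        return h, None  # too dispersed: drop all records for this commodity name
    return h, Counter(codes).most_common(1)[0][0]  # assumption: keep the majority code


# 82 invoices with the "composite flavoring" code, 18 with "other edible vegetable oil"
observed = ["1030206040000000000"] * 82 + ["1030105010400000000"] * 18
entropy, kept = denoise(observed)
```

With an 82/18 split the entropy is about 0.47, so the records are retained; an even 50/50 split would give about 0.69 and the records would be deleted.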

Denoising may also include weight reduction: when the same submitter submits commodity original description information multiple times for the same commodity name, the repeatedly submitted commodity original description information is down-weighted, and the down-weighted commodity original description information is taken as the denoised commodity description information. Down-weighting here means reducing the number of samples of commodity original description information submitted by the same submitter for the same commodity name, so that a large number of submissions from a single submitter does not dominate the sample set. Specifically, when the same submitter has submitted n pieces of commodity original description information for the same commodity name, the weight can be reduced according to log(n): only log(n) pieces of commodity original description information for that commodity name are retained as the denoised commodity description information.
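A possible reading of the log(n) rule is sketched below. The text does not specify the logarithm base, the rounding, or what happens when log(n) is below 1, so this sketch assumes the natural logarithm, rounding up, and a minimum of one retained sample; the function name `downweight` and the tuple layout are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str, str]  # (submitter, commodity name, commodity code)


def downweight(samples: List[Sample]) -> List[Sample]:
    """Keep roughly log(n) of the n samples each submitter filed for one commodity name."""
    groups: Dict[Tuple[str, str], List[Sample]] = defaultdict(list)
    for s in samples:
        groups[(s[0], s[1])].append(s)
    kept: List[Sample] = []
    for group in groups.values():
        n = len(group)
        # Assumption: natural log, rounded up, and never fewer than one sample
        keep = max(1, math.ceil(math.log(n)))
        kept.extend(group[:keep])
    return kept


samples = [("taxpayer_a", "wheat", "1010101020000000000")] * 100
samples += [("taxpayer_b", "wheat", "1010101020000000000")]
reduced = downweight(samples)
```

Under these assumptions, the 100 submissions from one taxpayer shrink to 5, while the single submission from the other taxpayer is kept.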

The method has the advantages that cleaner commodity description information can be obtained by carrying out denoising and/or disambiguation on the commodity original description information comprising the commodity name and the commodity code, and the commodity description information after denoising and/or disambiguation is used as a commodity description information sample (comprising a first commodity description information sample and a second commodity description information sample), so that the trained commodity code prediction model can more accurately predict the commodity code corresponding to the commodity name.

In order to improve accuracy of commodity coding predicted by the commodity coding prediction model, commodity names and commodity coding sources in the commodity description information can further include: and determining commodity names and commodity codes according to a standard commodity code table representing the corresponding relation between the commodity names and commodity codes.

For example, when the source of the commodity description information samples is commodity invoices issued by taxpayers, the commodity original description information can be expanded based on the "commodity and service tax classification code table" issued by the tax bureau and the Nice commodity classification table. The "commodity and service tax classification code table" contains the broad commodity categories and their commodity codes, and the Nice commodity classification table contains detailed commodity data. For example, the broad category "cereal" in the "commodity and service tax classification code table" is defined as including rice, wheat, corn, millet, sorghum, barley, oats, rye, buckwheat, and other cereals, while "rice", "barley", and "wheat" in the Nice commodity classification table have corresponding real commodity names. By associating the two tables, the real commodity names of cereals can be obtained; these names constitute industry knowledge of the grain industry. The association is performed using a similarity algorithm together with precise manual association.

As shown in fig. 1, in step S102, a first commodity code prediction model for predicting a chapter code corresponding to a commodity name is trained according to the first commodity description information sample set; and training a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set.

The first commodity coding prediction model and the second commodity coding prediction model form a complete commodity coding prediction model by establishing the following relationship: and comparing the chapter code predicted by the first commodity code prediction model with the commodity code predicted by the second commodity code prediction model, and determining the commodity code corresponding to the commodity name according to the comparison result.

By training both the first commodity code prediction model and the second commodity code prediction model, the predicted chapter code can be combined with the predicted commodity codes when the two models are used to predict the commodity code corresponding to a commodity name, which yields a more accurate commodity code.

Preferably, the first commodity coding prediction model and the second commodity coding prediction model adopt fasttext models.

Because the descriptions corresponding to the commodity names in the first and second commodity description information samples are mostly short texts, the data information corresponding to the commodity name in the commodity description information is augmented by means of n-grams, yielding a commodity name with added data information; the first and second commodity description information samples are then commodity description information with the added data information. For example, the commodity name "luxury thickened solid wood clothes hanger" is expanded by 2-grams into a sequence of its adjacent two-unit fragments; the additional information in the sample improves the classification effect.
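The n-gram augmentation can be illustrated with a small sketch. The English tokens below stand in for the units of the original commodity name, and the function name `add_ngrams` and the underscore joiner are hypothetical; the text does not state how the name is segmented. fastText also offers a `wordNgrams` training option, but whether the application relies on it is not stated.

```python
from typing import List


def add_ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """Append all contiguous n-grams to the original token sequence."""
    ngrams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens + ngrams


# Assumption: the commodity name has already been split into tokens
tokens = ["luxury", "thickened", "solid", "wood", "clothes", "hanger"]
augmented = add_ngrams(tokens)
```

The six original tokens are followed by five 2-grams such as `solid_wood` and `clothes_hanger`, which gives the classifier local word-order information that a bag of single tokens lacks.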

Hierarchical softmax is used to handle the large number of classes and the imbalance in the number of samples per class. With hierarchical softmax, a Huffman tree is constructed at the output layer of the fasttext model. A Huffman tree is the tree with the shortest weighted path length, in which nodes with larger weights are closer to the root. In the application, the number of samples differs across commodity codes (classes): some commodity codes have many samples and others have few. When the Huffman tree is built, commodity codes with many samples lie closer to the root node, so a commodity name is more likely to be classified into such a class and less likely to be classified into a class with few samples. Without the Huffman tree of the hierarchical softmax structure, classification under sample imbalance is often unsatisfactory.
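The property that frequent classes sit closer to the root can be seen in a small Huffman construction. This sketch only computes the depth of each class leaf; it is not the fasttext implementation, and the sample counts attached to the commodity codes are hypothetical.

```python
import heapq
import itertools
from typing import Dict


def huffman_depths(class_counts: Dict[str, int]) -> Dict[str, int]:
    """Return the depth (path length from the root) of each class in a Huffman tree."""
    tie = itertools.count()  # tie-breaker so the heap never has to compare dicts
    heap = [(count, next(tie), {label: 0}) for label, count in class_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every leaf below them moves one level down
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {label: depth + 1 for label, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


# Hypothetical sample counts per commodity code (class)
counts = {
    "1070601000000000000": 900,
    "1030206040000000000": 80,
    "1080417000000000000": 20,
}
depths = huffman_depths(counts)
```

The class with 900 samples ends up one step from the root, while the two rare classes are two steps away, so frequent classes need fewer binary decisions during prediction.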

The following uses fig. 2 as an example to illustrate the procedure of the method for generating a commodity code prediction model according to the first embodiment of the present application.

As shown in fig. 2, in step S201, modeling begins by determining commodity original description information 21 including commodity names and commodity codes; in step S202, the commodity original description information 21 is denoised and/or disambiguated to obtain denoised and/or disambiguated commodity description information; in step S203, the denoised and/or disambiguated commodity description information is expanded to obtain expanded commodity description information; thereafter, a chapter granularity sample set 22 (the first commodity description information sample set) and a detail granularity sample set 23 (the second commodity description information sample set) are generated from the expanded commodity description information; finally, a first commodity code prediction model (fasttext model 24) for predicting the chapter code corresponding to a commodity name is trained according to the chapter granularity sample set 22, and a second commodity code prediction model (fasttext model 25) for predicting commodity codes is trained according to the detail granularity sample set 23.

The second embodiment of the application provides a method for generating a commodity coding prediction model, which is described in detail below with reference to figs. 3 and 2.

As shown in fig. 3, in step S301, a commodity description information sample set is determined; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code.

The commodity description information sample comprises commodity names and commodity codes, wherein the commodity names can comprise names of commodities and can also comprise information such as brands, commodity specifications and commodity weights of the commodities.

The commodity description information sample may be a commodity invoice issued by a taxpayer, and the commodity name and the commodity code may be those on that invoice. For example, if the commodity name on the commodity invoice is "YT461 friend thickened L46 clothes hanger" and the commodity code is "1070601000000000000", the commodity description information sample may include: commodity name: "YT461 friend thickened L46 clothes hanger", commodity code: "1070601000000000000". The commodity description information sample may also come from other scenarios in which commodity codes need to be determined according to commodity names.

Preferably, denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information; and the commodity description information sample is the de-noised and/or disambiguated commodity description information.

Since not every commodity original description information including commodity names and commodity codes is suitable for being used as a commodity description information sample, denoising and/or disambiguation treatment can be performed on the commodity original description information including the commodity names and the commodity codes to obtain the denoised and/or disambiguated commodity description information; and taking the de-noised and/or disambiguated commodity description information as a commodity description information sample.

For example, when the source of the commodity description information sample is a commodity invoice issued by a taxpayer, the invoice is filled in by a merchant, so not all of the information on every commodity invoice can be used as a commodity description information sample; the commodity original description information can therefore first be denoised or disambiguated.

The method has the advantages that through denoising and/or disambiguation treatment on the commodity original description information comprising commodity names and commodity codes, cleaner commodity description information can be obtained, and the commodity description information after denoising and/or disambiguation treatment is used as a commodity description information sample, so that the trained commodity code prediction model can more accurately predict commodity codes corresponding to commodity names.

Because predicting the chapter code corresponding to a commodity name is more accurate than predicting the commodity code, the commodity description information sample set may, in order to improve the accuracy of the predicted commodity code, include a first commodity description information sample set and a second commodity description information sample set. The first commodity description information sample set includes at least one first commodity description information sample, and the first commodity description information sample includes a commodity name and the chapter code corresponding to the commodity name; the second commodity description information sample set includes at least one second commodity description information sample, and the second commodity description information sample includes a commodity name and the commodity code corresponding to the commodity name. The chapter codes are chapter codes in a standard commodity code table that represents the correspondence between commodity names and commodity codes.
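One way to derive the two sample sets from the same (commodity name, commodity code) records is sketched below. It assumes, as stated later in this description for the tax classification code table, that the chapter code is the first 7 digits of the commodity code; the function name `build_sample_sets` is hypothetical, and the `__label__` prefix follows fasttext's default convention for supervised training lines.

```python
CHAPTER_DIGITS = 7  # assumption taken from the tax classification code table example


def build_sample_sets(records, label_prefix="__label__"):
    """Turn (commodity_name, commodity_code) pairs into two lists of training lines."""
    chapter_lines, detail_lines = [], []
    for name, code in records:
        text = " ".join(name.split())  # collapse whitespace so each sample stays on one line
        chapter_lines.append(f"{label_prefix}{code[:CHAPTER_DIGITS]} {text}")
        detail_lines.append(f"{label_prefix}{code} {text}")
    return chapter_lines, detail_lines


records = [("YT461 friend thickened L46 clothes hanger", "1070601000000000000")]
chapter_set, detail_set = build_sample_sets(records)
print(chapter_set[0])
print(detail_set[0])
```

Both sets share the same text, so the only difference between the chapter granularity and detail granularity samples is how much of the code is kept as the label.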

As shown in fig. 3, in step S302, a commodity coding prediction model is trained according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.

When the commodity description information sample set includes a first commodity description information sample set and a second commodity description information sample set, training a commodity coding prediction model according to the commodity description information sample set, including:

training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; and training a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set.

Preferably, the commodity coding prediction model adopts a fasttext model.
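A training step along these lines could look like the sketch below, which uses the `train_supervised` function of the fasttext Python package with hierarchical softmax (`loss="hs"`) and word 2-grams (`wordNgrams=2`). The epoch and learning-rate values, the file paths passed in, and both helper names are assumptions for illustration; the description does not disclose hyperparameters.

```python
def training_params(word_ngrams=2):
    """Hyperparameters shared by both models; the values are illustrative assumptions."""
    return {"loss": "hs", "wordNgrams": word_ngrams, "epoch": 25, "lr": 0.5}


def train_models(chapter_path, detail_path):
    """Train the chapter-level and detail-level classifiers from two label files."""
    import fasttext  # third-party package, imported lazily so the helper above needs nothing

    params = training_params()
    # Each file holds one sample per line: "__label__<code> <commodity name text>".
    chapter_model = fasttext.train_supervised(input=chapter_path, **params)
    detail_model = fasttext.train_supervised(input=detail_path, **params)
    return chapter_model, detail_model


print(training_params())
```

Training the two models with identical settings keeps the comparison between chapter-level and detail-level predictions simple; nothing in the description requires the settings to match.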

Because the description data corresponding to the commodity name in the commodity description information sample is mostly short text, the data information corresponding to the commodity name in the commodity description information is augmented with n-grams to obtain a commodity name with added data information; the commodity description information sample is then the commodity description information with the added data information. For example, applying 2-grams to the commodity name "luxury thickened solid wood clothes hanger" generates a sequence of adjacent-token pairs from that name, and the additional information in the sample improves the classification effect.

The process of the method for generating a commodity code prediction model according to the second embodiment of the present application will be described below with reference to fig. 2.

A third embodiment of the present application provides a method of determining a commodity code, which is described in detail below with reference to figs. 2 and 4.

As shown in fig. 4, in step S401, the commodity name of the commodity code to be determined is determined.

For example, when the commodity code for "wheat" is to be determined, the commodity name is "wheat".

As shown in fig. 4, in step S402, the chapter code corresponding to the commodity name is predicted according to the commodity name and a first commodity code prediction model trained in advance for predicting the chapter code corresponding to the commodity name.

As shown in fig. 4, in step S403, a commodity code set corresponding to the commodity name is predicted according to the commodity name and a second commodity code prediction model for predicting commodity codes, which is trained in advance.

As shown in fig. 4, in step S404, a commodity code corresponding to the commodity name is determined from the predicted chapter code and commodity code set corresponding to the commodity name.

And the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.

Specifically, determining the commodity code corresponding to the commodity name according to the predicted chapter code and the commodity code corresponding to the commodity name comprises the following steps:

Since chapter code prediction is more accurate than commodity code prediction, determining the commodity code corresponding to the commodity name from both the predicted chapter code and the predicted commodity code set is more accurate than relying only on the commodity code predicted by the second commodity code prediction model.

The commodity code set may include the top-ranked predicted commodity codes, such as the top 5 or the top 3 commodity codes.

If the source of the commodity description information sample is a commodity invoice issued by a taxpayer, the standard commodity code table representing the correspondence between commodity names and commodity codes may be the commodity and service tax classification code table issued by the tax bureau, in which the chapter code has 7 digits, the commodity code (also called the detail code) has 19 digits, and the first 7 digits of the commodity code are the chapter code. For example, the chapter code corresponding to "wheat" is "1010101", and the commodity code corresponding to "wheat" is "10101010200000000".

For example, if the chapter code predicted by the first commodity code prediction model is "1010101", and the commodity code set predicted by the second commodity code prediction model consists of the top three predicted codes, "10101010200000000", "10101020200000000" and "10101030200000000", then the commodity code "10101010200000000", whose first seven digits match the predicted chapter code, may be used as the commodity code corresponding to the commodity name.
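The selection in this example amounts to a prefix match, sketched below with the code strings exactly as they appear in the text. The function name `select_commodity_code` is hypothetical, as is returning `None` when no candidate falls within the predicted chapter; the description does not say what happens in that case.

```python
def select_commodity_code(chapter_code, candidate_codes):
    """Return the first candidate whose leading digits equal the predicted chapter code."""
    for code in candidate_codes:  # candidates are assumed to be ordered best-first
        if code.startswith(chapter_code):
            return code
    return None  # assumption: no match is reported to the caller rather than guessed


candidates = ["10101010200000000", "10101020200000000", "10101030200000000"]
chosen = select_commodity_code("1010101", candidates)
print(chosen)
```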

As shown in fig. 2, the commodity name 26 is input into the first commodity code prediction model (fasttext model 24) for predicting the chapter code corresponding to the commodity name and into the second commodity code prediction model (fasttext model 25) for predicting commodity codes; in step S204, the commodity code chapter (the chapter code contained in the commodity code) is predicted; in step S205, the top-n commodity code details (the commodity code set) are predicted; in step S206, the commodity code in the commodity code set that falls within the predicted chapter is output.
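The flow of steps S204 to S206 can be sketched end to end as follows. The two models are represented by a hypothetical stand-in class, `FixedModel`, whose `predict(text, k)` method mimics the (labels, probabilities) pair returned by a fasttext model; the canned labels, the probabilities, and the function names are assumptions made so that the example runs without a trained model.

```python
LABEL_PREFIX = "__label__"


def strip_labels(prediction):
    """Convert a (labels, probabilities) pair into a list of bare code strings."""
    labels, _probs = prediction
    return [label[len(LABEL_PREFIX):] for label in labels]


def determine_code(name, chapter_model, detail_model, top_n=3):
    """Predict a chapter, predict the top-n codes, and keep the code inside that chapter."""
    chapter = strip_labels(chapter_model.predict(name, k=1))[0]      # step S204
    code_set = strip_labels(detail_model.predict(name, k=top_n))     # step S205
    matches = [code for code in code_set if code.startswith(chapter)]
    return matches[0] if matches else None                           # step S206


class FixedModel:
    """Hypothetical stand-in that returns canned labels in place of a trained model."""

    def __init__(self, labels):
        self.labels = labels

    def predict(self, text, k=1):
        chosen = tuple(self.labels[:k])
        return chosen, tuple(1.0 / (i + 1) for i in range(len(chosen)))


chapter_model = FixedModel(["__label__1010101"])
detail_model = FixedModel(["__label__10101020200000000", "__label__10101010200000000"])
result = determine_code("wheat", chapter_model, detail_model)
print(result)
```

In this constructed case the detail model's first choice lies in a different chapter, so the second-ranked code is selected; this is the situation in which combining the two predictions changes the outcome.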

The fourth embodiment of the present application also provides a device for generating a commodity coding prediction model, corresponding to the method for generating a commodity coding prediction model provided in the first embodiment of the present application.

A sample set determining unit 501 configured to determine a first commodity description information sample set and a second commodity description information sample set;

the model training unit 502 is configured to train a first commodity code prediction model for predicting a chapter code corresponding to a commodity name according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.

Optionally, the first commodity description information sample set includes at least one first commodity description information sample, and the first commodity description information sample includes a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.

Optionally, the apparatus further comprises:

the de-noising processing and/or disambiguation processing unit is used for carrying out de-noising processing and/or disambiguation processing on the commodity original description information comprising commodity names and commodity codes to obtain de-noised and/or disambiguated commodity description information;

Optionally, the denoising processing unit is specifically configured to:

The disambiguation processing unit is specifically configured to:

Optionally, the apparatus further comprises:

the information adding unit is used for adding the data information corresponding to the commodity name in the commodity description information in an n-gram mode to obtain the commodity name of the added data information;

the first commodity description information sample and/or the second commodity description information sample are commodity description information of the added data information.

It should be noted that, for the detailed description of the apparatus for generating a commodity code prediction model according to the fourth embodiment of the present application, reference may be made to the description related to the first embodiment of the present application, which is not repeated here.

Corresponding to the method for generating the commodity coding prediction model provided by the above, the fifth embodiment of the present application further provides an electronic device.

As shown in fig. 6, the electronic device includes:

a processor 601; and

a memory 602, configured to store a program for a method for generating a commodity code prediction model, wherein after the apparatus is powered on and the processor executes the program for the method for generating a commodity code prediction model, the apparatus performs the steps of:

Optionally, the electronic device further performs the following steps: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;

Optionally, the electronic device further performs the following steps:

It should be noted that, for the detailed description of the electronic device provided in the fifth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.

Corresponding to the method for generating the commodity coding prediction model provided by the above, the sixth embodiment of the present application further provides a storage device.

It should be noted that, for the detailed description of the storage device provided in the sixth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.

Corresponding to the method for determining commodity codes provided above, a seventh embodiment of the present application further provides an apparatus for determining commodity codes, where the apparatus includes: a commodity name determining unit 701, a chapter code prediction unit 702, a commodity code set prediction unit 703, and a commodity code prediction unit 704.

A commodity name determining unit 701 for determining a commodity name of a commodity code to be determined;

a chapter code prediction unit 702, configured to predict a chapter code corresponding to the commodity name according to the commodity name and a first commodity code prediction model that is trained in advance and used for predicting the chapter code corresponding to the commodity name;

a commodity code set prediction unit 703, configured to predict a commodity code set corresponding to the commodity name according to the commodity name and a second commodity code prediction model that is trained in advance and is used for predicting commodity codes;

and a commodity code prediction unit 704, configured to determine a commodity code corresponding to the commodity name according to the predicted chapter code and the commodity code set corresponding to the commodity name.

Optionally, the commodity coding prediction unit is specifically configured to:

It should be noted that, for the detailed description of the apparatus for determining commodity codes provided in the seventh embodiment of the present application, reference may be made to the related description of the third embodiment of the present application, which is not repeated here.

Corresponding to the method for determining commodity codes provided by the above, the eighth embodiment of the present application also provides an electronic device.

As shown in fig. 8, the electronic device includes:

a processor 801; and

a memory 802 for storing a program for determining a commodity code, wherein after the device is powered on and the processor runs the program for determining a commodity code, the following steps are performed:

determining commodity names of commodity codes to be determined;

Optionally, the determining the commodity code corresponding to the commodity name according to the predicted chapter code and the commodity code corresponding to the commodity name includes:

It should be noted that, for the detailed description of the electronic device provided in the eighth embodiment of the present application, reference may be made to the related description of the third embodiment of the present application, which is not repeated here.

Corresponding to the method for determining a commodity code provided above, a ninth embodiment of the present application further provides a storage device,

Determining commodity names of commodity codes to be determined;

It should be noted that, for the detailed description of the storage device provided in the ninth embodiment of the present application, reference may be made to the related description of the third embodiment of the present application, which is not repeated here.

Corresponding to the method for generating the commodity coding prediction model provided in the second embodiment of the present application, the tenth embodiment of the present application further provides a device for generating the commodity coding prediction model.

As shown in fig. 9, the commodity code prediction model generation apparatus includes:

a sample set determining unit 901, configured to determine a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;

The model training unit 902 is configured to train a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.

Optionally, the apparatus further comprises:

Optionally, the denoising processing unit is specifically configured to:

The disambiguation processing unit is specifically configured to:

Optionally, the commodity description information sample set includes a first commodity description information sample set and a second commodity description information sample set, the first commodity description information sample set includes at least one first commodity description information sample, the first commodity description information sample includes a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample includes a commodity name and a commodity code corresponding to the commodity name; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.

Optionally, the model training unit is specifically configured to:

Optionally, the commodity coding prediction model is a fasttext model.

Optionally, the apparatus further comprises:

It should be noted that, for the detailed description of the apparatus for generating a commodity code prediction model according to the tenth embodiment of the present application, reference may be made to the description related to the second embodiment of the present application, which is not repeated here.

Corresponding to the method for generating the commodity coding prediction model provided by the above, the eleventh embodiment of the present application further provides an electronic device.

As shown in fig. 10, the electronic device includes:

A processor 1001; and

a memory 1002 for storing a program of a method for generating a commodity code prediction model, wherein after the device is powered on and the processor runs the program of the method for generating a commodity code prediction model, the following steps are performed:

Optionally, training a commodity coding prediction model according to the commodity description information sample set includes:

Optionally, the commodity coding prediction model is a fasttext model.

Optionally, the electronic device further performs the following steps:

It should be noted that, for the detailed description of the electronic device provided in the eleventh embodiment of the present application, reference may be made to the related description of the second embodiment of the present application, which is not repeated here.

Corresponding to the method for generating the commodity coding prediction model provided by the above, the twelfth embodiment of the present application further provides a storage device.

While the preferred embodiment has been described, it is not intended to limit the invention thereto, and any person skilled in the art may make variations and modifications without departing from the spirit and scope of the invention, so that the scope of the invention shall be defined by the appended claims.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. The commodity coding prediction model generation method is characterized by comprising the following steps: the commodity coding prediction model comprises a first commodity coding prediction model and a second commodity coding prediction model;

training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes;

2. The method as recited in claim 1, further comprising: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;

3. The method according to claim 2, wherein the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, and the denoising processing includes:

4. The method according to claim 2, wherein the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, and the denoising processing includes:

5. The method of claim 2, wherein disambiguating the commodity original description information including commodity names and commodity codes comprises:

6. The method of claim 1, wherein the first commodity coding prediction model and the second commodity coding prediction model are fasttext models.

7. The method as recited in claim 6, further comprising:

8. The method of claim 1, wherein the source of the commodity name and commodity code in the commodity descriptive information comprises: commodity names and commodity codes on commodity invoices made by tax payers.

9. The method of claim 8, wherein the source of the commodity name and commodity code in the commodity descriptive information further comprises:

10. The commodity coding prediction model generation method is characterized by comprising the following steps:

Training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names; the commodity description information sample set can comprise a first commodity description information sample set and a second commodity description information sample set, wherein the first commodity description information sample set comprises a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample set comprises a commodity name and a commodity code corresponding to the commodity name; the chapter codes are chapter codes in a standard commodity code table that represents the correspondence between commodity names and commodity codes.

11. The method as recited in claim 10, further comprising: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;

12. The method of claim 11, wherein denoising the original description information of the commodity including the commodity name and the commodity code to obtain denoised commodity description information, comprises:

13. The method of claim 11, wherein the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, and the denoising processing includes:

14. A method of determining a commodity code, comprising:

determining commodity names of commodity codes to be determined;

predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name; the first commodity coding prediction model is obtained through training according to a first commodity description information sample set; the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name;

Predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes; the second commodity coding prediction model is obtained through training according to a second commodity description information sample set; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name;

15. The method of claim 14, wherein the determining the commodity code corresponding to the commodity name from the predicted set of commodity codes corresponding to the chapter code and the commodity name comprises:

16. A commodity code prediction model generation apparatus, comprising: the commodity coding prediction model comprises a first commodity coding prediction model and a second commodity coding prediction model;

the model training unit is used for training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes;

17. An electronic device, comprising:

a processor; and

a memory for storing a program of a method of generating a commodity code prediction model including a first commodity code prediction model and a second commodity code prediction model; after the equipment is electrified and the program of the commodity coding prediction model generating method is run through the processor, the following steps are executed:

18. A memory device, characterized in that,

a program storing a method for generating a commodity code prediction model, the commodity code prediction model including a first commodity code prediction model and a second commodity code prediction model; the program is executed by a processor and performs the steps of:

19. An apparatus for determining a commodity code, comprising:

the chapter code prediction unit is used for predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name; the first commodity coding prediction model is obtained through training according to a first commodity description information sample set; the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name;

a commodity code set prediction unit, configured to predict a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes; the second commodity code prediction model is trained on a second commodity description information sample set; the second commodity description information sample set comprises at least one second commodity description information sample, and each second commodity description information sample comprises a commodity name and the commodity code corresponding to that commodity name;

and a commodity code prediction unit, configured to determine the commodity code corresponding to the commodity name according to the predicted chapter code and the predicted commodity code set corresponding to the commodity name.
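The way the three units of this apparatus might fit together can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `determine_code`, the assumption that the chapter code is the leading two digits of a commodity code (as in Harmonized System style tables), the assumption that the candidate set is ordered by descending model score, and the fallback rule are all hypothetical.

```python
from typing import Optional, Sequence


def determine_code(chapter_code: str, candidate_codes: Sequence[str]) -> Optional[str]:
    """Pick one commodity code from the predicted candidate set.

    Assumptions (not stated in the claims):
    - candidate_codes is ordered from most to least likely;
    - a commodity code belongs to a chapter when it starts with the
      chapter code (e.g. the first two digits).
    """
    # Prefer the best-ranked candidate that agrees with the predicted chapter.
    for code in candidate_codes:
        if code.startswith(chapter_code):
            return code
    # Hypothetical fallback: no candidate matches the chapter, so return
    # the top-ranked candidate, or None when the set is empty.
    return candidate_codes[0] if candidate_codes else None


# Illustrative values only; the 10-digit codes are made up.
predicted_chapter = "62"
predicted_set = ["7304110000", "6205200000", "6203420000"]
result = determine_code(predicted_chapter, predicted_set)
```

Here the chapter prediction acts as a consistency filter over the candidate set, so a highly ranked candidate from the wrong chapter is skipped in favor of the first candidate inside the predicted chapter.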

20. An electronic device, comprising:

a processor; and

determining the commodity name for which a commodity code is to be determined;

21. A memory device, characterized in that,

determining the commodity name for which a commodity code is to be determined;

22. A commodity code prediction model generation apparatus, comprising:

a model training unit, configured to train a commodity code prediction model according to a commodity description information sample set; the commodity code prediction model is used for predicting the commodity code corresponding to a commodity name; the commodity description information sample set may comprise a first commodity description information sample set and a second commodity description information sample set, wherein each sample in the first commodity description information sample set comprises a commodity name and the chapter code corresponding to that commodity name, and each sample in the second commodity description information sample set comprises a commodity name and the commodity code corresponding to that commodity name; the chapter code is a chapter code in a standard commodity code table that represents the correspondence between commodity names and commodity codes.
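The two sample sets and the two trained models described in this claim can be illustrated with a small sketch. Everything below is an assumption made for illustration: the helper names `build_sample_sets`, `train`, and `predict` are hypothetical, the chapter code is assumed to be the first two digits of the commodity code, and a toy token-voting classifier stands in for whatever model the method actually trains.

```python
from collections import Counter, defaultdict


def build_sample_sets(records):
    """Derive both sample sets from (commodity name, commodity code) pairs.

    Assumption: the chapter code is the first two digits of the code.
    """
    first_set = [(name, code[:2]) for name, code in records]   # name -> chapter code
    second_set = [(name, code) for name, code in records]      # name -> commodity code
    return first_set, second_set


def train(samples):
    """Toy stand-in for training: count how often each token co-occurs with a label."""
    counts = defaultdict(Counter)
    for name, label in samples:
        for token in name.lower().split():
            counts[token][label] += 1
    return counts


def predict(model, name, top_k=1):
    """Return the top_k labels by summed token votes."""
    votes = Counter()
    for token in name.lower().split():
        votes.update(model.get(token, {}))
    return [label for label, _ in votes.most_common(top_k)]


# Illustrative records only; the 10-digit codes are made up.
records = [
    ("cotton shirt", "6205200000"),
    ("cotton trousers", "6203420000"),
    ("steel pipe", "7304110000"),
]
first_set, second_set = build_sample_sets(records)
chapter_model = train(first_set)   # plays the role of the first prediction model
code_model = train(second_set)     # plays the role of the second prediction model
```

Training the same routine on the two sample sets yields one model whose labels are chapter codes and one whose labels are full commodity codes, which mirrors the split between the first and second prediction models.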

23. An electronic device, comprising:

a processor; and

24. A memory device, characterized in that,