CN110851587B - Commodity coding prediction model generation and commodity coding determination method, device and equipment - Google Patents

Commodity coding prediction model generation and commodity coding determination method, device and equipment

Info

Publication number
CN110851587B
Authority
CN
China
Prior art keywords
commodity
description information
code
name
information sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810825197.0A
Other languages
Chinese (zh)
Other versions
CN110851587A (en)
Inventor
夏超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810825197.0A priority Critical patent/CN110851587B/en
Publication of CN110851587A publication Critical patent/CN110851587A/en
Application granted granted Critical
Publication of CN110851587B publication Critical patent/CN110851587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/04Billing or invoicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10Tax strategies

Landscapes

  • Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a commodity coding prediction model generation method, which comprises the following steps: determining a first commodity description information sample set and a second commodity description information sample set; training, according to the first commodity description information sample set, a first commodity code prediction model for predicting the chapter code corresponding to a commodity name; and training, according to the second commodity description information sample set, a second commodity code prediction model for predicting commodity codes. The chapter codes are the chapter codes in a standard commodity code table that represents the correspondence between commodity names and commodity codes. The method meets the requirement of quickly determining the correct commodity code corresponding to a commodity name.

Description

Commodity coding prediction model generation and commodity coding determination method, device and equipment
Technical Field
The application relates to the field of artificial intelligence, and in particular to a commodity coding prediction model generation method and device, electronic equipment and storage equipment. The application also relates to a method and device for determining commodity codes, electronic equipment and storage equipment, and to another commodity coding prediction model generation method and device, electronic equipment and storage equipment.
Background
Currently, there are many fields in which merchants or staff are required to fill in commodity names and the commodity codes corresponding to those names.
However, merchants usually fill in the commodity code corresponding to a commodity name according to experience, so filling errors occur frequently, and once an error occurs it is likely to cause unnecessary losses. For example, in February 2016 the national tax authority introduced tax classification codes for goods and services in the pilot regions of Beijing, Shanghai, Guangdong and Jiangsu; in January 2018 the commodity codes were rolled out nationwide, and the short name of the commodity code must be displayed on every invoice issued. An invoice with an incorrect commodity code is a non-compliant invoice, which in lighter cases is fined and in serious cases is treated as a falsely issued invoice. There are more than 4,000 tax commodity codes, so it is not easy for a taxpayer to select the right one, and the tax bureau also needs to judge whether the commodity code selected by the taxpayer is accurate.
Therefore, how to quickly determine the correct commodity code corresponding to the commodity name according to the commodity name is a problem to be solved.
Disclosure of Invention
The application provides a commodity code prediction model generation method, device, electronic equipment and storage equipment; a method, device, electronic equipment and storage equipment for determining a commodity code; and another commodity code prediction model generation method, device, electronic equipment and storage equipment.
The application provides a commodity coding prediction model generation method, which comprises the following steps:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
Optionally, the method comprises the following steps:
the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
Optionally, the method further comprises: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;
The first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names;
the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.
Optionally, the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, including:
calculating commodity coding entropy values of the same commodity name aiming at commodity original description information, wherein the commodity coding entropy values are used for representing the discrete degree of commodity codes corresponding to the same commodity name;
and deleting commodity original description information with commodity coding entropy value larger than entropy value threshold, and taking the reserved commodity original description information as denoised commodity description information.
Optionally, the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, including:
when the same submitter submits commodity original description information multiple times for the same commodity name, the commodity original description information submitted multiple times is subjected to weight reduction processing, and the commodity original description information remaining after the weight reduction processing is used as the denoised commodity description information.
Optionally, the disambiguating the commodity original description information including the commodity name and the commodity code includes:
when the commodity codes submitted by the same submitter aiming at the same commodity name are multiple, the commodity code submitted by the submitter aiming at the same commodity name for the last time is used as the commodity code corresponding to the same commodity name.
Optionally, the first commodity coding prediction model and the second commodity coding prediction model are fasttext models.
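As a rough illustration of how samples could be prepared for a fasttext model, the sketch below formats a (commodity name, label) pair as one line of supervised training text, using the `__label__` prefix that fastText uses by default. The helper name `to_fasttext_line` and the whitespace normalization are assumptions; the patent does not describe the training file layout or any tokenization.

```python
def to_fasttext_line(name: str, label: str) -> str:
    """Format one sample as a fastText supervised-training line.

    Hypothetical helper: the label is a chapter code (first model) or a
    full commodity code (second model); the text is the commodity name.
    """
    text = " ".join(name.split())  # collapse whitespace; keeps the sample on one line
    return f"__label__{label} {text}"


# Illustrative values only: a chapter-code sample and a commodity-code sample.
first_model_line = to_fasttext_line("wheat", "1010101")
second_model_line = to_fasttext_line("wheat", "1010101020000000000")
```

If the fastText Python package is installed, a file made of such lines could be passed to `fasttext.train_supervised(input=path)`; whether the models in the patent were trained this way is not stated.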
Optionally, the method further comprises:
increasing data information corresponding to commodity names in commodity description information in an n-gram mode to obtain commodity names of the increased data information;
the commodity description information sample is commodity description information of the added data information.
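The n-gram step can be sketched as follows. The patent does not say whether the n-grams are taken over characters or words, nor what value of n is used; the function name `add_char_ngrams`, the choice of character-level bigrams, and the space-joined output are all assumptions made for illustration.

```python
def add_char_ngrams(name: str, n: int = 2) -> str:
    """Return the commodity name followed by its character n-grams.

    Hypothetical helper: whitespace is ignored when forming n-grams, and
    the original name is kept as the first token group.
    """
    chars = [c for c in name if not c.isspace()]
    grams = ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]
    return " ".join([name] + grams)


augmented = add_char_ngrams("wheat")  # name plus its character bigrams
```

A name shorter than n yields no n-grams and is returned unchanged.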
Optionally, the commodity name and the commodity code in the commodity description information include: commodity names and commodity codes on commodity invoices made by tax payers.
Optionally, the commodity name and the commodity code source in the commodity description information further include:
and determining commodity names and commodity codes according to a standard commodity code table representing the corresponding relation between the commodity names and commodity codes.
The application also provides a commodity coding prediction model generation method, which comprises the following steps:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
Optionally, the method further comprises: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;
and the commodity description information sample is the de-noised and/or disambiguated commodity description information.
Optionally, the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, including:
calculating commodity coding entropy values of the same commodity name aiming at commodity original description information, wherein the commodity coding entropy values are used for representing the discrete degree of commodity codes corresponding to the same commodity name;
and deleting commodity original description information with commodity coding entropy value larger than entropy value threshold, and taking the reserved commodity original description information as denoised commodity description information.
Optionally, the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, including:
when the same submitter submits commodity original description information multiple times for the same commodity name, the commodity original description information submitted multiple times is subjected to weight reduction processing, and the commodity original description information remaining after the weight reduction processing is used as the denoised commodity description information.
The application also provides a method of determining a commodity code, comprising:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;
predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
Optionally, the determining, according to the predicted chapter code and the commodity code set corresponding to the commodity name, the commodity code corresponding to the commodity name includes:
And judging whether the chapter code contained in the commodity code in the predicted commodity code set is consistent with the predicted chapter code, and if so, taking the predicted commodity code as the commodity code corresponding to the commodity name.
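The consistency check between the two models' outputs can be sketched as below. It assumes the chapter code is the leading 7 digits of a commodity code, which matches the tax classification example in the detailed description but is not stated as a general rule; the name `pick_code` and the choice to return `None` when no candidate matches are also assumptions.

```python
from typing import List, Optional

CHAPTER_LEN = 7  # assumption: the chapter code is the first 7 digits of a commodity code


def pick_code(predicted_chapter: str, candidate_codes: List[str]) -> Optional[str]:
    """Return the first predicted commodity code whose chapter matches.

    candidate_codes is the code set from the second model, assumed to be
    ordered from most to least likely. Returns None if nothing matches.
    """
    for code in candidate_codes:
        if code[:CHAPTER_LEN] == predicted_chapter:
            return code
    return None


chosen = pick_code("1030206", ["1030105010400000000", "1030206040000000000"])
```

Here the first candidate is rejected because its chapter prefix differs from the predicted chapter code, and the second candidate is accepted.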
The application also provides a commodity coding prediction model generation device, which comprises:
the sample set determining unit is used for determining a first commodity description information sample set and a second commodity description information sample set;
the model training unit is used for training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program for the commodity code prediction model generation method, wherein, after the equipment is powered on and the processor runs the program, the following steps are performed:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
The application also provides a storage device storing a program for the commodity code prediction model generation method, wherein the program, when run by a processor, performs the following steps:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
The application also provides an apparatus for determining a commodity code, comprising:
a commodity name determining unit for determining the commodity name of the commodity code to be determined;
the chapter code prediction unit is used for predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;
the commodity code set prediction unit is used for predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;
and the commodity code prediction unit is used for determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program for the method of determining a commodity code, wherein, after the equipment is powered on and the processor runs the program, the following steps are performed:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;
Predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
The application also provides a storage device storing a program for the method of determining a commodity code, wherein the program, when run by a processor, performs the following steps:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;
predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
The application also provides a commodity coding prediction model generation device, which comprises:
the sample set determining unit is used for determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
The model training unit is used for training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
The application also provides an electronic device comprising:
a processor; and
a memory for storing a program for the commodity code prediction model generation method, wherein, after the equipment is powered on and the processor runs the program, the following steps are performed:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
The application also provides a storage device storing a program for the commodity code prediction model generation method, wherein the program, when run by a processor, performs the following steps:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
Training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
Compared with the prior art, the application has the following advantages:
The application provides a commodity coding prediction model generation method, which trains a first commodity code prediction model for predicting the chapter code corresponding to a commodity name according to the first commodity description information sample set, and trains a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set. Using the trained first and second commodity code prediction models meets the requirement of quickly determining the correct commodity code corresponding to a commodity name.
The application provides another commodity coding prediction model generation method, which comprises the steps of training a commodity coding prediction model according to a commodity description information sample set comprising commodity names and commodity codes, and adopting the trained commodity coding prediction model to meet the requirement of quickly determining the corresponding correct commodity codes according to the commodity names.
The method for determining a commodity code provided by the application determines the commodity code corresponding to a commodity name according to a pre-trained commodity code prediction model for predicting commodity codes, so the correct commodity code can be determined quickly from the commodity name. This solves the problem of how to quickly determine the correct commodity code corresponding to a commodity name.
Drawings
Fig. 1 is a flowchart of a method for generating a commodity code prediction model according to a first embodiment of the present application.
Fig. 2 is a flowchart of an example of a method for generating a commodity code prediction model according to the first embodiment of the present application.
Fig. 3 is a flowchart of a method for generating a commodity coding prediction model according to a second embodiment of the present application.
Fig. 4 is a flowchart of a method for determining commodity codes according to a third embodiment of the present application.
Fig. 5 is a schematic diagram of a generating device of a commodity code prediction model according to a fourth embodiment of the present application.
Fig. 6 is a schematic diagram of an electronic device according to a fifth embodiment of the present application.
Fig. 7 is a schematic diagram of an apparatus for determining commodity codes according to a seventh embodiment of the present application.
Fig. 8 is a schematic diagram of an electronic device according to an eighth embodiment of the present application.
Fig. 9 is a schematic diagram of a generating device of a commodity code prediction model according to a tenth embodiment of the present application.
Fig. 10 is a schematic diagram of an electronic device according to an eleventh embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present invention may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present invention is not limited to the specific embodiments disclosed below.
The first embodiment of the application provides a method for generating a commodity coding prediction model. The following describes in detail with reference to fig. 1 and 2.
As shown in fig. 1, in step S101, a first commodity description information sample set and a second commodity description information sample set are determined.
The first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name; the commodity name can comprise the name of the commodity, and can also comprise information such as the brand of the commodity, the specification of the commodity, the weight of the commodity and the like. And the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
For example, when the source of the commodity description information samples is commodity invoices issued by taxpayers, the standard commodity code table representing the correspondence between commodity names and commodity codes may be the "commodity and service tax classification code table" issued by the tax bureau, in which a chapter code has 7 digits and a commodity code (also called a detail code) has 19 digits. If the chapter code corresponding to "wheat" is "1010101", the commodity name "wheat" and its chapter code "1010101" may be used as a first commodity description information sample; if the commodity code corresponding to "wheat" is "1010101020000000000", the commodity name "wheat" and its commodity code "1010101020000000000" may be used as a second commodity description information sample.
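A minimal sketch of how the two sample sets might be derived from the same (commodity name, commodity code) records follows. It assumes that the 7-digit chapter code is the leading part of the 19-digit commodity code, as in the wheat example; the function name `build_sample_sets` and the policy of skipping malformed codes are hypothetical.

```python
from typing import Iterable, List, Tuple

CHAPTER_LEN = 7   # chapter code length given in the example
CODE_LEN = 19     # commodity (detail) code length given in the example

Sample = Tuple[str, str]  # (commodity name, code)


def build_sample_sets(records: Iterable[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Split (name, commodity code) records into the two sample sets.

    Assumption: the chapter code is the first CHAPTER_LEN digits of the
    commodity code. Records with malformed codes are skipped.
    """
    first: List[Sample] = []
    second: List[Sample] = []
    for name, code in records:
        if len(code) != CODE_LEN or not code.isdigit():
            continue  # assumed policy: ignore codes that are not 19 digits
        first.append((name, code[:CHAPTER_LEN]))  # name with chapter code
        second.append((name, code))               # name with full commodity code
    return first, second


first_set, second_set = build_sample_sets([("wheat", "1010101020000000000")])
```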
The source of the first commodity description information samples and the second commodity description information samples may be commodity invoices issued by taxpayers, in which case the commodity names and commodity codes are those on the commodity invoices issued by the taxpayers. For example, if the commodity name on a commodity invoice is "YT461 friend thickened L46 clothes hanger" and the commodity code is "1070601000000000000", the commodity description information sample may include the commodity name "YT461 friend thickened L46 clothes hanger" and the commodity code "1070601000000000000". The samples may also come from other occasions in which a commodity code needs to be determined according to a commodity name.
Preferably, denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information; the first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names; the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.
Because not every commodity original description information comprising commodity names and commodity codes is suitable for generating a first commodity description information sample or a second commodity description information sample, denoising and/or disambiguating the commodity original description information comprising the commodity names and the commodity codes can be performed first to obtain the denoised and/or disambiguated commodity description information; taking the denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names as a first commodity description information sample; and taking the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names as a second commodity description information sample.
For example, when the source of the second commodity description information samples is commodity invoices issued by taxpayers, the invoices are filled in by merchants, so not all of the information on every commodity invoice can be used as a second commodity description information sample; the commodity description information can therefore be denoised or disambiguated first.
The disambiguating the commodity original description information comprising commodity names and commodity codes comprises the following steps:
When the commodity codes submitted by the same submitter aiming at the same commodity name are multiple, the commodity code submitted by the submitter aiming at the same commodity name for the last time is used as the commodity code corresponding to the same commodity name.
For example, in 2017 a taxpayer selected commodity code 1080417000000000000, whose name is "metal fittings and brackets for furniture and buildings; metal architectural decorations and parts thereof", for the commodity name "YT461 friend thickened L46 clothes hanger". By 2018 the commodity code filled in by the same taxpayer was 1070601000000000000, whose name is "plastic products". Clothes hangers made of different materials do fall into different categories: plastic-coated clothes hangers are "plastic products", aluminum alloy clothes hangers are "metal fittings and brackets for furniture and buildings", and wooden clothes hangers are "wooden tableware and related wooden products". The "YT461 friend thickened L46 clothes hanger" is actually a plastic clothes hanger, and the taxpayer gradually corrected the earlier misclassification as knowledge of the commodity and service tax classification accumulated. Therefore, the most recent commodity code filled in by the same taxpayer for the same commodity name, that is, the last submitted commodity code "1070601000000000000", can be used as the commodity code of the "YT461 friend thickened L46 clothes hanger" in this example. In other words, every commodity code corresponding to the commodity name "YT461 friend thickened L46 clothes hanger" in the commodity original description information is changed to "1070601000000000000".
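The disambiguation rule in this example can be sketched as follows. The record layout (submitter, commodity name, commodity code, submission date), the use of ISO-formatted date strings that sort chronologically, and the function name `disambiguate` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# (submitter_id, commodity_name, commodity_code, submitted_at); hypothetical layout
Record = Tuple[str, str, str, str]


def disambiguate(records: List[Record]) -> List[Record]:
    """Rewrite each record's code to the submitter's latest code for that name.

    Assumption: submitted_at is an ISO date string, so string comparison
    follows chronological order. On equal dates the later record wins.
    """
    latest: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for submitter, name, code, ts in records:
        key = (submitter, name)
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, code)
    return [(s, n, latest[(s, n)][1], ts) for s, n, _, ts in records]


hanger = "YT461 friend thickened L46 clothes hanger"
cleaned = disambiguate([
    ("taxpayer-1", hanger, "1080417000000000000", "2017-05-01"),
    ("taxpayer-1", hanger, "1070601000000000000", "2018-03-01"),
])
```

Both records for the same taxpayer and commodity name end up with the 2018 code, mirroring the hanger example; records from other submitters are left untouched.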
Denoising the commodity original description information comprising commodity names and commodity codes to obtain denoised commodity description information comprises:
calculating, over the commodity original description information, a commodity code entropy value for each commodity name, wherein the commodity code entropy value represents the degree of dispersion of the commodity codes corresponding to that commodity name;
and deleting the commodity original description information whose commodity code entropy value is greater than an entropy threshold, and taking the retained commodity original description information as the denoised commodity description information. It should be noted that data whose commodity code entropy value is greater than the entropy threshold is chaotic and unusable, whereas data whose commodity code entropy value is less than or equal to the entropy threshold is relatively clear and usable.
For example, for the commodity name "Changkang capsicum oil", 82% of invoices fill in the commodity code 1030206040000000000, the code of "composite flavoring", and 18% of invoices fill in the commodity code 1030105010400000000, the code of "other edible vegetable oil". With an entropy threshold of 0.5, the entropy value in this example is 0.47 (calculated as -0.82 log(0.82) - 0.18 log(0.18), where log is the natural logarithm). Because the entropy value is less than 0.5, the records are retained and the commodity code of "Changkang capsicum oil" is uniformly set to 1030206040000000000.
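The entropy-based denoising can be sketched in Python as follows. The function names and the representation of each record as a (commodity name, commodity code) pair are assumptions made for illustration. The natural logarithm is used because it reproduces the value 0.47 given in the example, and setting retained records to the majority code follows the example rather than a rule stated in general form.

```python
import math
from collections import Counter, defaultdict


def code_entropy(codes):
    """Shannon entropy (natural log) of the commodity codes seen for one commodity name."""
    counts = Counter(codes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def denoise_by_entropy(pairs, threshold=0.5):
    """Drop names whose code entropy exceeds the threshold; set the rest to their majority code."""
    by_name = defaultdict(list)
    for name, code in pairs:
        by_name[name].append(code)
    kept = []
    for name, codes in by_name.items():
        if code_entropy(codes) > threshold:
            continue  # codes are too scattered to be usable
        majority = Counter(codes).most_common(1)[0][0]
        kept.extend((name, majority) for _ in codes)
    return kept


# 82 invoices with one code and 18 with the other, as in the example.
pairs = [("Changkang capsicum oil", "1030206040000000000")] * 82 \
      + [("Changkang capsicum oil", "1030105010400000000")] * 18
print(round(code_entropy([code for _, code in pairs]), 2))  # 0.47
```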
Denoising the commodity original description information comprising commodity names and commodity codes to obtain denoised commodity description information may also comprise:
when the same submitter has submitted commodity original description information multiple times for the same commodity name, performing weight reduction processing on the commodity original description information submitted multiple times, and taking the commodity original description information after weight reduction processing as the denoised commodity description information. Weight reduction processing here means reducing the number of samples of commodity original description information submitted by the same submitter for the same commodity name, which prevents the repeated submissions of a single submitter from having an outsized influence on the sample set. Specifically, when the same submitter has submitted n pieces of commodity original description information for the same commodity name, weight reduction processing can be performed according to log(n): only log(n) of those pieces are retained as the denoised commodity description information.
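A possible reading of the log(n) weight reduction is sketched below in Python. Several details are assumptions, because the text does not fix them: the base of the logarithm (natural log here), rounding up to an integer, keeping at least one record so that n = 1 is not discarded, and keeping the first records in input order. Records are represented as hypothetical (submitter, commodity name, commodity code) tuples.

```python
import math
from collections import defaultdict


def downweight(records):
    """Keep about log(n) of the n records one submitter filed for one commodity name."""
    groups = defaultdict(list)
    for rec in records:
        submitter, name, _code = rec
        groups[(submitter, name)].append(rec)
    kept = []
    for recs in groups.values():
        # Assumption: round up and keep at least one record per (submitter, name) group.
        k = max(1, math.ceil(math.log(len(recs))))
        kept.extend(recs[:k])
    return kept


records = [("taxpayer-A", "wheat", "1010101020000000000")] * 1000
print(len(downweight(records)))  # 7, since ln(1000) is about 6.91
```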
The advantage of denoising and/or disambiguating the commodity original description information comprising commodity names and commodity codes is that cleaner commodity description information is obtained. Using the denoised and/or disambiguated commodity description information as the commodity description information samples (including the first commodity description information samples and the second commodity description information samples) allows the trained commodity code prediction model to predict the commodity code corresponding to a commodity name more accurately.
In order to improve the accuracy of the commodity codes predicted by the commodity code prediction model, the sources of the commodity names and commodity codes in the commodity description information may further include commodity names and commodity codes determined from a standard commodity code table representing the correspondence between commodity names and commodity codes.
For example, when the commodity description information samples come from commodity invoices issued by taxpayers, the commodity original description information can be expanded based on the "commodity and service tax classification code table" issued by the tax bureau and the "Nice commodity classification table". The "commodity and service tax classification code table" contains broad commodity categories and commodity codes, and the "Nice commodity classification table" contains detailed commodity data. For example, the broad category "cereal" in the "commodity and service tax classification code table" is defined as "including rice, wheat, corn, millet, sorghum, barley, oats, rye, buckwheat, and other cereals", and "rice", "barley", and "wheat" in the "Nice commodity classification table" have corresponding real commodity names. By associating the two tables, the real commodity names of cereals can be obtained; these names constitute industry knowledge of the grain industry. The association is made by a similarity algorithm combined with precise manual association.
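The text does not name the similarity algorithm used for the association. As a stand-in, the following Python sketch uses `difflib.SequenceMatcher` from the standard library to propose candidate links between category terms and detailed commodity names, which could then be confirmed manually. The detailed names, the function names, and the 0.5 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the unspecified algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_links(category_terms, detailed_names, min_ratio=0.5):
    """For each category term, list detailed names that look similar, best match first."""
    links = {}
    for term in category_terms:
        scored = sorted(((similarity(term, n), n) for n in detailed_names), reverse=True)
        links[term] = [n for score, n in scored if score >= min_ratio]
    return links


# Terms taken from the "cereal" definition; the detailed names are invented for illustration.
terms = ["rice", "wheat", "barley"]
detailed = ["wheat", "hulled wheat", "pearl barley", "plastic hanger"]
for term, names in candidate_links(terms, detailed).items():
    print(term, "->", names)
```

The candidate lists are only suggestions; in the process described above, precise association is still completed manually.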
As shown in fig. 1, in step S102, a first commodity code prediction model for predicting a chapter code corresponding to a commodity name is trained according to the first commodity description information sample set; and training a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set.
The first commodity coding prediction model and the second commodity coding prediction model form a complete commodity coding prediction model by establishing the following relationship: and comparing the chapter code predicted by the first commodity code prediction model with the commodity code predicted by the second commodity code prediction model, and determining the commodity code corresponding to the commodity name according to the comparison result.
Because both a first commodity code prediction model and a second commodity code prediction model are trained, when the two prediction models are used to predict the commodity code corresponding to a commodity name, the predicted chapter code can be combined with the predicted commodity codes to obtain a more accurate commodity code.
Preferably, the first commodity code prediction model and the second commodity code prediction model are fastText models.
Because the description data corresponding to the commodity names in the first commodity description information sample and the second commodity description information sample is mostly short text, data information is added to the commodity names in the commodity description information by means of n-grams, yielding commodity names with added data information; the first commodity description information sample and the second commodity description information sample are the commodity description information with the added data information. For example, the commodity name "luxury thickened solid wood clothes hanger" is expanded by 2-grams into a sequence of adjacent two-unit fragments of that name; the additional information in the sample improves the classification effect.
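The n-gram expansion can be sketched in Python as follows. The sketch assumes whitespace-separated tokens and joins each 2-gram with an underscore; both choices, the function names, and the English rendering of the example name are illustrative assumptions. For Chinese commodity names the units would more likely be characters or segmented words.

```python
def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def augment(name, n=2):
    """Append the n-grams of a commodity name to its original tokens."""
    tokens = name.split()
    return tokens + ["_".join(gram) for gram in ngrams(tokens, n)]


print(augment("luxury thickened solid wood clothes hanger"))
# ['luxury', 'thickened', 'solid', 'wood', 'clothes', 'hanger',
#  'luxury_thickened', 'thickened_solid', 'solid_wood', 'wood_clothes', 'clothes_hanger']
```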
Hierarchical softmax is used to handle the large number of classes and the imbalance in the number of samples per class. Hierarchical softmax builds a Huffman tree at the output layer of the fastText model, and the Huffman tree is the structure on which hierarchical softmax operates. A Huffman tree is the tree with the shortest weighted path length, in which nodes with larger weights are closer to the root. In this application the number of samples differs across commodity codes (classes): some commodity codes have many samples and others have few. When the Huffman tree is constructed, commodity codes with many samples are placed closer to the root node, so a commodity name is more likely to be classified into such a class and less likely to be classified into a commodity code with few samples. Without the Huffman tree of the hierarchical softmax structure, classification on imbalanced samples is often unsatisfactory.
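The property that classes with more samples sit closer to the root can be checked with a small Huffman construction in Python. This sketch only computes the depth of each class in the tree; it is not fastText's implementation, and the sample counts are invented for illustration. It assumes at least one class is supplied.

```python
import heapq
from itertools import count


def huffman_depths(class_counts):
    """Return {label: depth} for a Huffman tree built from per-class sample counts."""
    tie = count()  # tie-breaker so the heap never has to compare the dict payloads
    heap = [(n, next(tie), {label: 0}) for label, n in class_counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # the two lightest subtrees are merged first
        n2, _, right = heapq.heappop(heap)
        merged = {label: depth + 1 for label, depth in {**left, **right}.items()}
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


# Hypothetical sample counts per commodity code.
counts = {
    "1030206040000000000": 5000,
    "1070601000000000000": 300,
    "1080417000000000000": 120,
    "1030105010400000000": 80,
}
for code, depth in sorted(huffman_depths(counts).items(), key=lambda kv: kv[1]):
    print(code, depth)
```

With these counts, the code with 5000 samples ends up one step from the root, while the two rarest codes end up three steps away.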
The procedure of the method for generating a commodity code prediction model according to the first embodiment of the present application is described below, taking fig. 2 as an example.
As shown in fig. 2, in step S201, the commodity original description information 21 comprising commodity names and commodity codes is determined for modeling; in step S202, the commodity original description information 21 is denoised and/or disambiguated to obtain denoised and/or disambiguated commodity description information; in step S203, the denoised and/or disambiguated commodity description information is expanded to obtain expanded commodity description information; a chapter granularity sample set 22 (the first commodity description information sample set) and a detail granularity sample set 23 (the second commodity description information sample set) are then generated from the expanded commodity description information; finally, a first commodity code prediction model (fastText model 24) for predicting the chapter code corresponding to a commodity name is trained on the chapter granularity sample set 22, and a second commodity code prediction model (fastText model 25) for predicting commodity codes is trained on the detail granularity sample set 23.
The second embodiment of the application provides a method for generating a commodity code prediction model, described in detail below with reference to fig. 3 and fig. 2.
As shown in fig. 3, in step S301, a commodity description information sample set is determined; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code.
The commodity description information sample comprises commodity names and commodity codes, wherein the commodity names can comprise names of commodities and can also comprise information such as brands, commodity specifications and commodity weights of the commodities.
The commodity description information sample may come from a commodity invoice issued by a taxpayer, in which case the commodity name and the commodity code are those on the invoice. For example, if the commodity name on the invoice is "YT461 friend thickened L46 clothes hanger" and the commodity code is "1070601000000000000", the commodity description information sample may include the commodity name "YT461 friend thickened L46 clothes hanger" and the commodity code "1070601000000000000". The commodity description information sample may also come from other scenarios in which a commodity code needs to be determined from a commodity name.
Preferably, denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information; and the commodity description information sample is the de-noised and/or disambiguated commodity description information.
Since not every commodity original description information including commodity names and commodity codes is suitable for being used as a commodity description information sample, denoising and/or disambiguation treatment can be performed on the commodity original description information including the commodity names and the commodity codes to obtain the denoised and/or disambiguated commodity description information; and taking the de-noised and/or disambiguated commodity description information as a commodity description information sample.
For example, when the commodity description information samples come from commodity invoices issued by taxpayers, the invoices are filled in by merchants, so not all of the information on every invoice is suitable for use as a commodity description information sample. The commodity original description information can therefore be denoised or disambiguated first.
Disambiguating the commodity original description information comprising commodity names and commodity codes comprises the following step:
when the same submitter has submitted multiple commodity codes for the same commodity name, the commodity code that the submitter most recently submitted for that commodity name is used as the commodity code corresponding to that commodity name.
For example, in 2017 a taxpayer filling in the commodity code for the commodity name "YT461 friend thickened L46 clothes hanger" selected 1080417000000000000, the code named "furniture, metal accessories for buildings, and stands; metal architectural decorations and parts thereof". By 2018 the same taxpayer filled in 1070601000000000000, the code named "plastic products". Clothes hangers made of different materials do fall into different categories: a plastic-coated hanger is a plastic product, an aluminum alloy hanger belongs to furniture, metal accessories for buildings, and stands, and a wooden hanger belongs to wooden tableware and related wooden products. The "YT461 friend thickened L46 clothes hanger" is in fact a plastic hanger, and the taxpayer corrected the earlier misclassification as knowledge of the commodity and service tax classification accumulated. The commodity code that the same taxpayer most recently filled in for the same commodity name, here "1070601000000000000", can therefore be used as the commodity code of the "YT461 friend thickened L46 clothes hanger". That is, every commodity code corresponding to the commodity name "YT461 friend thickened L46 clothes hanger" in the commodity original description information is changed to 1070601000000000000.
Denoising the commodity original description information comprising commodity names and commodity codes to obtain denoised commodity description information comprises:
calculating, over the commodity original description information, a commodity code entropy value for each commodity name, wherein the commodity code entropy value represents the degree of dispersion of the commodity codes corresponding to that commodity name;
and deleting the commodity original description information whose commodity code entropy value is greater than an entropy threshold, and taking the retained commodity original description information as the denoised commodity description information. It should be noted that data whose commodity code entropy value is greater than the entropy threshold is chaotic and unusable, whereas data whose commodity code entropy value is less than or equal to the entropy threshold is relatively clear and usable.
For example, for the commodity name "Changkang capsicum oil", 82% of invoices fill in the commodity code 1030206040000000000, the code of "composite flavoring", and 18% of invoices fill in the commodity code 1030105010400000000, the code of "other edible vegetable oil". With an entropy threshold of 0.5, the entropy value in this example is 0.47 (calculated as -0.82 log(0.82) - 0.18 log(0.18), where log is the natural logarithm). Because the entropy value is less than 0.5, the records are retained and the commodity code of "Changkang capsicum oil" is uniformly set to 1030206040000000000.
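Written out, the entropy calculation in this example takes the following form. The symbols H (the commodity code entropy value of one commodity name) and p_i (the share of invoices that use the i-th commodity code for that name) are notation introduced here for illustration; the natural logarithm is assumed because it reproduces the stated value of 0.47.

```latex
H = -\sum_{i} p_i \ln p_i
  = -0.82\,\ln(0.82) - 0.18\,\ln(0.18)
  \approx 0.1627 + 0.3087
  \approx 0.47 < 0.5
```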
Denoising the commodity original description information comprising commodity names and commodity codes to obtain denoised commodity description information may also comprise:
when the same submitter has submitted commodity original description information multiple times for the same commodity name, performing weight reduction processing on the commodity original description information submitted multiple times, and taking the commodity original description information after weight reduction processing as the denoised commodity description information. Weight reduction processing here means reducing the number of samples of commodity original description information submitted by the same submitter for the same commodity name, which prevents the repeated submissions of a single submitter from having an outsized influence on the sample set. Specifically, when the same submitter has submitted n pieces of commodity original description information for the same commodity name, weight reduction processing can be performed according to log(n): only log(n) of those pieces are retained as the denoised commodity description information.
The advantage of denoising and/or disambiguating the commodity original description information comprising commodity names and commodity codes is that cleaner commodity description information is obtained. Using the denoised and/or disambiguated commodity description information as the commodity description information samples allows the trained commodity code prediction model to predict the commodity code corresponding to a commodity name more accurately.
In order to improve the accuracy of the commodity codes predicted by the commodity code prediction model, the sources of the commodity names and commodity codes in the commodity description information may further include commodity names and commodity codes determined from a standard commodity code table representing the correspondence between commodity names and commodity codes.
For example, when the commodity description information samples come from commodity invoices issued by taxpayers, the commodity original description information can be expanded based on the "commodity and service tax classification code table" issued by the tax bureau and the "Nice commodity classification table". The "commodity and service tax classification code table" contains broad commodity categories and commodity codes, and the "Nice commodity classification table" contains detailed commodity data. For example, the broad category "cereal" in the "commodity and service tax classification code table" is defined as "including rice, wheat, corn, millet, sorghum, barley, oats, rye, buckwheat, and other cereals", and "rice", "barley", and "wheat" in the "Nice commodity classification table" have corresponding real commodity names. By associating the two tables, the real commodity names of cereals can be obtained; these names constitute industry knowledge of the grain industry. The association is made by a similarity algorithm combined with precise manual association.
Because a chapter code predicted for a commodity name is more accurate than a predicted commodity code, in order to improve the accuracy of the predicted commodity code, the commodity description information sample set may include a first commodity description information sample set and a second commodity description information sample set. The first commodity description information sample set includes at least one first commodity description information sample, and the first commodity description information sample includes a commodity name and the chapter code corresponding to the commodity name; the second commodity description information sample includes a commodity name and the commodity code corresponding to the commodity name. The chapter code is a chapter code in a standard commodity code table representing the correspondence between commodity names and commodity codes.
For example, when the commodity description information samples come from commodity invoices issued by taxpayers, the standard commodity code table representing the correspondence between commodity names and commodity codes may be the "commodity and service tax classification code table" issued by the tax bureau, in which the chapter code has 7 digits and the commodity code (also called the detail code) has 19 digits. If the chapter code corresponding to "wheat" is "1010101", the commodity name "wheat" and its chapter code "1010101" can be used as a first commodity description information sample; if the commodity code corresponding to "wheat" is "1010101020000000000", the commodity name "wheat" and its commodity code "1010101020000000000" can be used as a second commodity description information sample.
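How one labelled record yields both kinds of sample can be sketched in Python as follows. The sketch relies on the chapter code being the first 7 digits of the 19-digit commodity code, as stated elsewhere in this description; the function name, the constants, and the tuple layout are hypothetical.

```python
CHAPTER_DIGITS = 7   # length of a chapter code
CODE_DIGITS = 19     # length of a full commodity (detail) code


def split_samples(name, commodity_code):
    """Return (first sample, second sample) for one commodity name and its 19-digit code."""
    if len(commodity_code) != CODE_DIGITS or not commodity_code.isdigit():
        raise ValueError(f"expected a {CODE_DIGITS}-digit commodity code, got {commodity_code!r}")
    chapter_sample = (name, commodity_code[:CHAPTER_DIGITS])  # chapter granularity
    detail_sample = (name, commodity_code)                    # detail granularity
    return chapter_sample, detail_sample


print(split_samples("wheat", "1010101020000000000"))
# (('wheat', '1010101'), ('wheat', '1010101020000000000'))
```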
As shown in fig. 3, in step S302, a commodity coding prediction model is trained according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
When the commodity description information sample set includes a first commodity description information sample set and a second commodity description information sample set, training a commodity coding prediction model according to the commodity description information sample set, including:
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; and training a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set.
Because both a first commodity code prediction model and a second commodity code prediction model are trained, when the two prediction models are used to predict the commodity code corresponding to a commodity name, the predicted chapter code can be combined with the predicted commodity codes to obtain a more accurate commodity code.
Preferably, the commodity code prediction model is a fastText model.
Because the description data corresponding to the commodity name in the commodity description information sample is mostly short text, data information is added to the commodity names in the commodity description information by means of n-grams, yielding commodity names with added data information; the commodity description information sample is the commodity description information with the added data information. For example, the commodity name "luxury thickened solid wood clothes hanger" is expanded by 2-grams into a sequence of adjacent two-unit fragments of that name; the additional information in the sample improves the classification effect.
Hierarchical softmax is used to handle the large number of classes and the imbalance in the number of samples per class. Hierarchical softmax builds a Huffman tree at the output layer of the fastText model, and the Huffman tree is the structure on which hierarchical softmax operates. A Huffman tree is the tree with the shortest weighted path length, in which nodes with larger weights are closer to the root. In this application the number of samples differs across commodity codes (classes): some commodity codes have many samples and others have few. When the Huffman tree is constructed, commodity codes with many samples are placed closer to the root node, so a commodity name is more likely to be classified into such a class and less likely to be classified into a commodity code with few samples. Without the Huffman tree of the hierarchical softmax structure, classification on imbalanced samples is often unsatisfactory.
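As an illustration of how such a model might be trained with the open-source fastText Python package, the sketch below writes samples in fastText's supervised input format (`__label__<label>` followed by the text) and then calls `fasttext.train_supervised` with hierarchical softmax (`loss="hs"`) and word 2-grams (`wordNgrams=2`). The helper names and file handling are assumptions, and the text does not state which fastText settings were actually used.

```python
def to_fasttext_line(name, code):
    """Format one sample as '__label__<code> <name>' with whitespace normalized."""
    return f"__label__{code} {' '.join(name.split())}"


def write_training_file(samples, path):
    """Write (commodity name, code) samples, one per line, for supervised training."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, code in samples:
            fh.write(to_fasttext_line(name, code) + "\n")


def train(path):
    """Train a classifier with hierarchical softmax and 2-gram features."""
    import fasttext  # third-party package; imported here so the helpers above run without it
    return fasttext.train_supervised(input=path, loss="hs", wordNgrams=2)


print(to_fasttext_line("wheat", "1010101"))              # chapter-granularity sample
print(to_fasttext_line("wheat", "1010101020000000000"))  # detail-granularity sample
```

Under this sketch, the same `train` helper would be called once on a chapter-granularity file and once on a detail-granularity file to obtain the two models.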
The process of the method for generating a commodity code prediction model according to the second embodiment of the present application will be described below with reference to fig. 2.
As shown in fig. 2, in step S201, the commodity original description information 21 comprising commodity names and commodity codes is determined for modeling; in step S202, the commodity original description information 21 is denoised and/or disambiguated to obtain denoised and/or disambiguated commodity description information; in step S203, the denoised and/or disambiguated commodity description information is expanded to obtain expanded commodity description information; a chapter granularity sample set 22 (the first commodity description information sample set) and a detail granularity sample set 23 (the second commodity description information sample set) are then generated from the expanded commodity description information; finally, a first commodity code prediction model (fastText model 24) for predicting the chapter code corresponding to a commodity name is trained on the chapter granularity sample set 22, and a second commodity code prediction model (fastText model 25) for predicting commodity codes is trained on the detail granularity sample set 23.
A third embodiment of the present application provides a method of determining a commodity code, described in detail below with reference to fig. 2 and fig. 4.
As shown in fig. 4, in step S401, the commodity name of the commodity code to be determined is determined.
For example, when the commodity whose commodity code is to be determined is wheat, the commodity name is "wheat".
As shown in fig. 4, in step S402, the chapter code corresponding to the commodity name is predicted according to the commodity name and a first commodity code prediction model trained in advance for predicting the chapter code corresponding to the commodity name.
As shown in fig. 4, in step S403, a commodity code set corresponding to the commodity name is predicted according to the commodity name and a second commodity code prediction model for predicting commodity codes, which is trained in advance.
As shown in fig. 4, in step S404, a commodity code corresponding to the commodity name is determined from the predicted chapter code and commodity code set corresponding to the commodity name.
And the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
Specifically, determining the commodity code corresponding to the commodity name according to the predicted chapter code and the commodity code corresponding to the commodity name comprises the following steps:
judging whether the chapter code contained in a commodity code in the predicted commodity code set is consistent with the predicted chapter code, and if so, taking that predicted commodity code as the commodity code corresponding to the commodity name.
Because the predicted chapter code is more accurate than the predicted commodity code, determining the commodity code corresponding to the commodity name from both the predicted chapter code and the predicted commodity code set is more accurate than relying only on the commodity code predicted by the second commodity code prediction model.
The commodity code set may include the top-ranked predicted commodity codes, such as the top 5 or the top 3 commodity codes.
If the commodity description information samples come from commodity invoices issued by taxpayers, the standard commodity code table representing the correspondence between commodity names and commodity codes may be the "commodity and service tax classification code table" issued by the tax bureau, in which the chapter code has 7 digits, the commodity code (also called the detail code) has 19 digits, and the first 7 digits of the commodity code are the chapter code. For example, the chapter code corresponding to "wheat" is "1010101", and the commodity code corresponding to "wheat" is "1010101020000000000".
For example, if the chapter code predicted by the first commodity code prediction model is "1010101", and the commodity code set predicted by the second commodity code prediction model contains the top three predicted codes, "1010101020000000000", "1010102020000000000", and "1010103020000000000", then the commodity code "1010101020000000000", whose first seven digits are the same as the predicted chapter code, may be used as the commodity code corresponding to the commodity name.
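The comparison step can be sketched in Python as follows. The candidate codes are the three from the example, written out to the 19 digits that the description specifies for commodity codes. The function name is hypothetical, and returning `None` when no candidate matches is an assumption, since the text does not say what happens in that case.

```python
def select_code(predicted_chapter, candidate_codes, chapter_digits=7):
    """Return the highest-ranked candidate whose leading digits equal the predicted chapter."""
    for code in candidate_codes:  # candidates are assumed to be ordered best first
        if code[:chapter_digits] == predicted_chapter:
            return code
    return None  # assumption: no fallback is specified when nothing matches


candidates = [
    "1010101020000000000",
    "1010102020000000000",
    "1010103020000000000",
]
print(select_code("1010101", candidates))  # 1010101020000000000
```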
As shown in fig. 2, the commodity name 26 is input into the first commodity code prediction model (fastText model 24) for predicting the chapter code corresponding to the commodity name, and into the second commodity code prediction model (fastText model 25) for predicting commodity codes; in step S204, the commodity code chapter (the chapter code contained in the commodity code) is predicted; in step S205, the top-n commodity code details (the commodity code set) are predicted; in step S206, the commodity codes in the predicted commodity code set that fall within the predicted chapter are output.
The fourth embodiment of the present application also provides a device for generating a commodity coding prediction model, corresponding to the method for generating a commodity coding prediction model provided in the first embodiment of the present application.
A sample set determining unit 501 configured to determine a first commodity description information sample set and a second commodity description information sample set;
the model training unit 502 is configured to train a first commodity code prediction model for predicting a chapter code corresponding to a commodity name according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
Optionally, the first commodity description information sample set includes at least one first commodity description information sample, and the first commodity description information sample includes a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
Optionally, the device further comprises:
the de-noising processing and/or disambiguation processing unit is used for carrying out de-noising processing and/or disambiguation processing on the commodity original description information comprising commodity names and commodity codes to obtain de-noised and/or disambiguated commodity description information;
The first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names;
the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.
Optionally, the denoising processing unit is specifically configured to:
calculate, from the commodity original description information, a commodity code entropy value for each commodity name, wherein the commodity code entropy value represents the degree of dispersion of the commodity codes corresponding to the same commodity name;
and delete the commodity original description information whose commodity code entropy value is greater than an entropy threshold, and take the retained commodity original description information as the denoised commodity description information.
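The entropy-based denoising can be sketched as follows. This is a minimal illustration that assumes the entropy value is the Shannon entropy (base 2) of the commodity codes observed for one commodity name; the application does not state the exact formula or the threshold, so `code_entropy`, `denoise_by_entropy`, and the threshold value are hypothetical.

```python
import math
from collections import Counter, defaultdict


def code_entropy(codes):
    """Shannon entropy (base 2, an assumption) of the codes seen for one commodity name."""
    counts = Counter(codes)
    total = len(codes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def denoise_by_entropy(records, threshold):
    """Keep only records whose commodity name has a code entropy not above the threshold."""
    codes_by_name = defaultdict(list)
    for name, code in records:
        codes_by_name[name].append(code)
    entropy = {name: code_entropy(codes) for name, codes in codes_by_name.items()}
    return [(name, code) for name, code in records if entropy[name] <= threshold]


# Hypothetical records: "pen" always maps to one code, "gift" maps to four different codes.
records = [
    ("pen", "A"), ("pen", "A"), ("pen", "A"),
    ("gift", "A"), ("gift", "B"), ("gift", "C"), ("gift", "D"),
]
kept = denoise_by_entropy(records, threshold=1.0)
```

A name that is consistently mapped to one code has entropy 0 and is retained, while a name whose codes are spread evenly over many values has high entropy and is dropped as noise.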
Optionally, the denoising processing unit is specifically configured to:
when the same submitter has submitted commodity original description information multiple times for the same commodity name, deduplicate the repeatedly submitted commodity original description information, and take the deduplicated commodity original description information as the denoised commodity description information.
The disambiguation processing unit is specifically configured to:
when the same submitter has submitted multiple commodity codes for the same commodity name, take the commodity code most recently submitted by that submitter for the commodity name as the commodity code corresponding to the commodity name.
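The handling of repeated submissions and of conflicting codes from one submitter can be sketched as below. The record layout `(submitter, name, code, timestamp)` and both function names are assumptions made for illustration; the application does not specify how submissions are stored or ordered.

```python
def deduplicate(records):
    """Remove repeated submissions: keep the first record per (submitter, name, code)."""
    seen = set()
    result = []
    for record in records:
        key = record[:3]
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


def disambiguate(records):
    """For each (submitter, name), keep only the most recently submitted code."""
    latest = {}
    for submitter, name, code, timestamp in records:
        key = (submitter, name)
        if key not in latest or timestamp >= latest[key][3]:
            latest[key] = (submitter, name, code, timestamp)
    return list(latest.values())


# Hypothetical submissions: (submitter, commodity name, commodity code, timestamp).
records = [
    ("t1", "pen", "A", 1),
    ("t1", "pen", "A", 2),  # repeated submission of the same code
    ("t1", "pen", "B", 3),  # later, different code from the same submitter
    ("t2", "pen", "A", 1),
]
deduplicated = deduplicate(records)
disambiguated = disambiguate(deduplicated)
```

Removing repeats first keeps one submitter from dominating the statistics for a name; keeping only the last code then resolves that submitter's own conflicting labels.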
Optionally, the first commodity coding prediction model and the second commodity coding prediction model are fasttext models.
Optionally, the apparatus further comprises:
an information adding unit, configured to augment, in an n-gram manner, the data information corresponding to the commodity name in the commodity description information, to obtain a commodity name with added data information;
the first commodity description information sample and/or the second commodity description information sample is the commodity description information with the added data information.
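One possible reading of the n-gram augmentation is that adjacent tokens of the commodity name are joined into extra features and appended to the name. The sketch below follows that reading for word-level n-grams; the whitespace tokenisation, the `_` joiner, and the function names are assumptions, and a Chinese commodity name would first need a word segmenter.

```python
def word_ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def augment_name(name, max_n=2):
    """Append word n-grams (2..max_n) to the original tokens of a commodity name."""
    tokens = name.split()
    features = list(tokens)
    for n in range(2, max_n + 1):
        features.extend(word_ngrams(tokens, n))
    return " ".join(features)


augmented = augment_name("stainless steel water bottle")
```

The fasttext library's supervised trainer also has a `wordNgrams` option that builds word n-gram features internally; whether the application intends that option or an explicit augmentation like the one above is not stated.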
Optionally, the sources of the commodity name and the commodity code in the commodity description information include: commodity names and commodity codes on invoices issued by taxpayers.
Optionally, the sources of the commodity name and the commodity code in the commodity description information further include:
commodity names and commodity codes determined according to a standard commodity code table representing the correspondence between commodity names and commodity codes.
It should be noted that, for the detailed description of the apparatus for generating a commodity code prediction model according to the fourth embodiment of the present application, reference may be made to the description related to the first embodiment of the present application, which is not repeated here.
Corresponding to the method for generating the commodity coding prediction model provided by the above, the fifth embodiment of the present application further provides an electronic device.
As shown in fig. 6, the electronic device includes:
a processor 601; and
a memory 602, configured to store a program for a method for generating a commodity code prediction model, wherein after the apparatus is powered on and the processor executes the program for the method for generating a commodity code prediction model, the apparatus performs the steps of:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
Optionally, the first commodity description information sample set includes at least one first commodity description information sample, and the first commodity description information sample includes a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
Optionally, the electronic device further performs the following steps: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;
the first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names;
the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.
Optionally, the denoising of the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information includes:
calculating, from the commodity original description information, a commodity code entropy value for each commodity name, wherein the commodity code entropy value represents the degree of dispersion of the commodity codes corresponding to the same commodity name;
and deleting the commodity original description information whose commodity code entropy value is greater than an entropy threshold, and taking the retained commodity original description information as the denoised commodity description information.
Optionally, the denoising of the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information includes:
when the same submitter has submitted commodity original description information multiple times for the same commodity name, deduplicating the repeatedly submitted commodity original description information, and taking the deduplicated commodity original description information as the denoised commodity description information.
Optionally, the disambiguating of the commodity original description information including the commodity name and the commodity code includes:
when the same submitter has submitted multiple commodity codes for the same commodity name, taking the commodity code most recently submitted by that submitter for the commodity name as the commodity code corresponding to the commodity name.
Optionally, the first commodity coding prediction model and the second commodity coding prediction model are fasttext models.
Optionally, the electronic device further performs the following steps:
augmenting, in an n-gram manner, the data information corresponding to the commodity name in the commodity description information, to obtain a commodity name with added data information;
the first commodity description information sample and/or the second commodity description information sample is the commodity description information with the added data information.
Optionally, the sources of the commodity name and the commodity code in the commodity description information include: commodity names and commodity codes on invoices issued by taxpayers.
Optionally, the sources of the commodity name and the commodity code in the commodity description information further include:
commodity names and commodity codes determined according to a standard commodity code table representing the correspondence between commodity names and commodity codes.
It should be noted that, for the detailed description of the electronic device provided in the fifth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.
Corresponding to the method for generating the commodity coding prediction model provided by the above, the sixth embodiment of the present application further provides a storage device.
The storage device stores a program for the method for generating a commodity code prediction model; when executed by a processor, the program performs the following steps:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
It should be noted that, for the detailed description of the storage device provided in the sixth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.
Corresponding to the method for determining commodity codes provided above, the seventh embodiment of the present application further provides an apparatus for determining commodity codes, the apparatus including: a commodity name determining unit 701, a chapter code prediction unit 702, a commodity code set prediction unit 703, and a commodity code prediction unit 704.
A commodity name determining unit 701 for determining a commodity name of a commodity code to be determined;
a chapter code prediction unit 702, configured to predict a chapter code corresponding to the commodity name according to the commodity name and a first commodity code prediction model that is trained in advance and used for predicting the chapter code corresponding to the commodity name;
a commodity code set prediction unit 703, configured to predict a commodity code set corresponding to the commodity name according to the commodity name and a second commodity code prediction model that is trained in advance and is used for predicting commodity codes;
and a commodity code prediction unit 704, configured to determine a commodity code corresponding to the commodity name according to the predicted chapter code and the commodity code set corresponding to the commodity name.
Optionally, the commodity coding prediction unit is specifically configured to:
and judging whether the chapter code contained in the commodity code in the predicted commodity code set is consistent with the predicted chapter code, and if so, taking the predicted commodity code as the commodity code corresponding to the commodity name.
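The consistency check between the two models' outputs can be sketched as follows. The sketch assumes that the chapter code of a commodity code is its leading two characters and that the second model returns an ordered candidate list; `chapter_of`, `select_codes`, and the example values are hypothetical stand-ins, not outputs of any trained model.

```python
def chapter_of(code, chapter_digits=2):
    """Assumption: the chapter code is the leading characters of the commodity code."""
    return code[:chapter_digits]


def select_codes(predicted_chapter, candidate_codes):
    """Keep the candidate codes whose chapter matches the predicted chapter code."""
    return [code for code in candidate_codes if chapter_of(code) == predicted_chapter]


# Hypothetical outputs of the first model (chapter) and second model (top-n code set).
predicted_chapter = "10"
candidate_codes = ["1030307", "2010101", "1090511"]
selected = select_codes(predicted_chapter, candidate_codes)
```

Candidates from a different chapter are discarded, so a code is accepted only when both models agree at the chapter level.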
It should be noted that, for the detailed description of the apparatus for determining commodity codes provided in the seventh embodiment of the present application, reference may be made to the related description of the third embodiment of the present application, which is not repeated here.
Corresponding to the method for determining commodity codes provided by the above, the eighth embodiment of the present application also provides an electronic device.
As shown in fig. 8, the electronic device includes:
a processor 801; and
a memory 802, configured to store a program for the method for determining commodity codes, wherein after the device is powered on and the processor executes the program, the device performs the following steps:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;
Predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
Optionally, the determining the commodity code corresponding to the commodity name according to the predicted chapter code and the commodity code corresponding to the commodity name includes:
and judging whether the chapter code contained in the commodity code in the predicted commodity code set is consistent with the predicted chapter code, and if so, taking the predicted commodity code as the commodity code corresponding to the commodity name.
It should be noted that, for the detailed description of the electronic device provided in the eighth embodiment of the present application, reference may be made to the related description of the second embodiment of the present application, which is not repeated here.
Corresponding to the method for determining commodity codes provided above, the ninth embodiment of the present application further provides a storage device,
which stores a program for the method for determining commodity codes; when executed by a processor, the program performs the following steps:
Determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name;
predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
It should be noted that, for the detailed description of the storage device provided in the ninth embodiment of the present application, reference may be made to the related description of the second embodiment of the present application, which is not repeated here.
Corresponding to the method for generating the commodity coding prediction model provided in the second embodiment of the present application, the tenth embodiment of the present application further provides a device for generating the commodity coding prediction model.
As shown in fig. 9, the commodity code prediction model generation apparatus includes:
a sample set determining unit 901, configured to determine a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
The model training unit 902 is configured to train a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
Optionally, the apparatus further comprises:
the de-noising processing and/or disambiguation processing unit is used for carrying out de-noising processing and/or disambiguation processing on the commodity original description information comprising commodity names and commodity codes to obtain de-noised and/or disambiguated commodity description information;
and the commodity description information sample is the de-noised and/or disambiguated commodity description information.
Optionally, the denoising processing unit is specifically configured to:
calculate, from the commodity original description information, a commodity code entropy value for each commodity name, wherein the commodity code entropy value represents the degree of dispersion of the commodity codes corresponding to the same commodity name;
and delete the commodity original description information whose commodity code entropy value is greater than an entropy threshold, and take the retained commodity original description information as the denoised commodity description information.
Optionally, the denoising processing unit is specifically configured to:
when the same submitter has submitted commodity original description information multiple times for the same commodity name, deduplicate the repeatedly submitted commodity original description information, and take the deduplicated commodity original description information as the denoised commodity description information.
The disambiguation processing unit is specifically configured to:
when the same submitter has submitted multiple commodity codes for the same commodity name, take the commodity code most recently submitted by that submitter for the commodity name as the commodity code corresponding to the commodity name.
Optionally, the commodity description information sample set includes a first commodity description information sample set and a second commodity description information sample set, the first commodity description information sample set includes at least one first commodity description information sample, the first commodity description information sample includes a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample includes a commodity name and a commodity code corresponding to the commodity name; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
Optionally, the model training unit is specifically configured to:
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; and training a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set.
Optionally, the commodity coding prediction model is a fasttext model.
Optionally, the apparatus further comprises:
an information adding unit, configured to augment, in an n-gram manner, the data information corresponding to the commodity name in the commodity description information, to obtain a commodity name with added data information;
the commodity description information sample is the commodity description information with the added data information.
Optionally, the sources of the commodity name and the commodity code in the commodity description information include: commodity names and commodity codes on invoices issued by taxpayers.
Optionally, the sources of the commodity name and the commodity code in the commodity description information further include:
commodity names and commodity codes determined according to a standard commodity code table representing the correspondence between commodity names and commodity codes.
It should be noted that, for the detailed description of the apparatus for generating a commodity code prediction model according to the tenth embodiment of the present application, reference may be made to the description related to the second embodiment of the present application, which is not repeated here.
Corresponding to the method for generating the commodity coding prediction model provided by the above, the eleventh embodiment of the present application further provides an electronic device.
As shown in fig. 10, the electronic device includes:
A processor 1001; and
a memory 1002, configured to store a program for the method for generating a commodity code prediction model, wherein after the device is powered on and the processor executes the program, the device performs the following steps:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
Optionally, the electronic device further performs the following steps: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;
and the commodity description information sample is the de-noised and/or disambiguated commodity description information.
Optionally, the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, including:
calculating, from the commodity original description information, a commodity code entropy value for each commodity name, wherein the commodity code entropy value represents the degree of dispersion of the commodity codes corresponding to the same commodity name;
and deleting the commodity original description information whose commodity code entropy value is greater than an entropy threshold, and taking the retained commodity original description information as the denoised commodity description information.
Optionally, the denoising of the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information includes:
when the same submitter has submitted commodity original description information multiple times for the same commodity name, deduplicating the repeatedly submitted commodity original description information, and taking the deduplicated commodity original description information as the denoised commodity description information.
Optionally, the disambiguating of the commodity original description information including the commodity name and the commodity code includes:
when the same submitter has submitted multiple commodity codes for the same commodity name, taking the commodity code most recently submitted by that submitter for the commodity name as the commodity code corresponding to the commodity name.
Optionally, the commodity description information sample set includes a first commodity description information sample set and a second commodity description information sample set, the first commodity description information sample set includes at least one first commodity description information sample, the first commodity description information sample includes a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample includes a commodity name and a commodity code corresponding to the commodity name; and the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes.
Optionally, training a commodity coding prediction model according to the commodity description information sample set includes:
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; and training a second commodity code prediction model for predicting commodity codes according to the second commodity description information sample set.
Optionally, the commodity coding prediction model is a fasttext model.
Optionally, the electronic device further performs the following steps:
augmenting, in an n-gram manner, the data information corresponding to the commodity name in the commodity description information, to obtain a commodity name with added data information;
the commodity description information sample is the commodity description information with the added data information.
Optionally, the sources of the commodity name and the commodity code in the commodity description information include: commodity names and commodity codes on invoices issued by taxpayers.
Optionally, the sources of the commodity name and the commodity code in the commodity description information further include:
commodity names and commodity codes determined according to a standard commodity code table representing the correspondence between commodity names and commodity codes.
It should be noted that, for the detailed description of the electronic device provided in the eleventh embodiment of the present application, reference may be made to the related description of the second embodiment of the present application, which is not repeated here.
Corresponding to the method for generating the commodity coding prediction model provided by the above, the twelfth embodiment of the present application further provides a storage device.
The storage device stores a program for the method for generating a commodity code prediction model; when executed by a processor, the program performs the following steps:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names.
While the preferred embodiment has been described, it is not intended to limit the invention thereto, and any person skilled in the art may make variations and modifications without departing from the spirit and scope of the invention, so that the scope of the invention shall be defined by the appended claims.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (24)

1. A method for generating a commodity coding prediction model, characterized in that the commodity coding prediction model comprises a first commodity coding prediction model and a second commodity coding prediction model, and the method comprises:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes;
The first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
2. The method as recited in claim 1, further comprising: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;
the first commodity description information sample is denoised and/or disambiguated commodity description information containing commodity names and chapter codes corresponding to the commodity names;
the second commodity description information sample is the denoised and/or disambiguated commodity description information containing commodity names and commodity codes corresponding to the commodity names.
3. The method according to claim 2, wherein the denoising processing is performed on the commodity original description information including the commodity name and the commodity code to obtain denoised commodity description information, and the denoising processing includes:
Calculating commodity coding entropy values of the same commodity name aiming at commodity original description information, wherein the commodity coding entropy values are used for representing the discrete degree of commodity codes corresponding to the same commodity name;
and deleting commodity original description information with commodity coding entropy value larger than entropy value threshold, and taking the reserved commodity original description information as denoised commodity description information.
4. The method according to claim 2, wherein denoising the commodity original description information comprising the commodity name and the commodity code to obtain denoised commodity description information comprises:
when the same submitter submits commodity original description information multiple times for the same commodity name, performing weight reduction processing on the commodity original description information submitted multiple times, and taking the commodity original description information after the weight reduction processing as the denoised commodity description information.
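The claim does not define how weight reduction is carried out. One possible reading, sketched below purely as an assumption, is to give each record a weight of 1/k when the same submitter has submitted the same commodity name k times, so that a prolific submitter contributes no more than a single vote per name. The function name and sample data are hypothetical.

```python
from collections import Counter

def downweight_repeats(submissions):
    """submissions: list of (submitter, name, code).
    Returns (submitter, name, code, weight), where weight = 1 / number of
    times that submitter submitted that name (an assumed weighting rule)."""
    counts = Counter((submitter, name) for submitter, name, _ in submissions)
    return [
        (submitter, name, code, 1.0 / counts[(submitter, name)])
        for submitter, name, code in submissions
    ]

# Hypothetical submissions.
submissions = [
    ("taxpayer_1", "mineral water", "1030307"),
    ("taxpayer_1", "mineral water", "1030307"),
    ("taxpayer_2", "mineral water", "1030307"),
]
weighted = downweight_repeats(submissions)
```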
5. The method of claim 2, wherein disambiguating the commodity original description information including commodity names and commodity codes comprises:
when the same submitter has submitted multiple commodity codes for the same commodity name, taking the commodity code most recently submitted by that submitter for that commodity name as the commodity code corresponding to that commodity name.
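The disambiguation rule above can be sketched as follows. The record layout, in particular a sortable timestamp field, is an assumption made for illustration; the function name and sample values are hypothetical.

```python
def disambiguate_latest(submissions):
    """submissions: iterable of (submitter, name, code, timestamp).
    For each (submitter, name) pair keep only the most recently submitted code."""
    latest = {}
    for submitter, name, code, ts in submissions:
        key = (submitter, name)
        # Later (or equal) timestamps overwrite earlier ones.
        if key not in latest or ts >= latest[key][1]:
            latest[key] = (code, ts)
    return {key: code for key, (code, _) in latest.items()}

# Hypothetical submissions with ISO-formatted dates, which sort lexicographically.
submissions = [
    ("taxpayer_1", "pen", "1060201", "2018-01-05"),
    ("taxpayer_1", "pen", "1060299", "2018-03-17"),
    ("taxpayer_2", "pen", "1060201", "2018-02-01"),
]
resolved = disambiguate_latest(submissions)
```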
6. The method of claim 1, wherein the first commodity coding prediction model and the second commodity coding prediction model are fastText models.
7. The method as recited in claim 6, further comprising:
augmenting, by means of n-grams, the data information corresponding to the commodity names in the commodity description information, to obtain commodity names with augmented data information;
the first commodity description information sample and/or the second commodity description information sample are commodity description information with the augmented data information.
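As an illustration of n-gram augmentation, the sketch below appends contiguous n-grams to the token list of a commodity name. How the name is tokenized (by word or by character) and the joining convention are assumptions; the function name is hypothetical.

```python
def add_ngrams(tokens, n=2):
    """Return the original tokens followed by all contiguous n-grams,
    each joined with '_' (an assumed convention)."""
    grams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return list(tokens) + grams

augmented = add_ngrams(["stainless", "steel", "cup"], n=2)
# augmented == ["stainless", "steel", "cup", "stainless_steel", "steel_cup"]
```

The fastText library also exposes a `wordNgrams` training option that adds word n-gram features internally; the claims do not state whether that option or a preprocessing step of this kind is used.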
8. The method of claim 1, wherein the source of the commodity name and commodity code in the commodity description information comprises: commodity names and commodity codes on commodity invoices issued by taxpayers.
9. The method of claim 8, wherein the source of the commodity name and commodity code in the commodity description information further comprises:
commodity names and commodity codes determined according to a standard commodity code table representing the correspondence between commodity names and commodity codes.
10. A commodity coding prediction model generation method, comprising:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names; the commodity description information sample set may comprise a first commodity description information sample set and a second commodity description information sample set, wherein the first commodity description information sample set comprises a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample set comprises a commodity name and a commodity code corresponding to the commodity name; the chapter codes are chapter codes in a standard commodity code table representing the correspondence between commodity names and commodity codes.
11. The method as recited in claim 10, further comprising: denoising and/or disambiguating the commodity original description information comprising the commodity name and the commodity code to obtain denoised and/or disambiguated commodity description information;
and the commodity description information sample is the de-noised and/or disambiguated commodity description information.
12. The method of claim 11, wherein denoising the original description information of the commodity including the commodity name and the commodity code to obtain denoised commodity description information, comprises:
calculating, for the commodity original description information, a commodity coding entropy value for each commodity name, wherein the commodity coding entropy value represents the degree of dispersion of the commodity codes corresponding to the same commodity name;
and deleting commodity original description information with commodity coding entropy value larger than entropy value threshold, and taking the reserved commodity original description information as denoised commodity description information.
13. The method of claim 11, wherein denoising the commodity original description information comprising the commodity name and the commodity code to obtain denoised commodity description information comprises:
when the same submitter submits commodity original description information multiple times for the same commodity name, performing weight reduction processing on the commodity original description information submitted multiple times, and taking the commodity original description information after the weight reduction processing as the denoised commodity description information.
14. A method of determining a commodity code, comprising:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name; the first commodity coding prediction model is obtained through training according to a first commodity description information sample set; the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name;
predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes; the second commodity coding prediction model is obtained through training according to a second commodity description information sample set; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
15. The method of claim 14, wherein determining the commodity code corresponding to the commodity name according to the predicted chapter code and the predicted commodity code set corresponding to the commodity name comprises:
judging whether the chapter code contained in a commodity code in the predicted commodity code set is consistent with the predicted chapter code, and if so, taking that commodity code as the commodity code corresponding to the commodity name.
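The consistency check between the two model outputs can be sketched as below. It assumes that the chapter code is the leading prefix of a commodity code and that the candidate set is ordered by model confidence; neither detail is stated in the claims. The function name and the codes are hypothetical.

```python
def select_code(predicted_chapter, candidate_codes):
    """Return the first candidate commodity code whose leading digits match the
    predicted chapter code, or None if no candidate is consistent.
    Assumption: the chapter code is a prefix of the full commodity code."""
    for code in candidate_codes:
        if code.startswith(predicted_chapter):
            return code
    return None

# Hypothetical outputs of the first (chapter) and second (code set) models.
chapter = "103"
candidates = ["3040101", "1030307", "1030299"]
chosen = select_code(chapter, candidates)
```

Using the chapter-level prediction as a filter means a candidate from the finer-grained model is accepted only when both models agree on the coarse category.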
16. A commodity code prediction model generation apparatus, wherein the commodity coding prediction model comprises a first commodity coding prediction model and a second commodity coding prediction model, the apparatus comprising:
a sample set determining unit, configured to determine a first commodity description information sample set and a second commodity description information sample set;
a model training unit, configured to train, according to the first commodity description information sample set, a first commodity code prediction model for predicting chapter codes corresponding to commodity names, and to train, according to the second commodity description information sample set, a second commodity coding prediction model for predicting commodity codes; the chapter codes are chapter codes in a standard commodity code table representing the correspondence between commodity names and commodity codes;
the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
17. An electronic device, comprising:
a processor; and
a memory for storing a program of a method of generating a commodity code prediction model, the commodity code prediction model including a first commodity code prediction model and a second commodity code prediction model; after the device is powered on and the processor runs the program of the commodity code prediction model generation method, the following steps are performed:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes;
the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
18. A memory device, storing a program of a method for generating a commodity code prediction model, the commodity code prediction model including a first commodity code prediction model and a second commodity code prediction model; when executed by a processor, the program performs the following steps:
determining a first commodity description information sample set and a second commodity description information sample set;
training a first commodity code prediction model for predicting chapter codes corresponding to commodity names according to the first commodity description information sample set; training a second commodity coding prediction model for predicting commodity coding according to the second commodity description information sample set; the chapter codes are chapter codes in a standard commodity code table for representing the correspondence between commodity names and commodity codes;
the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name.
19. An apparatus for determining a commodity code, comprising:
a commodity name determining unit for determining the commodity name of the commodity code to be determined;
the chapter code prediction unit is used for predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name; the first commodity coding prediction model is obtained through training according to a first commodity description information sample set; the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name;
the commodity code set prediction unit is used for predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes; the second commodity coding prediction model is obtained through training according to a second commodity description information sample set; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name;
and the commodity code prediction unit is used for determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
20. An electronic device, comprising:
a processor; and
a memory for storing a program of a method for determining a commodity code; after the device is powered on and the processor runs the program, the following steps are performed:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name; the first commodity coding prediction model is obtained through training according to a first commodity description information sample set; the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name;
predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes; the second commodity coding prediction model is obtained through training according to a second commodity description information sample set; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
21. A memory device, storing a program of a method for determining a commodity code; when executed by a processor, the program performs the following steps:
determining commodity names of commodity codes to be determined;
predicting the chapter code corresponding to the commodity name according to the commodity name and a pre-trained first commodity code prediction model for predicting the chapter code corresponding to the commodity name; the first commodity coding prediction model is obtained through training according to a first commodity description information sample set; the first commodity description information sample set comprises at least one first commodity description information sample, wherein the first commodity description information sample comprises a commodity name and a chapter code corresponding to the commodity name;
predicting a commodity code set corresponding to the commodity name according to the commodity name and a pre-trained second commodity code prediction model for predicting commodity codes; the second commodity coding prediction model is obtained through training according to a second commodity description information sample set; the second commodity description information sample set comprises at least one second commodity description information sample, and the second commodity description information sample comprises a commodity name and a commodity code corresponding to the commodity name;
and determining commodity codes corresponding to the commodity names according to the predicted chapter codes and commodity code sets corresponding to the commodity names.
22. A commodity code prediction model generation apparatus, comprising:
a sample set determining unit, configured to determine a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
a model training unit, configured to train a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names; the commodity description information sample set may comprise a first commodity description information sample set and a second commodity description information sample set, wherein the first commodity description information sample set comprises a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample set comprises a commodity name and a commodity code corresponding to the commodity name; the chapter codes are chapter codes in a standard commodity code table representing the correspondence between commodity names and commodity codes.
23. An electronic device, comprising:
a processor; and
a memory for storing a program of a method of generating a commodity code prediction model; after the device is powered on and the processor runs the program of the commodity code prediction model generation method, the following steps are performed:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names; the commodity description information sample set may comprise a first commodity description information sample set and a second commodity description information sample set, wherein the first commodity description information sample set comprises a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample set comprises a commodity name and a commodity code corresponding to the commodity name; the chapter codes are chapter codes in a standard commodity code table representing the correspondence between commodity names and commodity codes.
24. A memory device, storing a program of a method for generating a commodity code prediction model; when executed by a processor, the program performs the following steps:
determining a commodity description information sample set; the commodity description information sample set comprises at least one commodity description information sample, and the commodity description information sample comprises a commodity name and a commodity code;
training a commodity coding prediction model according to the commodity description information sample set; the commodity code prediction model is used for predicting commodity codes corresponding to commodity names; the commodity description information sample set may comprise a first commodity description information sample set and a second commodity description information sample set, wherein the first commodity description information sample set comprises a commodity name and a chapter code corresponding to the commodity name, and the second commodity description information sample set comprises a commodity name and a commodity code corresponding to the commodity name; the chapter codes are chapter codes in a standard commodity code table representing the correspondence between commodity names and commodity codes.
CN201810825197.0A 2018-07-25 2018-07-25 Commodity coding prediction model generation and commodity coding determination method, device and equipment Active CN110851587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810825197.0A CN110851587B (en) 2018-07-25 2018-07-25 Commodity coding prediction model generation and commodity coding determination method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810825197.0A CN110851587B (en) 2018-07-25 2018-07-25 Commodity coding prediction model generation and commodity coding determination method, device and equipment

Publications (2)

Publication Number Publication Date
CN110851587A (en) 2020-02-28
CN110851587B (en) 2024-04-05

Family

ID=69594392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810825197.0A Active CN110851587B (en) 2018-07-25 2018-07-25 Commodity coding prediction model generation and commodity coding determination method, device and equipment

Country Status (1)

Country Link
CN (1) CN110851587B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695979A (en) * 2020-06-18 2020-09-22 税友软件集团股份有限公司 Method, device and equipment for analyzing relation between raw material and finished product
CN111967246A (en) * 2020-07-30 2020-11-20 湖南大学 Error correction method for shopping bill recognition result
CN114548041A (en) * 2020-11-27 2022-05-27 华晨宝马汽车有限公司 Method, electronic device and medium for recommending HS codes for goods
CN113779933B (en) * 2021-09-03 2024-07-09 深圳市朗华供应链服务有限公司 Commodity encoding method, electronic device, and computer-readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101193321A (en) * 2006-11-27 2008-06-04 汤姆森许可贸易公司 Encoding device, decoding device, recording device, audio/video data transmission system
CN103488655A (en) * 2012-06-13 2014-01-01 阿里巴巴集团控股有限公司 Method and system for processing composite model data
CN104134128A (en) * 2014-08-11 2014-11-05 税友软件集团股份有限公司 Invoice processing method and system
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
CN107067293A (en) * 2017-03-07 2017-08-18 北京三快在线科技有限公司 Merchant category method, device and electronic equipment
CN107704892A (en) * 2017-11-07 2018-02-16 宁波爱信诺航天信息有限公司 A kind of commodity code sorting technique and system based on Bayesian model
CN107862046A (en) * 2017-11-07 2018-03-30 宁波爱信诺航天信息有限公司 A kind of tax commodity code sorting technique and system based on short text similarity
CN107871144A (en) * 2017-11-24 2018-04-03 税友软件集团股份有限公司 Invoice trade name sorting technique, system, equipment and computer-readable recording medium
CN108052668A (en) * 2017-12-29 2018-05-18 北京百旺金赋科技有限公司 The endowed method and system of intelligence based on commodity code
CN108241677A (en) * 2016-12-26 2018-07-03 航天信息股份有限公司 A kind of method and system for the tax revenue sorting code number for obtaining commodity

Also Published As

Publication number Publication date
CN110851587A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110851587B (en) Commodity coding prediction model generation and commodity coding determination method, device and equipment
US20200117675A1 (en) Obtaining of Recommendation Information
WO2016058485A2 (en) Methods and devices for calculating ranking score and creating model, and product recommendation system
CN106021433B (en) A kind of the public praise analysis method and device of comment on commodity data
US9449283B1 (en) Selecting a training strategy for training a machine learning model
KR101868829B1 (en) Generation of weights in machine learning
US20090248657A1 (en) web searching
US20180157965A1 (en) Device and method for determining convolutional neural network model for database
CN106774975B (en) Input method and device
KR101982674B1 (en) Alternative training distribution based on density modification
JP2015087973A (en) Generation device, generation method, and program
CN107766573B (en) Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and storage medium based on data processing
WO2020065806A1 (en) Processing device, processing method, and program
CN111353620A (en) Method, device and equipment for constructing network point component prediction model and storage medium
CN111612581A (en) Method, device and equipment for recommending articles and storage medium
CN111695024A (en) Object evaluation value prediction method and system, and recommendation method and system
CN111324698A (en) Deep learning method, evaluation viewpoint extraction method, device and system
CN105809379A (en) Logistics branch evaluation method, device and electronic device
CN110738508A (en) data analysis method and device
CN111680213B (en) Information recommendation method, data processing method and device
CN107395447A (en) Module detection method, power system capacity predictor method and corresponding equipment
US20140324523A1 (en) Missing String Compensation In Capped Customer Linkage Model
US11397779B2 (en) Method and device for pushing information based on search content
CN108229572A (en) A kind of parameter optimization method and computing device
CN105786791B (en) Data subject acquisition methods and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant