CN112348604A - Invoice commodity code assignment method, system and device and readable storage medium - Google Patents


Publication number
CN112348604A
Authority
CN
China
Prior art keywords
word
matching
result
goods
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011346801.5A
Other languages
Chinese (zh)
Other versions
CN112348604B (en)
Inventor
陈鹏飞
张镇潮
施建生
涂昶
钱力扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Servyou Software Group Co ltd
Original Assignee
Servyou Software Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Servyou Software Group Co ltd
Priority to CN202011346801.5A
Publication of CN112348604A
Application granted
Publication of CN112348604B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities


Abstract

The application discloses an invoice commodity code assignment method, system, device and computer readable storage medium. The method comprises: receiving a goods name; segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result; matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using a composite core word extraction algorithm to obtain a plurality of matching results; calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon; and outputting the matching result with the highest confidence. Because the core words are extracted by a composite of several algorithms, the matching hit rate is improved and a plurality of matching results are obtained; the matching result with the highest confidence is then selected from them, which ensures the accuracy of the final result.

Description

Invoice commodity code assignment method, system and device and readable storage medium
Technical Field
The invention relates to the technical field of computers, in particular to an invoice commodity code assignment method, system and device and a computer readable storage medium.
Background
When an enterprise issues an invoice, goods and services are classified into more than 4000 categories according to the tax classification code table of the State Taxation Administration. Users who are unfamiliar with the table usually fill in the commodity code from experience, so wrong codes are common, and an error is likely to cause unnecessary losses. It is therefore necessary to design an algorithm that, through a series of calculations, assigns the goods name entered by the user to the most suitable commodity code.
Algorithms in the prior art require the user to enter the goods name exactly so that the corresponding commodity code can be found in a pre-built goods name library. Because different invoicing parties have different habits, some goods names can be found in the library and others cannot. Take "farmer mountain spring mineral water" as an example: some enterprises invoice it as "farmer mountain spring mineral water", while others invoice it as "farmer mountain spring mineral water 500 ml", "farmer mountain spring mineral water 1.5L", and so on. The first form can be found in the library, but the latter two cannot.
Therefore, a more flexible and efficient invoice commodity code assignment method is needed.
Disclosure of Invention
In view of the above, the present invention provides a method, a system, a device and a computer readable storage medium for assigning an invoice commodity code, which are more flexible and efficient. The specific scheme is as follows:
an invoice commodity code assignment method comprises the following steps:
receiving a goods name;
segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result;
matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using a composite core word extraction algorithm to obtain a plurality of matching results; each matching result comprises the goods name, the commodity code and the number of companies invoicing under that commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon;
outputting the matching result with the highest confidence;
wherein the core lexicon is a database created in advance that comprises a plurality of goods names, the commodity code of each goods name, and the number of companies invoicing each goods name under that commodity code.
Optionally, the process of receiving the name of the goods includes:
receiving an original goods name;
and cleaning the original goods name, and removing useless words by using a preset useless word bank to obtain the goods name.
Optionally, the process of matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using the composite core word extraction algorithm to obtain a plurality of matching results includes:
matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using a final word algorithm to obtain a final word matching result;
and matching the accurate-mode segmentation result in the core lexicon by using a unique word algorithm to obtain a unique word matching result.
Optionally, the method further includes:
receiving a commodity code abbreviation;
the process of matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using the composite core word extraction algorithm to obtain a plurality of matching results includes:
matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using a final word algorithm to obtain a final word matching result;
matching the accurate-mode segmentation result in the core lexicon by using a unique word algorithm to obtain a unique word matching result;
and matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using an abbreviation word algorithm and the commodity code abbreviation to obtain an abbreviation word matching result.
The invention also discloses an invoice commodity code assignment system, which comprises the following components:
the goods name receiving module is used for receiving goods names;
the jieba word segmentation module is used for segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result;
the core word extraction module is used for matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using a composite core word extraction algorithm to obtain a plurality of matching results; each matching result comprises the goods name, the commodity code and the number of companies invoicing under that commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
the confidence calculation module is used for calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon;
the result output module is used for outputting the matching result with the highest confidence;
wherein the core lexicon is a database created in advance that comprises a plurality of goods names, the commodity code of each goods name, and the number of companies invoicing each goods name under that commodity code.
Optionally, the cargo name receiving module includes:
the original name receiving unit is used for receiving an original goods name;
and the original name cleaning unit is used for cleaning the original goods name and removing useless words by using a preset useless word bank to obtain the goods name.
Optionally, the core word extracting module includes:
the final word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
and the unique word calculation unit is used for matching the precise mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result.
Optionally, the system further includes:
the code abbreviation receiving module is used for receiving the commodity code abbreviation;
the core word extraction module comprises:
the final word calculation unit is used for matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using a final word algorithm to obtain a final word matching result;
the unique word calculation unit is used for matching the accurate-mode segmentation result in the core lexicon by using a unique word algorithm to obtain a unique word matching result;
and the abbreviation word calculation unit is used for matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using an abbreviation word algorithm and the commodity code abbreviation to obtain an abbreviation word matching result.
The invention also discloses an invoice commodity code assignment device, which comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assignment method as described above.
The invention also discloses a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the invoice commodity code assignment method as described above.
The invoice commodity code assignment method comprises the following steps: receiving a goods name; segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result; matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using a composite core word extraction algorithm to obtain a plurality of matching results, where each matching result comprises the goods name, the commodity code and the number of companies invoicing under that commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms; calculating the confidence of each matching result by using a preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon; and outputting the matching result with the highest confidence. The core lexicon is a database created in advance that comprises a plurality of goods names, the commodity code of each goods name, and the number of companies invoicing each goods name under that commodity code.
In this method, the core words are extracted by a composite of several algorithms, which improves the matching hit rate and yields a plurality of matching results; the matching result with the highest confidence is then selected from them, which ensures the accuracy of the final result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow chart of an invoice commodity code assignment method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another invoice commodity code assignment method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a further invoice commodity code assignment method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an invoice commodity code assignment system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses an invoice commodity code assignment method, which is shown in figure 1 and comprises the following steps:
S11: receiving a goods name;
S12: segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result.
Specifically, after a goods name entered by the user is received, it is segmented by using jieba word segmentation and the preset core lexicon, and the full-mode segmentation result and the accurate-mode segmentation result are recorded as cut_all_result and cut_result respectively. For example, assume that the goods name entered by the user is "farmer spring purified water" and that the core lexicon records three core words: "farmer spring purified water", "farmer spring" and "purified water". Segmenting the goods name then yields these three words. "farmer spring purified water" matches the complete goods name and is recorded as the accurate-mode segmentation result, while "farmer spring" and "purified water" are parts of the goods name and are recorded as the full-mode segmentation result. Two results are thus obtained: cut_all_result = ['farmer spring', 'purified water'] and cut_result = ['farmer spring purified water'].
Specifically, if jieba word segmentation cannot find any segmentation result for the goods name in the core lexicon, which may be because the user entered a wrong goods name or because the core lexicon has no record for that goods name, both the full-mode segmentation result and the accurate-mode segmentation result are empty, and the subsequent matching process can be terminated.
It can be understood that if the information of the related goods name is not recorded in the core word stock, the information can be added subsequently according to the actual application requirement.
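As a rough illustration of the two segmentation results described above, the following Python sketch uses plain substring lookup against a small in-memory lexicon as a stand-in for jieba; it does not call jieba itself. The function name `segment`, the sample lexicon, and the rule that a hit equal to the whole goods name becomes the accurate-mode result are assumptions drawn from the example, not the implementation of the embodiment.

```python
def segment(goods_name, core_lexicon):
    """Return (cut_all_result, cut_result) for a goods name.

    Stand-in for jieba with a custom dictionary: every lexicon word found
    inside the name is a hit; a hit equal to the whole name is treated as the
    accurate-mode result, and the partial hits as the full-mode result.
    """
    hits = [word for word in core_lexicon if word in goods_name]
    # Order hits by their position in the name so results are deterministic.
    hits.sort(key=goods_name.find)
    cut_result = [word for word in hits if word == goods_name]
    cut_all_result = [word for word in hits if word != goods_name]
    return cut_all_result, cut_result


# Hypothetical core lexicon holding the three core words from the example.
core_lexicon = {"farmer spring purified water", "farmer spring", "purified water"}
cut_all_result, cut_result = segment("farmer spring purified water", core_lexicon)
print(cut_all_result)  # ['farmer spring', 'purified water']
print(cut_result)      # ['farmer spring purified water']
```

A goods name with no hit in the lexicon returns two empty lists, which corresponds to the case where the subsequent matching process is terminated.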
S13: matching the full-mode segmentation result and the accurate-mode segmentation result in the core lexicon by using a composite core word extraction algorithm to obtain a plurality of matching results.
Specifically, the composite core word extraction algorithm includes multiple core word extraction algorithms, such as a final word algorithm, a unique word algorithm and an abbreviation word algorithm. Starting from the full-mode and accurate-mode segmentation results obtained by the preceding jieba word segmentation, each of these algorithms continues to match core words in the core lexicon, so that a matching result is obtained from each algorithm.
It can be understood that, if all the core word extraction algorithms output null, the final matching result is null, which may be due to a wrong goods name input by the user or due to no information of the related goods name recorded in the core thesaurus. If the information of the related goods name is not recorded in the core word stock, the information can be added subsequently according to the actual application requirement.
S14: calculating the confidence of each matching result by using the preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon.
Specifically, because a composite core word extraction algorithm is adopted, a plurality of matching results are obtained. In order to output a single, accurate commodity code for the goods name, a weighting ratio is set in advance for each core word extraction algorithm in the composite core word extraction algorithm, so that each matching result has a corresponding weighting ratio.
Finally, the confidence of each matching result is calculated from the proportion of invoicing companies in that matching result and the preset weighting ratio. For example, suppose the matching results of three algorithms are {'purified water-1030307040000000000': 90}, {'farmer spring purified water-1030307040000000000': 60} and {'purified water-1030307040000000000': 90} respectively, where the text part such as "purified water" is the goods name, the digit string such as "1030307040000000000" is the commodity code, and the number such as 90 is the proportion of invoicing companies for the goods commodity code. With weighting ratios of 0.2, 0.3 and 0.5 for the three algorithms, the weighted calculation is:
{'purified water-1030307040000000000': 90} × 0.2 + {'farmer spring purified water-1030307040000000000': 60} × 0.3 + {'purified water-1030307040000000000': 90} × 0.5 = {'purified water-1030307040000000000': 63, 'farmer spring purified water-1030307040000000000': 18}
That is, 90 × 0.2 + 90 × 0.5 = 63 and 60 × 0.3 = 18. In this example, the goods name "purified water" with commodity code "1030307040000000000" has a confidence of 63, and the goods name "farmer spring purified water" with commodity code "1030307040000000000" has a confidence of 18.
It should be noted that the correspondence between goods names and commodity codes is built into the core lexicon in advance, so that once a result is matched in the core lexicon, the corresponding commodity code and the proportion of invoicing companies for that commodity code can be obtained; see the core lexicon shown in Table 1.
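The weighted calculation in this example can be sketched in Python as follows. The function name `combine_confidence` and the list-of-dicts layout are assumptions made for illustration; the source does not state which algorithm carries which weight, so the three results are simply labelled algorithm 1 to 3.

```python
from collections import defaultdict


def combine_confidence(results, weights):
    """Sum weight * proportion per 'goods name-commodity code' key."""
    confidence = defaultdict(float)
    for result, weight in zip(results, weights):
        for key, proportion in result.items():
            confidence[key] += proportion * weight
    return dict(confidence)


results = [
    {"purified water-1030307040000000000": 90},                # algorithm 1
    {"farmer spring purified water-1030307040000000000": 60},  # algorithm 2
    {"purified water-1030307040000000000": 90},                # algorithm 3
]
weights = [0.2, 0.3, 0.5]  # preset weighting ratio of each algorithm

confidence = combine_confidence(results, weights)
# The key with the highest confidence is the one that would be output.
best = max(confidence, key=confidence.get)
print(best)  # purified water-1030307040000000000
```

Identical keys produced by different algorithms accumulate, which is why "purified water" ends up with roughly 63 while "farmer spring purified water" ends up with roughly 18.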
TABLE 1
[Table 1 is an image in the original publication: Figure BDA0002800180730000071]
S15: outputting the matching result with the highest confidence.
Specifically, after the confidences are calculated, the matching result with the highest confidence is output, which gives the commodity code corresponding to the goods name originally entered by the user.
Therefore, the embodiment of the invention extracts the core words by a composite of several algorithms, which improves the matching hit rate and yields a plurality of matching results, and finally selects the matching result with the highest confidence from them, thereby ensuring the accuracy of the final result.
Specifically, when the core lexicon is created, the goods names entered into it are cleaned to remove stop words. This keeps the goods names precise, reduces interfering information, and improves the efficiency with which the subsequent core word extraction algorithms extract core words. Goods names whose number of invoicing companies is below a certain threshold can also be removed, which reduces the number of core words and speeds up subsequent extraction. For example, entries with fewer than 5 invoicing companies can be removed, such as hydrogen peroxide in Table 1, which has only 1 invoicing company. In addition, goods names whose proportion of invoicing companies for the commodity code exceeds 0.1% can be retained; thus hydrogen peroxide in Table 1, although its number of invoicing companies is low, meets the proportion requirement and may still be stored in the core lexicon.
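One possible reading of the two thresholds is that an entry is kept when it passes either of them. The Python sketch below follows that reading; combining the rules with "or", the field names, and the sample rows are all assumptions, since the values of Table 1 are not available here.

```python
def prune_lexicon(entries, min_companies=5, min_ratio_percent=0.1):
    """Keep entries that pass the company-count rule or the proportion rule."""
    return [
        entry for entry in entries
        if entry["companies"] >= min_companies
        or entry["ratio_percent"] > min_ratio_percent
    ]


# Hypothetical rows for illustration only; not the real Table 1 values.
entries = [
    {"name": "purified water", "companies": 1200, "ratio_percent": 90.0},
    {"name": "hydrogen peroxide", "companies": 1, "ratio_percent": 0.5},
    {"name": "rarely invoiced item", "companies": 2, "ratio_percent": 0.01},
]
kept = [entry["name"] for entry in prune_lexicon(entries)]
print(kept)  # ['purified water', 'hydrogen peroxide']
```

Under this reading, hydrogen peroxide fails the count rule but survives through the proportion rule, matching the outcome described in the text.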
The embodiment of the invention discloses a specific invoice commodity code assignment method, and compared with the previous embodiment, the technical scheme is further explained and optimized in the embodiment. Referring to fig. 2, specifically:
S21: receiving an original goods name;
S22: cleaning the original goods name, and removing useless words by using a preset useless word bank to obtain the goods name.
Specifically, because the original goods name entered by the user may be imprecise, it can be cleaned: useless words are removed from it by using a preset useless word bank and a corresponding cleaning algorithm, which yields the goods name.
For example, the original goods name is 'special price farmer spring pure water 500 ml', the goods name obtained after cleaning is 'farmer spring pure water', and two useless words of 'special price' and '500 ml' are removed, so that the subsequent word segmentation precision and the subsequent matching efficiency are improved.
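A minimal sketch of such a cleaning step in Python is shown below. The contents of the useless word bank, the regular expression for size specifications, and the function name `clean_goods_name` are assumptions; the source does not describe the cleaning algorithm in detail.

```python
import re

# Hypothetical useless word bank; a real one would be far larger.
USELESS_WORDS = ["special price"]
# Assumed pattern for size or volume specifications such as "500 ml" or "1.5L".
SPEC_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:ml|l|g|kg)\b", re.IGNORECASE)


def clean_goods_name(original_name):
    """Strip specification tokens and useless words, then tidy whitespace."""
    name = SPEC_PATTERN.sub(" ", original_name)
    for word in USELESS_WORDS:
        name = name.replace(word, " ")
    return " ".join(name.split())


print(clean_goods_name("special price farmer spring pure water 500 ml"))
# farmer spring pure water
```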
S23: segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result;
S24: matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using a final word algorithm to obtain a final word matching result.
Specifically, the composite core word extraction algorithm may include a final word algorithm. The algorithm first judges whether the goods name ends with a word in the full-mode segmentation result. If so, that word is output as the final word matching result; for example, if the goods name is "farmer spring pure water" and the full-mode segmentation result is "farmer spring" and "pure water", then "pure water" is the final word and the final word matching result is "pure water". If not, the algorithm judges whether the number of words in the accurate-mode segmentation result is greater than 1; if it is, the algorithm further judges whether the last word in the accurate-mode segmentation result ends with a word in the full-mode segmentation result, and if so outputs that word as the final word matching result. Otherwise, the final word matching result is null.
S25: matching the accurate-mode segmentation result in the core lexicon by using a unique word algorithm to obtain a unique word matching result.
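The decision steps of the final word algorithm can be sketched in Python as follows. The function name `final_word_match` and the tie-break of preferring the longest matching word are assumptions; the source only says that the name ends with "some word" of the full-mode result.

```python
def final_word_match(goods_name, cut_all_result, cut_result):
    """Return the full-mode word that ends the goods name, or None."""

    def tail_word(text):
        # Assumed tie-break: prefer the longest full-mode word the text ends with.
        candidates = [word for word in cut_all_result if text.endswith(word)]
        return max(candidates, key=len) if candidates else None

    word = tail_word(goods_name)
    if word is None and len(cut_result) > 1:
        # Fall back to the last word of the accurate-mode result.
        word = tail_word(cut_result[-1])
    return word


match = final_word_match(
    "farmer spring pure water",
    ["farmer spring", "pure water"],
    ["farmer spring pure water"],
)
print(match)  # pure water
```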
Specifically, a unique word algorithm is used for judging whether a word in the accurate mode word segmentation result is unique, if the word is unique, the word is used as a unique word matching result, and if the word is not unique, the output result is empty.
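The source does not define "unique" precisely. The Python sketch below assumes it means that the accurate-mode segmentation result contains exactly one word, and then looks that word up to build a matching result in the 'goods name-commodity code': proportion form used in the confidence example. The lexicon layout and the function name `unique_word_match` are likewise assumptions.

```python
# Hypothetical core lexicon rows:
# goods name -> (commodity code, proportion of invoicing companies).
CORE_LEXICON = {
    "farmer spring purified water": ("1030307040000000000", 60),
    "purified water": ("1030307040000000000", 90),
}


def unique_word_match(cut_result, core_lexicon=CORE_LEXICON):
    """Return a matching result when cut_result holds exactly one known word."""
    if len(cut_result) != 1:
        return {}  # not unique: the output is empty
    word = cut_result[0]
    if word not in core_lexicon:
        return {}
    code, proportion = core_lexicon[word]
    return {f"{word}-{code}": proportion}


print(unique_word_match(["farmer spring purified water"]))
# {'farmer spring purified water-1030307040000000000': 60}
```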
S26: calculating the confidence of each matching result by using the preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon;
S27: outputting the matching result with the highest confidence.
Further, the embodiment of the present invention also discloses an invoice commodity code assignment method, as shown in fig. 3, the method includes:
S31: receiving an original goods name and a commodity code abbreviation;
S32: cleaning the original goods name, and removing useless words by using a preset useless word bank to obtain the goods name;
S33: segmenting the goods name by using jieba word segmentation and a preset core lexicon to obtain a full-mode segmentation result and an accurate-mode segmentation result;
S34: matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using a final word algorithm to obtain a final word matching result;
S35: matching the accurate-mode segmentation result in the core lexicon by using a unique word algorithm to obtain a unique word matching result;
S36: matching the full-mode segmentation result and/or the accurate-mode segmentation result in the core lexicon by using an abbreviation word algorithm and the commodity code abbreviation to obtain an abbreviation word matching result.
Specifically, the user may also enter a commodity code abbreviation together with the goods name. For example, the user enters "soft drink" and "farmer spring purified water 500 ml", where "soft drink" is the commodity code abbreviation and "farmer spring purified water 500 ml" is the original goods name.
Specifically, the abbreviation word algorithm first judges whether the accurate-mode segmentation result is empty; if it is, the output is empty. Otherwise, the algorithm judges whether the commodity code abbreviation is in the preset core lexicon; if it is not, the output is empty. If it is, the corresponding core word sub-library is found according to the commodity code abbreviation, and the core words of the full-mode segmentation result are matched in that sub-library to obtain their proportions of invoicing companies. If any core word is matched, the one with the largest proportion is selected as the output; otherwise the output is empty.
For example, if the commodity code abbreviation "soft drink" is found in the core lexicon, the full-mode segmentation result is matched against the goods names under that commodity code, and the matching result with the largest proportion of invoicing companies is selected.
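The abbreviation word algorithm described above can be sketched in Python as follows. The sub-library layout, the function name `abbreviation_word_match`, and the proportion of 40 for "farmer spring" are hypothetical values chosen for illustration.

```python
# Hypothetical sub-libraries keyed by commodity code abbreviation:
# abbreviation -> {core word: proportion of invoicing companies}.
SUB_LIBRARIES = {
    "soft drink": {"purified water": 90, "farmer spring": 40},
}


def abbreviation_word_match(abbreviation, cut_all_result, cut_result,
                            sub_libraries=SUB_LIBRARIES):
    """Return the full-mode core word with the largest proportion, or None."""
    if not cut_result:
        return None  # accurate-mode result is empty
    sub_library = sub_libraries.get(abbreviation)
    if not sub_library:
        return None  # abbreviation not recorded in the core lexicon
    matched = [word for word in cut_all_result if word in sub_library]
    if not matched:
        return None
    return max(matched, key=sub_library.get)


match = abbreviation_word_match(
    "soft drink",
    ["farmer spring", "purified water"],
    ["farmer spring purified water"],
)
print(match)  # purified water
```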
S37: calculating the confidence of each matching result by using the preset weighting ratio and the proportion of invoicing companies for the goods commodity code in each matching result, as recorded in the core lexicon;
S38: outputting the matching result with the highest confidence.
Correspondingly, the embodiment of the invention discloses a specific invoice commodity code assignment method, and compared with the previous embodiment, the embodiment further explains and optimizes the technical scheme. Referring to fig. 4, specifically:
a goods name receiving module 11, configured to receive a goods name;
a jieba word segmentation module 12, configured to segment the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
a core word extraction module 13, configured to match the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain multiple matching results; each matching result comprises the goods name, the commodity code and the proportion of issuing companies for that commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
a confidence calculation module 14, configured to calculate the confidence of each matching result by using preset weighting ratios and the proportion of issuing companies recorded in the core word library for the commodity code in each matching result;
a result output module 15, configured to output the matching result with the highest confidence;
the core word library is a database created in advance that comprises a plurality of goods names, the commodity code of each goods item, and the number of issuing companies for each commodity code.
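As an illustration of the core word library described above, the sketch below builds an in-memory version from raw invoice records. It rests on assumptions that the section does not confirm: that the proportion of issuing companies is computed per goods name, as the share of companies using a given commodity code among all company/code pairs seen for that goods name, and that records arrive as `(goods_name, commodity_code, company_id)` tuples. The function name `build_core_lexicon` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (goods_name, commodity_code, company_id)


def build_core_lexicon(
    records: Iterable[Record],
) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """Map goods_name -> commodity_code -> (company_count, company_ratio)."""
    # Collect the distinct companies that issued each (goods name, code) pair.
    companies = defaultdict(lambda: defaultdict(set))
    for goods_name, code, company_id in records:
        companies[goods_name][code].add(company_id)

    lexicon: Dict[str, Dict[str, Tuple[int, float]]] = {}
    for goods_name, by_code in companies.items():
        # Assumed denominator: total distinct company/code pairs for this name.
        total = sum(len(ids) for ids in by_code.values())
        lexicon[goods_name] = {
            code: (len(ids), len(ids) / total) for code, ids in by_code.items()
        }
    return lexicon
```

Storing both the count and the derived proportion keeps the raw figure available if a different denominator turns out to be more appropriate.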
Therefore, the embodiment of the invention extracts core words by using a composite of several algorithms, which improves the matching hit rate and yields multiple matching results; the confidence is then used to pick the matching result with the highest confidence from among them, ensuring the accuracy of the final result.
Specifically, the goods name receiving module 11 may include an original name receiving unit and an original name cleaning unit; wherein,
the original name receiving unit is used for receiving an original goods name;
and the original name cleaning unit is used for cleaning the original goods name and removing useless words by using a preset useless word bank to obtain the goods name.
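The cleaning step performed by the original name cleaning unit might look like the sketch below. The section does not specify what "cleaning" covers, so the code assumes it means stripping punctuation and symbols, followed by removal of entries from a useless word library. The list `USELESS_WORDS` and the function name `clean_goods_name` are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical useless word library; real entries would be curated in advance.
USELESS_WORDS = ("promotion", "wholesale")


def clean_goods_name(original_name: str, useless_words: Iterable[str] = USELESS_WORDS) -> str:
    """Strip symbols and useless words from an original goods name."""
    # Assumed cleaning rule: replace anything that is not a word character
    # or whitespace (for example '*', '(' or ')') with a space.
    name = re.sub(r"[^\w\s]", " ", original_name)
    # Remove useless words, longest first so longer entries are not split.
    # Plain substring removal suits unsegmented text such as Chinese names.
    for word in sorted(useless_words, key=len, reverse=True):
        name = name.replace(word, " ")
    # Collapse the whitespace left behind by the removals.
    return " ".join(name.split())
```

For example, `clean_goods_name("*Soft drink* cola (promotion)")` yields `"Soft drink cola"`, which would then be passed on for word segmentation.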
Specifically, the core word extraction module 13 may include a final word calculation unit and a unique word calculation unit; wherein,
the final word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
and the unique word calculation unit is used for matching the accurate-mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result.
Specifically, the system can further comprise a code abbreviation receiving module; wherein,
the code abbreviation receiving module is used for receiving the commodity code abbreviation;
the core word extraction module 13 may include a final word calculation unit, a unique word calculation unit and an abbreviation word calculation unit; wherein,
the final word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
the unique word calculation unit is used for matching the accurate-mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result;
and the abbreviation word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using an abbreviation word algorithm and the commodity code abbreviation to obtain an abbreviation word matching result.
In addition, the embodiment of the invention also discloses an invoice commodity code assignment device, which comprises:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assignment method described above.
In addition, the embodiment of the invention also discloses a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the invoice commodity code assignment method described above is implemented.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The technical content provided by the present invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of these examples is intended only to help in understanding the method of the invention and its core idea. For a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. An invoice commodity code assignment method is characterized by comprising the following steps:
receiving a goods name;
segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain a plurality of matching results; the matching result comprises the goods name, the commodity code and the number of issuing companies for the commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
calculating the confidence of each matching result by using preset weighting ratios and the proportion of issuing companies recorded in the core word library for the commodity code in each matching result;
outputting a matching result with the highest confidence coefficient;
the core word stock is a database which is created in advance and comprises a plurality of goods names, goods codes of the goods and the number of goods code issuing companies of each goods.
2. The invoice commodity code assignment method of claim 1, wherein the process of receiving a goods name, comprises:
receiving an original goods name;
and cleaning the original goods name, and removing useless words by using a preset useless word bank to obtain the goods name.
3. The invoice commodity code assignment method according to claim 2, wherein the process of matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using the composite core word extraction algorithm to obtain a plurality of matching results comprises:
matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
and matching the accurate-mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result.
4. The invoice commodity code assignment method of claim 2, further comprising:
receiving a commodity code abbreviation;
the process of matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using the composite core word extraction algorithm to obtain a plurality of matching results comprises:
matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
matching the accurate-mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result;
and matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using an abbreviation word algorithm and the commodity code abbreviation to obtain an abbreviation word matching result.
5. An invoice commodity code assignment system, comprising:
the goods name receiving module is used for receiving goods names;
the jieba word segmentation module is used for segmenting the goods name by using jieba word segmentation and a preset core word library to obtain a full-mode word segmentation result and an accurate-mode word segmentation result;
the core word extraction module is used for matching the full-mode word segmentation result and the accurate-mode word segmentation result in the core word library by using a composite core word extraction algorithm to obtain a plurality of matching results; the matching result comprises the goods name, the commodity code and the number of issuing companies for the commodity code, and the composite core word extraction algorithm comprises a plurality of core word extraction algorithms;
the confidence calculation module is used for calculating the confidence of each matching result by using preset weighting ratios and the proportion of issuing companies recorded in the core word library for the commodity code in each matching result;
the result output module is used for outputting the matching result with the highest confidence coefficient;
the core word stock is a database which is created in advance and comprises a plurality of goods names, goods codes of the goods and the number of goods code issuing companies of each goods.
6. The invoice commodity code assignment system of claim 5, wherein the goods name receiving module, comprises:
the original name receiving unit is used for receiving an original goods name;
and the original name cleaning unit is used for cleaning the original goods name and removing useless words by using a preset useless word bank to obtain the goods name.
7. The invoice commodity code assignment system of claim 6, wherein the core word extraction module comprises:
the final word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
and the unique word calculation unit is used for matching the precise mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result.
8. The invoice commodity code assignment system of claim 6, further comprising:
the code abbreviation receiving module is used for receiving the commodity code abbreviation;
the core word extraction module comprises:
the final word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using a final word algorithm to obtain a final word matching result;
the unique word calculation unit is used for matching the accurate-mode word segmentation result in the core word library by using a unique word algorithm to obtain a unique word matching result;
and the abbreviation word calculation unit is used for matching the full-mode word segmentation result and/or the accurate-mode word segmentation result in the core word library by using an abbreviation word algorithm and the commodity code abbreviation to obtain an abbreviation word matching result.
9. An invoice commodity code assignment device, characterized by comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the invoice commodity code assignment method of any one of claims 1 to 4.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the invoice commodity code assignment method of any one of claims 1 to 4.
CN202011346801.5A 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium Active CN112348604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011346801.5A CN112348604B (en) 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium

Publications (2)

Publication Number Publication Date
CN112348604A (en) 2021-02-09
CN112348604B (en) 2023-11-17

Family

ID=74365936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011346801.5A Active CN112348604B (en) 2020-11-26 2020-11-26 Invoice commodity code assignment method, system, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN112348604B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219038A (en) * 2021-12-17 2022-03-22 税友信息技术有限公司 Invoice commodity name classification method and device

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276360A (en) * 2007-03-30 2008-10-01 建准电机工业股份有限公司 Reliability verification method of patent retrieval data
CN106844651A (en) * 2017-01-20 2017-06-13 上海傲硕信息科技有限公司 Instruction results compare screening plant
CN107871144A (en) * 2017-11-24 2018-04-03 税友软件集团股份有限公司 Invoice trade name sorting technique, system, equipment and computer-readable recording medium
CN108241677A (en) * 2016-12-26 2018-07-03 航天信息股份有限公司 A kind of method and system for the tax revenue sorting code number for obtaining commodity
CN109213866A (en) * 2018-09-19 2019-01-15 浙江诺诺网络科技有限公司 A kind of tax commodity code classification method and system based on deep learning
CN109918480A (en) * 2019-03-01 2019-06-21 陈包容 A method of address is extracted from text
CN110347801A (en) * 2019-07-17 2019-10-18 安徽航天信息有限公司 A kind of commodity classification codes match method and system
CN110597995A (en) * 2019-09-20 2019-12-20 税友软件集团股份有限公司 Commodity name classification method, commodity name classification device, commodity name classification equipment and readable storage medium
CN110688851A (en) * 2019-09-26 2020-01-14 税友软件集团股份有限公司 Method, device and medium for extracting key information of address text
CN110852815A (en) * 2018-07-25 2020-02-28 阿里巴巴集团控股有限公司 Data processing method, device and machine readable medium
CN111368539A (en) * 2020-03-02 2020-07-03 贵州电网有限责任公司 Hotspot analysis modeling method
CN111832318A (en) * 2020-07-16 2020-10-27 平安科技(深圳)有限公司 Single sentence natural language processing method and device, computer equipment and readable storage medium
CN111985211A (en) * 2020-09-01 2020-11-24 中国民航科学技术研究院 Ontology concept obtaining method and device in civil aviation safety field and storage medium
CN113191146A (en) * 2021-05-26 2021-07-30 平安国际智慧城市科技股份有限公司 Appeal data distribution method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
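张毓;陈军清;: "Research on spam SMS text clustering based on a deep feature semantic learning model", Modern Computer (Professional Edition), no. 07, pages 17 - 21 *
陈江涛;张金隆;张亚军;: "Research on factors influencing the helpfulness of online product reviews: a text semantics perspective", Library and Information Service, no. 10, pages 121 - 125 *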

Also Published As

Publication number Publication date
CN112348604B (en) 2023-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant