CN117609583A - Customs import and export commodity classification method based on image text combination retrieval - Google Patents

Customs import and export commodity classification method based on image text combination retrieval

Info

Publication number
CN117609583A
CN117609583A (application CN202310894185.4A)
Authority
CN
China
Prior art keywords
commodity
image
customs
text
image text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310894185.4A
Other languages
Chinese (zh)
Inventor
杨良怀
秦钰淑
朱艳超
龚卫华
范玉雷
傅萧磊
贾美
项逸婧
朱辰
蔡华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronic Port Data Center Hangzhou Branch
Zhejiang University of Technology ZJUT
Original Assignee
China Electronic Port Data Center Hangzhou Branch
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electronic Port Data Center Hangzhou Branch, Zhejiang University of Technology ZJUT filed Critical China Electronic Port Data Center Hangzhou Branch
Priority to CN202310894185.4A
Publication of CN117609583A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Educational Administration (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a customs import and export commodity classification method based on image text combined retrieval, which comprises the following steps: 1) establishing a customs commodity image text database; 2) performing denoising and data enhancement on the images, and performing word segmentation, stop-word removal and vectorization on the text data; 3) extracting low-, middle- and high-level features of the images with a convolutional neural network, and extracting commodity text features with a long short-term memory neural network; 4) fusing the low-, middle- and high-level image features with the text features; 5) inputting the image text data of the customs commodity image text database into the model to obtain the customs commodity fusion features, randomly sampling to obtain a training data set, and training the model with a triplet loss function; 6) classifying the commodity to be retrieved through hierarchical matching and similarity calculation against the customs commodity fusion features to obtain an HS code candidate result set. The method realizes the combined use of the text description information and the image information of customs import and export commodities, and assists related enterprises and customs personnel in classifying different import and export commodities quickly and accurately.

Description

Customs import and export commodity classification method based on image text combination retrieval
Technical Field
The invention relates to the technical field of computer multi-modal retrieval and customs import and export commodity classification, and in particular to a method for obtaining a customs tariff (HS) code through combined retrieval of commodity images and texts.
Background
In the field of customs import and export commodity classification, the classification rules are complex and many factors influence the result. Under otherwise identical conditions, changing a single factor may lead to a different, incorrect tariff code (HS code). Incorrect classification of goods can in turn result in penalties such as clearance delays, detention of the goods, or loss of certain import and export benefits. Accurately classifying customs commodities is therefore an important and challenging task.
The prevailing working mode of customs commodity classification is to identify and classify goods from the declaration text of the import and export commodity (information such as commodity name, composition and use). However, because customs officers and import and export business personnel differ in professional background, descriptions of the same commodity can differ from person to person, which easily leads to errors in classification and in the subsequent tax declaration work. It is therefore difficult to obtain an accurate HS code from text retrieval alone.
With the development of Internet technology and the popularization of digital devices, image retrieval has developed rapidly and is applied in many fields related to computer vision and artificial intelligence: commodity image retrieval helps customers find desired goods faster and more accurately, and medical image retrieval helps doctors make diagnoses more effectively. However, applying image retrieval to customs import and export commodity classification is challenging. First, customs authorities have generally not built a reliable image data set of import and export commodities, and building such a data set requires modifying the current customs declaration workflow. Second, goods of the same brand may come in different specifications that look similar but carry different HS codes. Finally, photos taken with a mobile phone or camera may be distorted, and secondary operations during upload such as stretching, rotation and compression can distort them further. It is therefore also difficult to obtain an accurate HS code from image retrieval alone.
Both single-text retrieval and single-image retrieval have their own limitations. Text suffers from information loss and ambiguous semantics because different people express themselves with different skill and habits. The ever-growing volume of images produces many near-duplicate pictures, so retrieval results often contain many irrelevant images. Fully exploiting and complementing the information of the two modalities avoids the limitations of either single modality and is of great significance for improving the classification accuracy of customs import and export commodities.
Disclosure of Invention
The invention aims to provide a customs import and export commodity classifying method based on image text combined retrieval, which realizes the combined use of text description information and image information of customs import and export commodities and assists related enterprises and customs personnel to classify different import and export commodities rapidly and accurately.
In order to solve the above technical problems, the invention provides an image text combined retrieval method that fully utilizes the complementary information between the two modalities to improve the classification accuracy of customs import and export commodities.
The technical scheme of the invention is as follows:
a customs import and export commodity classifying method based on image text combination retrieval comprises the following steps:
step 1: and constructing a customs commodity image text acquisition module, constructing a customs commodity image text database, and storing customs commodity images and corresponding commodity text description information.
Step 2: and constructing a customs commodity image text data preprocessing module, denoising and data enhancing the image, and unifying the format and the size of the image. And performing word segmentation, stop word removal and vectorization on the text.
Step 3: and constructing a customs commodity image text feature extraction module, and extracting low, medium and high-level features of the image by using a convolutional neural network as a commodity image feature encoder. And encoding the commodity text by using the long-term and short-term memory neural network, and extracting text characteristics of commodity description information.
Step 4: and (3) constructing a customs commodity image text feature multi-mode fusion module, and fusing the low, middle and high-level image features and the text features in the step (3) to obtain low, middle and high-level image text fusion features.
Step 5: the method comprises the steps of constructing a customs commodity image text combined retrieval model, and comprising a customs commodity image text data preprocessing module, a feature extraction module and a multi-mode fusion module. Inputting image text data of a customs commodity image text database into a model to obtain customs commodity fusion characteristics, randomly sampling to obtain a training data set, and training the model by using a triplet loss function.
Step 6: classifying the commodities to be searched, inputting the images and texts of the commodities to be searched into a customs commodity image text combined search model to obtain image text fusion characteristics of the commodities to be searched, carrying out hierarchical matching and similarity calculation on the image text fusion characteristics of the commodities to be searched and the customs commodity fusion characteristics, obtaining an HS coding candidate result set of the commodities to be searched according to a similarity matching result, and completing classification of the commodities.
Further, in step 1, a customs commodity image text acquisition module is constructed; the commodity image data and text description data in the customs commodity image text database correspond one to one, and the text description data comprise the commodity name and HS code. Data acquisition specifically comprises the following steps:
1.1) When declaring, the declarant must upload the commodity description text and pictures at the same time; the backend automatically collects the commodity pictures and the declared commodity names and HS codes and stores them in the customs commodity image text database.
1.2) During on-site customs inspection, the goods are compared against the commodity images in the customs commodity image text database to confirm that they match the declaration. If the database does not contain the commodity, its image, commodity name and HS code are collected after further manual inspection and stored in the customs commodity image text database.
Further, in step 2, a customs commodity image text data preprocessing module is constructed; denoising and data enhancement are performed on the images and their format and size are unified, and word segmentation, stop-word removal and vectorization are performed on the text. The specific process comprises the following steps:
2.1) Denoise and standardize the commodity images, and unify their format and size. Apply data enhancement to the denoised images using random cropping, affine transformation and brightness adjustment to improve the generality of the image data.
2.2) Segment the text with the NLPIR natural language word segmentation system and remove stop words (words that are usually ignored in text processing) to obtain a word dictionary. The value of each word in the dictionary represents how frequently it appears across all sentences; finally, GloVe is used to convert the text into word vectors.
Further, in step 3, a customs commodity image text feature extraction module is constructed; ResNet-18 is used as the commodity image feature encoder to extract low-, middle- and high-level features of the image, and the commodity text is encoded with an LSTM to obtain the text features of the commodity description information. The specific process comprises the following steps:
3.1) Use the ResNet-18 convolutional neural network as the commodity image feature encoder: extract the output of the shallow layers of ResNet-18 as the low-level feature L of the image, the output of the middle layers as the middle-level feature M, and the output of the deep layers as the high-level feature H:
F = {L, M, H} = ResNet(m) (1)
where F is the set of low-level (L), middle-level (M) and high-level (H) image features of image m.
3.2 Inputting the text word vector obtained in the step 2.2 into an LSTM neural network for word coding, so as to obtain a feature vector T of the whole text.
Further, in step 4, a customs commodity image text feature multi-modal fusion module is constructed. The specific process comprises the following steps:
4.1) Expand the text feature T by copying so that its dimensions match those of the image features.
4.2) Multiply the low-, middle- and high-level image features L, M, H from step 3 element-wise with the expanded text feature to obtain joint representations of the image text features: LT, MT, HT.
4.3) Optimize the joint representations using a Sigmoid function, convolution and de-mean normalization (BN).
Further, in step 5, a customs commodity image text combined retrieval model is constructed, comprising the customs commodity image text data preprocessing module, the feature extraction module and the multi-modal fusion module; the image text data of the customs commodity image text database are input into the model to obtain the customs commodity fusion features, a training data set is obtained by random sampling, and the model is trained with a triplet loss function. The specific steps are as follows:
5.1) Input the image text data of the customs commodity image text database into the customs commodity image text combined retrieval model to obtain the customs commodity fusion features. Establish a customs commodity fusion feature index library and store the customs commodity fusion features persistently. Randomly sample the customs commodity fusion features to obtain a training data set.
5.2) Cluster the image text fusion features of the training data set with a K-means algorithm to obtain the cluster centers C_t.
5.3) First compute the similarity between the image text fusion feature x of a query commodity in the training data set and the cluster centers C_t to find the most similar cluster c_t, and then compute the similarity between x and each image text fusion feature y in that cluster to determine their relative positions in the feature space. The similarity function is the cosine similarity, where · denotes the dot product:

D(x, y) = (x · y) / (‖x‖ ‖y‖) (2)
5.4) Randomly select a positive sample y+ from the most similar cluster c_t and a negative sample y- from the other clusters, repeating n times, where n is an adjustable parameter. Train the customs commodity image text combined retrieval model with a triplet loss function so that the similarity between positive sample pairs is as large as possible and the similarity between negative sample pairs is as small as possible, thereby improving retrieval accuracy. The triplet loss function is:

Loss = max(0, D(x, y-) - D(x, y+) + t) (3)

where D(x, y-) denotes the cosine similarity between x and the negative sample y-, D(x, y+) denotes the cosine similarity between x and the positive sample y+, and t is the margin; the loss is 0 when the cosine similarity of the positive pair exceeds that of the negative pair by more than t.
Further, in step 6, the commodity to be retrieved is classified: its image and text are input into the customs commodity image text combined retrieval model to obtain the image text fusion features of the commodity to be retrieved, hierarchical matching and similarity calculation are performed between these features and the customs commodity fusion features, and an HS code candidate result set of the commodity to be retrieved is obtained according to the similarity matching result, completing the classification of the commodity. The specific steps are as follows:
6.1) Input the image and text of the commodity to be retrieved into the customs commodity image text combined retrieval model, and perform image text data preprocessing, feature extraction and multi-modal fusion.
6.2) As in step 5.2), cluster the customs commodity fusion features of the customs commodity fusion feature index library with the K-means clustering algorithm to obtain the customs commodity image text fusion feature cluster centers C_h.
6.3) As in step 5.3), first compute the similarity between the fusion features of the commodity to be retrieved and the cluster centers C_h to find the most similar cluster c_h, then compute the cosine similarity with each image text fusion feature in that cluster; according to the similarity matching result, obtain the HS code candidate result set of the commodity to be retrieved and display the top five candidates by similarity, each with its image information, commodity name and tariff code, thereby completing the customs classification of the commodity.
The invention has the beneficial effects that:
1) The invention applies the image text combination retrieval technology to the field of customs import and export commodity classification for the first time, and provides a new method for customs import and export commodity classification.
2) The invention fully utilizes the information of the two modalities, image and text, captures the intrinsic association between them, and uses their complementarity to integrate the information, thereby learning better feature representations and improving retrieval accuracy.
3) Addressing the fact that customs authorities lack an import and export commodity image data set, the method innovatively collects customs import and export commodity images and the corresponding description texts during the enterprise declaration process and the on-site customs inspection, and builds a customs commodity image text database to store them.
Drawings
FIG. 1 is a general flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of customs commodity image text data collection and related processing according to the present invention;
FIG. 3 is a schematic diagram of the customs commodity image text combined retrieval model;
FIG. 4 is a schematic diagram of the customs commodity image text feature extraction and hierarchical matching architecture according to the present invention.
Detailed Description
The method for classifying customs import and export commodities based on image text combination retrieval according to the present invention will be described in more detail with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention.
Referring to fig. 1, the method for classifying customs import and export commodities based on image text combination retrieval of the invention is further described in detail and comprises the following steps:
step 1: and constructing a customs commodity image text acquisition module, constructing a customs commodity image text database, and storing customs commodity images and corresponding commodity text description information. Referring to fig. 2, the specific steps are as follows:
1.1) During the declaration process, the enterprise's customs declarant must attach one front-view picture of the commodity; the picture can be photographed and uploaded with a mobile phone and must not exceed 1024 KB. After customs review, the commodity picture and the corresponding commodity name and HS code are stored in the customs commodity image text database, which adopts a Redis database.
1.2) During on-site customs inspection, a high-speed document camera is used to scan and check the commodity. If the customs commodity image text database does not contain the commodity, customs personnel perform a further inspection, and after it passes, the front-view image, commodity name and HS code of the commodity are stored in the customs commodity image text database.
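As an illustrative sketch (not part of the original disclosure), the snippet below shows one way such records could be persisted with the redis-py client; the key scheme, field names and sample values are assumptions.

```python
import redis

# Hypothetical helper for persisting one declared commodity record in Redis.
# The key scheme, field names and sample values are illustrative assumptions.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def store_commodity(image_id: str, image_path: str, name: str, hs_code: str) -> None:
    """Store one customs commodity image/text record as a Redis hash."""
    key = f"customs:commodity:{image_id}"
    r.hset(key, mapping={
        "image_path": image_path,  # front-view picture (<= 1024 KB)
        "name": name,              # declared commodity name
        "hs_code": hs_code,        # declared HS code
    })
    r.sadd("customs:commodity:ids", image_id)  # simple index of stored records

store_commodity("000001", "/data/images/000001.jpg", "vacuum flask", "9617001100")
```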
Step 2: constructing a customs commodity image text data preprocessing module, denoising and data enhancing the image, unifying the format and the size of the image, and performing word segmentation, stop word removal and vectorization on the text. Referring to fig. 3, the specific steps are as follows:
2.1) Denoise the images, convert them to a unified jpg format with a unified size of 256 × 256, and rename each image with the corresponding commodity name, tariff code and image ID.
2.2) Use the transforms module of the PyTorch framework to enhance the image data and improve its generality, specifically: randomly crop the image with transforms.RandomCrop; apply an affine transformation with transforms.RandomAffine, where the affine transformation is composed of five basic atoms, namely rotation, translation, scaling, shear and flip; and adjust the brightness, contrast, saturation and hue of the image with a color-jitter transform (transforms.ColorJitter).
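A minimal sketch of such an augmentation pipeline with torchvision is shown below; the crop size, affine ranges and jitter strengths are illustrative assumptions rather than values given in the patent.

```python
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline for the 256x256 commodity images described
# in steps 2.1-2.2. All numeric parameters are assumptions for demonstration.
augment = transforms.Compose([
    transforms.Resize((256, 256)),                      # unify image size
    transforms.RandomCrop(224),                         # random cropping
    transforms.RandomAffine(degrees=15,                 # rotation
                            translate=(0.1, 0.1),       # translation
                            scale=(0.9, 1.1),           # scaling
                            shear=10),                  # shear
    transforms.RandomHorizontalFlip(),                  # flip
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),   # brightness/contrast/saturation/hue
    transforms.ToTensor(),
])

img = Image.new("RGB", (640, 480), color=(200, 200, 200))  # stand-in for a commodity photo
tensor = augment(img)   # 3 x 224 x 224 float tensor ready for the feature encoder
```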
2.3) Use the NLPIR natural language word segmentation system to segment the commodity description text, decomposing each sentence into a word-level data structure. Afterwards, stop words (articles, prepositions, pronouns, conjunctions) are removed from the text to obtain a word dictionary in which the value of each word represents how frequently it appears across all sentences. The text is then converted to word vectors using the GloVe model: W = {w_1, …, w_n}, where w_n represents the nth word vector.
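The following sketch illustrates this text preprocessing path under stated assumptions: NLPIR is accessed through the pynlpir binding (which requires the NLPIR data files), the stop-word list is a placeholder, and the GloVe vectors are read from a standard GloVe-format text file whose path is hypothetical.

```python
import numpy as np
import pynlpir  # Python binding for the NLPIR/ICTCLAS segmenter

STOP_WORDS = {"的", "了", "和", "与", "及"}          # placeholder stop-word list
GLOVE_PATH = "glove.customs.300d.txt"               # hypothetical GloVe-format file

def load_glove(path: str) -> dict:
    """Load GloVe-format vectors ("word v1 v2 ... vd" per line) into a dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def text_to_vectors(text: str, glove: dict) -> np.ndarray:
    """Segment with NLPIR, drop stop words, and look up GloVe word vectors."""
    pynlpir.open()
    words = pynlpir.segment(text, pos_tagging=False)
    pynlpir.close()
    kept = [w for w in words if w not in STOP_WORDS and w in glove]
    return np.stack([glove[w] for w in kept])        # shape: (n_words, embedding_dim)

glove = load_glove(GLOVE_PATH)
word_vectors = text_to_vectors("不锈钢真空保温杯", glove)
```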
Step 3: and constructing a customs commodity image text feature extraction module, using a convolutional neural network as a commodity image feature encoder, extracting low, middle and high-layer features of the image, and encoding commodity text by using a long-term and short-term memory neural network to obtain text features of commodity description information. With reference to fig. 3 and 4, the specific steps are as follows:
3.1) Use ResNet-18 as the image feature extractor: extract the 5th-layer feature of ResNet-18 as the low-level feature L of the image, the 9th-layer feature as the middle-level feature M, and the 17th-layer feature as the high-level feature H, thereby capturing visual information of different granularities:
F_q = {L_q, M_q, H_q} = ResNet(m_q) (4)
F_h = {L_h, M_h, H_h} = ResNet(m_h) (5)
where F_q is the set of low-level (L_q), middle-level (M_q) and high-level (H_q) image features of the image m_q to be retrieved, and F_h is the set of low-level (L_h), middle-level (M_h) and high-level (H_h) image features of the customs image m_h.
3.2) Input the text word vectors w obtained in step 2.3 into an LSTM neural network for encoding to obtain a feature vector T of the whole text:
T_q = LSTM(w_q) (6)
T_h = LSTM(w_h) (7)
where T_q and w_q are the text feature vector and word vectors of the commodity to be retrieved, and T_h and w_h are the customs text feature vector and word vectors.
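As a non-authoritative sketch of this module, the PyTorch code below extracts three levels of ResNet-18 features and encodes the word vectors with an LSTM; mapping the patent's 5th/9th/17th layers onto torchvision's layer1/layer2/layer4 block outputs, as well as all dimensions, are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ImageTextEncoder(nn.Module):
    """Sketch of the feature-extraction module: ResNet-18 multi-level image
    features plus an LSTM text encoder. The choice of layer1/layer2/layer4 as
    the low/middle/high levels and all dimensions are assumptions."""

    def __init__(self, embed_dim: int = 300, text_dim: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.low = backbone.layer1      # shallow block  -> low-level feature L
        self.mid = backbone.layer2      # middle block   -> middle-level feature M
        self.high = nn.Sequential(backbone.layer3, backbone.layer4)  # deep -> H
        self.lstm = nn.LSTM(embed_dim, text_dim, batch_first=True)

    def forward(self, image: torch.Tensor, word_vecs: torch.Tensor):
        x = self.stem(image)
        L = self.low(x)                 # e.g. (B, 64,  56, 56)
        M = self.mid(L)                 # e.g. (B, 128, 28, 28)
        H = self.high(M)                # e.g. (B, 512, 7, 7)
        _, (h_n, _) = self.lstm(word_vecs)
        T = h_n[-1]                     # (B, text_dim) whole-text feature vector
        return {"L": L, "M": M, "H": H}, T

encoder = ImageTextEncoder()
feats, T = encoder(torch.randn(2, 3, 224, 224), torch.randn(2, 12, 300))
```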
Step 4: and (3) constructing a customs commodity image text feature multi-mode fusion module, and fusing the low, middle and high-level image features and the text features in the step (3) to obtain low, middle and high-level image text fusion features. With reference to fig. 3 and 4, the specific steps are as follows:
4.1) Expand the text feature T by copying so that it matches the image feature dimensions, and normalize the text and image feature vectors so that their values lie between 0 and 1, eliminating the scale differences between different feature vectors; this yields the retrieval text feature vector T'_q and the customs text feature vector T'_h.
4.2) Multiply T'_q element-wise with F_q from formula (4), and T'_h element-wise with F_h from formula (5), to obtain the image text joint representation C_q of the image to be retrieved and the image text joint representation C_h of the customs image:
C_q = F_q ⊙ T'_q (8)
C_h = F_h ⊙ T'_h (9)
where ⊙ denotes element-wise multiplication.
4.3) Optimize the joint representations with a Sigmoid function (σ), convolution (conv) and de-mean normalization (BN) to obtain the image text fusion feature α_q of the commodity to be retrieved and the customs commodity fusion feature α_h:
α_q = σ(conv(BN(C_q))) (10)
α_h = σ(conv(BN(C_h))) (11)
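A minimal sketch of one fusion block in PyTorch follows, reading formulas (8)-(11) as: broadcast the text feature over the spatial feature map, multiply element-wise, then apply BN, convolution and a sigmoid. The linear projection that matches the text dimension to the channel count, and all sizes, are added assumptions.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """One possible reading of formulas (8)-(11): broadcast the text feature
    over the spatial map, multiply element-wise, then BN -> conv -> sigmoid.
    The projection layer and channel sizes are illustrative assumptions."""

    def __init__(self, img_channels: int, text_dim: int):
        super().__init__()
        self.project = nn.Linear(text_dim, img_channels)  # match channel count (assumed)
        self.bn = nn.BatchNorm2d(img_channels)            # de-mean normalization
        self.conv = nn.Conv2d(img_channels, img_channels, kernel_size=3, padding=1)

    def forward(self, F_img: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
        B, C, H, W = F_img.shape
        T_exp = self.project(T).view(B, C, 1, 1).expand(B, C, H, W)  # copy/expand T
        C_joint = F_img * T_exp                       # element-wise product, eq. (8)/(9)
        alpha = torch.sigmoid(self.conv(self.bn(C_joint)))  # eq. (10)/(11)
        return alpha

# One fusion block per feature level (low/middle/high); dimensions assumed.
fuse_low = FusionBlock(img_channels=64, text_dim=256)
alpha_low = fuse_low(torch.randn(2, 64, 56, 56), torch.randn(2, 256))
```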
Step 5: the method comprises the steps of constructing a customs commodity image text combined retrieval model, comprising a customs commodity image text data preprocessing module, a feature extraction module and a multi-mode fusion module, inputting image text data of a customs commodity image text database into the model to obtain customs commodity fusion features, randomly sampling to obtain a training data set, and training the model by using a triplet loss function. With reference to fig. 3 and 4, the specific steps are as follows:
5.1) Input the image text data of the customs commodity image text database into the customs commodity image text combined retrieval model; the feature extraction module yields the customs commodity image feature set F_h and text feature vector T_h as in formulas (5) and (7), and the multi-modal fusion module yields the customs commodity fusion feature α_h as in formulas (9) and (11). Faiss (Facebook AI Similarity Search) is adopted to build the customs commodity fusion feature index library, and α_h is stored persistently as shown in fig. 2. Randomly sample α_h to obtain a training data set.
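The sketch below shows one way the Faiss index library could be built and persisted; using an inner-product flat index over L2-normalized vectors (equivalent to cosine similarity), the feature dimension and the file path are assumptions.

```python
import faiss
import numpy as np

# alpha_h: customs commodity fusion features, shape (N, d); toy values here.
alpha_h = np.random.rand(10000, 512).astype("float32")

faiss.normalize_L2(alpha_h)               # so inner product == cosine similarity
index = faiss.IndexFlatIP(alpha_h.shape[1])
index.add(alpha_h)                        # build the fusion-feature index library
faiss.write_index(index, "customs_fusion.index")    # persistent storage (path assumed)

# Later: reload and query
index = faiss.read_index("customs_fusion.index")
scores, ids = index.search(alpha_h[:1], k=5)        # top-5 most similar entries
```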
5.2) Cluster the image text fusion features α_ht of the training data set with the K-means algorithm to obtain the cluster centers C_t:
Center_k = (1/C_k) Σ_{α_ht^i ∈ cluster k} α_ht^i (12)
where Center_k is the center of the kth cluster, C_k denotes the number of data objects contained in the kth cluster, α_ht^i denotes the fusion feature of the ith training sample, dist denotes the Euclidean distance used to assign each sample to its nearest center, and K denotes the number of clusters, which is set to 100.
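A brief clustering sketch with faiss.Kmeans under the stated K = 100 follows; the number of iterations and the random seed are assumptions.

```python
import faiss
import numpy as np

# Training-set fusion features alpha_ht (random sample of the index library).
alpha_ht = np.random.rand(5000, 512).astype("float32")

K = 100                                                     # number of clusters, as in step 5.2
kmeans = faiss.Kmeans(alpha_ht.shape[1], K, niter=20, seed=42)  # niter/seed assumed
kmeans.train(alpha_ht)

centers = kmeans.centroids                                  # C_t: (100, 512) cluster centers
_, assignments = kmeans.index.search(alpha_ht, 1)           # nearest center per sample
```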
5.3) For the image text fusion feature x of a query commodity in the training data set, first compute its similarity to the cluster centers C_t to find the most similar cluster c_t, and then compute its similarity to each image text fusion feature y in that cluster to determine their relative positions in the feature space. The similarity function is the cosine similarity, where · denotes the dot product:
D(x, y) = (x · y) / (‖x‖ ‖y‖) (13)
5.4) Randomly select a positive sample y+ from the most similar cluster c_t and a negative sample y- from the other clusters, repeating n times, where n is an adjustable parameter. Train the customs commodity image text combined retrieval model with the triplet loss function so that the similarity between positive sample pairs is as large as possible and the similarity between negative sample pairs is as small as possible, thereby improving retrieval accuracy. The triplet loss function is:
Loss = max(0, D(x, y-) - D(x, y+) + t) (14)
where D(x, y-) denotes the cosine similarity between x and the negative sample y-, D(x, y+) denotes the cosine similarity between x and the positive sample y+, and t is the margin; the loss is 0 when the cosine similarity of the positive pair exceeds that of the negative pair by more than t.
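To illustrate steps 5.3-5.4, the following PyTorch sketch samples positives from the matched cluster and negatives from other clusters and computes the triplet loss of formula (14); the margin t, the number of triplets n and the toy tensors are assumptions.

```python
import random
import torch
import torch.nn.functional as F

def cosine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity D(x, y), as in formula (13)."""
    return F.cosine_similarity(a, b, dim=-1)

def triplet_loss(x, y_pos, y_neg, t: float = 0.3) -> torch.Tensor:
    """max(0, D(x, y-) - D(x, y+) + t), as in formula (14); margin t assumed."""
    return torch.clamp(cosine(x, y_neg) - cosine(x, y_pos) + t, min=0).mean()

def sample_triplets(x, features, assignments, cluster_id: int, n: int = 5):
    """Draw n positives from the query's cluster and n negatives from other clusters."""
    pos_pool = (assignments == cluster_id).nonzero(as_tuple=True)[0].tolist()
    neg_pool = (assignments != cluster_id).nonzero(as_tuple=True)[0].tolist()
    pos = features[random.sample(pos_pool, n)]
    neg = features[random.sample(neg_pool, n)]
    return x.expand(n, -1), pos, neg

# Toy tensors standing in for model outputs: 1000 fusion features in 10 clusters.
features = torch.randn(1000, 512)
assignments = torch.randint(0, 10, (1000,))
x = torch.randn(1, 512)                               # query fusion feature
xs, pos, neg = sample_triplets(x, features, assignments, cluster_id=3, n=5)
loss = triplet_loss(xs, pos, neg)                     # backpropagate in real training
```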
Step 6: classifying the commodities to be searched, inputting the images and texts of the commodities to be searched into a customs commodity image text combined search model to obtain image text fusion characteristics of the commodities to be searched, carrying out hierarchical matching and similarity calculation on the image text fusion characteristics of the commodities to be searched and the customs commodity fusion characteristics, obtaining an HS coding candidate result set of the commodities to be searched according to a similarity matching result, and completing classification of the commodities. The method comprises the following specific steps:
6.1) Input the image and text of the commodity to be retrieved into the customs commodity image text combined retrieval model; the feature extraction module yields the image feature set F_q and text feature vector T_q of the commodity to be retrieved as in formulas (4) and (6), and the multi-modal fusion module yields its image text fusion feature α_q as in formulas (8) and (10).
6.2) As in step 5.2, cluster the customs commodity fusion features α_h in the customs commodity fusion feature index library with the K-means algorithm to obtain the cluster centers C_h.
6.3) As in step 5.3, first compute the similarity between α_q and the cluster centers C_h to find the most similar cluster c_h, then compute the similarity between α_q and each image text fusion feature in that cluster. According to the similarity matching result, obtain the HS code candidate result set of the commodity to be retrieved and display the top five candidates by similarity, each with its image information, commodity name and tariff code, thereby completing the customs classification of the commodity.
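A compact NumPy sketch of this hierarchical matching step is given below; the array layouts, the toy data and the HS-code list are illustrative assumptions.

```python
import numpy as np

def classify(alpha_q, centers, features, assignments, hs_codes, top_k=5):
    """Hierarchical matching sketch: pick the nearest cluster center first, then
    rank by cosine similarity inside that cluster and return the top-k HS codes.
    Array layouts and the hs_codes list are illustrative assumptions."""
    def normalize(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

    q = normalize(alpha_q.reshape(1, -1))
    c = int(np.argmax(normalize(centers) @ q.T))             # most similar cluster c_h
    members = np.where(assignments == c)[0]                   # features inside that cluster
    sims = (normalize(features[members]) @ q.T).ravel()       # cosine similarity per member
    order = np.argsort(-sims)[:top_k]
    return [(hs_codes[members[i]], float(sims[i])) for i in order]

# Toy stand-ins for the customs fusion-feature index library (20 clusters here).
feats = np.random.rand(1000, 512).astype("float32")
assign = np.random.randint(0, 20, size=1000)
centers = np.stack([feats[assign == k].mean(axis=0) for k in range(20)])
codes = [f"HS{i:08d}" for i in range(1000)]
print(classify(np.random.rand(512).astype("float32"), centers, feats, assign, codes))
```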
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting thereof; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the previous embodiment can be modified or part of technical features can be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A customs import and export commodity classifying method based on image text combination retrieval is characterized by comprising the following steps:
step 1), establishing a customs commodity image text database, and storing customs commodity images and corresponding commodity text description information;
step 2) denoising and data enhancement operations are carried out on the images, the format and the size of the images are unified, and word segmentation, stop word removal and vectorization operations are carried out on texts;
step 3) using a convolutional neural network as the commodity image feature encoder to extract low, medium and high-level features of the image, encoding the commodity text with a long short-term memory neural network, and extracting the text features of the commodity description information;
step 4) fusing the low, middle and high-level image characteristics and text characteristics in the step 3) to obtain low, middle and high-level image text fusion characteristics;
step 5) inputting image text data of a customs commodity image text database into a customs commodity image text combination retrieval model to obtain customs commodity fusion characteristics, randomly sampling to obtain a training data set, and training the model by using a triplet loss function;
and 6) inputting the images and texts of the commodities to be searched into a customs commodity image text combined search model to obtain image text fusion characteristics of the commodities to be searched, carrying out hierarchical matching and similarity calculation on the image text fusion characteristics of the commodities to be searched and the customs commodity fusion characteristics, obtaining an HS coding candidate result set of the commodities to be searched according to a similarity matching result, and completing classification of the commodities.
2. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the specific steps of the step 1) are as follows:
step 1.1), automatically collecting commodity pictures, declared and filled commodity names and HS codes through a background, and storing the commodity pictures and the commodity names and HS codes into a customs commodity image text database;
and 1.2) comparing the commodity image in the customs commodity image text database with the commodity, if the commodity is not in the customs commodity image text database, acquiring the image, commodity name and HS code of the commodity after further checking by manpower, and storing the image, commodity name and HS code in the customs commodity image text database.
3. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 2) specifically comprises the following steps:
step 2.1) denoising the commodity image, unifying the format and the size of the image, and carrying out data enhancement on the denoised image by adopting the modes of random clipping, affine transformation and brightness adjustment;
step 2.2) using an NLPIR natural language word segmentation system to segment the text, and then removing stop words from the text to obtain a word dictionary; the value of each word in the dictionary represents the frequency level at which it appears in all sentences, and finally GloVe is used to convert the text into a word vector.
4. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 3) specifically comprises the following steps:
step 3.1) using a ResNet-18 convolutional neural network as a commodity image feature encoder, extracting low-level features as low-level features L of an image on a shallow network of the ResNet-18, extracting middle-level features as middle-level features M of the image on a middle-level network of the ResNet-18, and extracting high-level features as high-level features H of the image on a high-level network of the ResNet-18:
F = {L, M, H} = ResNet(m) (1)
wherein F is the set of low-level (L), medium-level (M) and high-level (H) image features of the image m;
step 3.2) inputting the text word vector obtained in the step 2.2) into an LSTM neural network for word coding, so as to obtain a feature vector T of the whole text.
5. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 4) specifically comprises the following steps:
step 4.1) expanding the feature vector T through copying to enable the feature vector T to be identical with the feature dimension of the image;
step 4.2) multiplying the low L, medium M and high H layer image features in the step 3) with text corresponding elements to obtain a joint representation of the image text features: LT, MT, HT;
step 4.3) optimizing using Sigmoid function, convolution and de-averaging normalization.
6. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 5) specifically comprises the following steps:
step 5.1), establishing a customs commodity fusion feature index library, and storing customs commodity fusion features in a lasting manner; randomly sampling the customs commodity fusion characteristics to obtain a training data set;
step 5.2) clustering the image text fusion features of the training data set with a K-means algorithm to obtain cluster centers C_t;
step 5.3) first computing the similarity between the image text fusion feature x of a query commodity in the training data set and the cluster centers C_t to find the most similar cluster c_t, and then computing the similarity between x and each image text fusion feature y in that cluster to determine their relative positions in the feature space; the similarity function uses the cosine similarity, where · denotes the dot product:
D(x, y) = (x · y) / (‖x‖ ‖y‖)
step 5.4) randomly selecting a positive sample y+ from the most similar cluster c_t and a negative sample y- from the other clusters, repeating n times, wherein n is an adjustable parameter; training the customs commodity image text combined retrieval model with a triplet loss function so that the similarity between positive sample pairs is as large as possible and the similarity between negative sample pairs is as small as possible, thereby improving retrieval accuracy; the triplet loss function is:
Loss = max(0, D(x, y-) - D(x, y+) + t)
wherein D(x, y-) denotes the cosine similarity between x and the negative sample y-, D(x, y+) denotes the cosine similarity between x and the positive sample y+, and t is the margin; the loss is 0 when the cosine similarity of the positive sample pair exceeds that of the negative sample pair by more than t.
7. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 6) specifically comprises the following steps:
step 6.1), inputting images and texts of the commodities to be searched into a customs commodity image text combination search model, and carrying out image text data preprocessing, feature extraction and multi-mode fusion;
step 6.2) clustering the customs commodity fusion features of the customs commodity fusion feature index library with a K-means clustering algorithm to obtain the customs commodity image text fusion feature cluster centers C_h;
step 6.3) first computing the similarity between the fusion features of the commodity to be searched and the cluster centers C_h to find the most similar cluster c_h, then computing the cosine similarity with each image text fusion feature in that cluster, obtaining an HS code candidate result set of the commodity to be searched according to the similarity matching result, and displaying the top five candidates by similarity, each with its image information, commodity name and tariff code information, thereby completing the customs classification of the commodity.
CN202310894185.4A 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval Pending CN117609583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310894185.4A CN117609583A (en) 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310894185.4A CN117609583A (en) 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval

Publications (1)

Publication Number Publication Date
CN117609583A true CN117609583A (en) 2024-02-27

Family

ID=89952182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310894185.4A Pending CN117609583A (en) 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval

Country Status (1)

Country Link
CN (1) CN117609583A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117952507A (en) * 2024-03-26 2024-04-30 南京亿猫信息技术有限公司 Intelligent shopping cart commodity returning identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination