CN117609583A - Customs import and export commodity classification method based on image text combination retrieval - Google Patents

Customs import and export commodity classification method based on image text combination retrieval

Info

Publication number
CN117609583A
CN117609583A (application CN202310894185.4A)
Authority
CN
China
Prior art keywords
commodity
image
customs
text
image text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310894185.4A
Other languages
Chinese (zh)
Inventor
杨良怀
秦钰淑
朱艳超
龚卫华
范玉雷
傅萧磊
贾美
项逸婧
朱辰
蔡华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronic Port Data Center Hangzhou Branch
Zhejiang University of Technology ZJUT
Original Assignee
China Electronic Port Data Center Hangzhou Branch
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electronic Port Data Center Hangzhou Branch, Zhejiang University of Technology ZJUT filed Critical China Electronic Port Data Center Hangzhou Branch
Priority to CN202310894185.4A
Publication of CN117609583A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Educational Administration (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a customs import and export commodity classification method based on image text combined retrieval, which comprises the following steps: 1) establishing a customs commodity image text database; 2) performing denoising and data enhancement on the images, and performing word segmentation, stop-word removal and vectorization on the text data; 3) extracting low-, middle- and high-level features of the images with a convolutional neural network, and extracting commodity text features with a long short-term memory neural network; 4) fusing the low-, middle- and high-level image features with the text features; 5) inputting the image text data of the customs commodity image text database into the model to obtain the customs commodity fusion features, randomly sampling to obtain a training data set, and training the model with a triplet loss function; 6) classifying the commodity to be retrieved through hierarchical matching and similarity calculation against the customs commodity fusion features to obtain an HS code candidate result set. The method realizes the combined use of the text description information and the image information of customs import and export commodities, and assists related enterprises and customs personnel in classifying different import and export commodities quickly and accurately.

Description

Customs import and export commodity classification method based on image text combination retrieval
Technical Field
The invention relates to the technical field of computer multi-modal retrieval and customs import and export commodity classification, and in particular to a method for obtaining a customs tariff (HS) code through combined retrieval of commodity images and texts.
Background
In the field of customs import and export commodity classification, the classification rules are complex and many factors influence the result. Under otherwise identical conditions, changing a single factor may lead to a different, incorrect tariff code (HS code). Incorrect classification of goods can in turn result in penalties such as clearance delays, detention of the goods, or loss of certain import and export benefits. Accurately classifying customs commodities is therefore an important and challenging task.
The prevailing working mode of customs commodity classification is to identify and classify goods from the declaration text of the import and export commodity (information such as commodity name, composition and use). However, because customs officers and import and export business personnel differ in professional background, descriptions of the same commodity can differ from person to person, which easily leads to errors in classification and in the subsequent tax declaration work. It is therefore difficult to obtain an accurate HS code from text retrieval alone.
With the development of Internet technology and the popularization of digital devices, image retrieval has developed rapidly and is applied in many fields related to computer vision and artificial intelligence: commodity image retrieval helps customers find desired goods faster and more accurately, and medical image retrieval helps doctors make diagnoses more effectively. However, applying image retrieval to customs import and export commodity classification is challenging. First, customs authorities have generally not built a reliable image data set of import and export commodities, and building such a data set requires modifying the current customs declaration workflow. Second, goods of the same brand may come in different specifications that look similar but carry different HS codes. Finally, photos taken with a mobile phone or camera may be distorted, and secondary operations during upload such as stretching, rotation and compression can distort them further. It is therefore also difficult to obtain an accurate HS code from image retrieval alone.
Both single-text retrieval and single-image retrieval have their own limitations. Text suffers from information loss and ambiguous semantics because different people express themselves with different skill and habits. The ever-growing volume of images produces many near-duplicate pictures, so retrieval results often contain many irrelevant images. Fully exploiting and complementing the information of the two modalities avoids the limitations of either single modality and is of great significance for improving the classification accuracy of customs import and export commodities.
Disclosure of Invention
The invention aims to provide a customs import and export commodity classifying method based on image text combined retrieval, which realizes the combined use of text description information and image information of customs import and export commodities and assists related enterprises and customs personnel to classify different import and export commodities rapidly and accurately.
In order to solve the above technical problems, the invention provides an image text combined retrieval method that fully utilizes the complementary information between the two modalities to improve the classification accuracy of customs import and export commodities.
The technical scheme of the invention is as follows:
a customs import and export commodity classifying method based on image text combination retrieval comprises the following steps:
step 1: and constructing a customs commodity image text acquisition module, constructing a customs commodity image text database, and storing customs commodity images and corresponding commodity text description information.
Step 2: and constructing a customs commodity image text data preprocessing module, denoising and data enhancing the image, and unifying the format and the size of the image. And performing word segmentation, stop word removal and vectorization on the text.
Step 3: and constructing a customs commodity image text feature extraction module, and extracting low, medium and high-level features of the image by using a convolutional neural network as a commodity image feature encoder. And encoding the commodity text by using the long-term and short-term memory neural network, and extracting text characteristics of commodity description information.
Step 4: and (3) constructing a customs commodity image text feature multi-mode fusion module, and fusing the low, middle and high-level image features and the text features in the step (3) to obtain low, middle and high-level image text fusion features.
Step 5: the method comprises the steps of constructing a customs commodity image text combined retrieval model, and comprising a customs commodity image text data preprocessing module, a feature extraction module and a multi-mode fusion module. Inputting image text data of a customs commodity image text database into a model to obtain customs commodity fusion characteristics, randomly sampling to obtain a training data set, and training the model by using a triplet loss function.
Step 6: classifying the commodities to be searched, inputting the images and texts of the commodities to be searched into a customs commodity image text combined search model to obtain image text fusion characteristics of the commodities to be searched, carrying out hierarchical matching and similarity calculation on the image text fusion characteristics of the commodities to be searched and the customs commodity fusion characteristics, obtaining an HS coding candidate result set of the commodities to be searched according to a similarity matching result, and completing classification of the commodities.
Further, in step 1, a customs commodity image text acquisition module is constructed; the commodity image data and text description data in the customs commodity image text database correspond one to one, and the text description data comprise the commodity name and HS code. Data acquisition specifically comprises the following steps:
1.1) When declaring, the declarant must upload the commodity description text and pictures at the same time; the backend automatically collects the commodity pictures and the declared commodity names and HS codes and stores them in the customs commodity image text database.
1.2) During on-site customs inspection, the goods are compared against the commodity images in the customs commodity image text database to confirm that they match the declaration. If the database does not contain the commodity, its image, commodity name and HS code are collected after further manual inspection and stored in the customs commodity image text database.
Further, in step 2, a customs commodity image text data preprocessing module is constructed; denoising and data enhancement are performed on the images and their format and size are unified, and word segmentation, stop-word removal and vectorization are performed on the text. The specific process comprises the following steps:
2.1) Denoise and standardize the commodity images, and unify their format and size. Apply data enhancement to the denoised images using random cropping, affine transformation and brightness adjustment to improve the generality of the image data.
2.2) Segment the text with the NLPIR natural language word segmentation system and remove stop words (words that are usually ignored in text processing) to obtain a word dictionary. The value of each word in the dictionary represents how frequently it appears across all sentences; finally, GloVe is used to convert the text into word vectors.
Further, in step 3, a customs commodity image text feature extraction module is constructed; ResNet-18 is used as the commodity image feature encoder to extract low-, middle- and high-level features of the image, and the commodity text is encoded with an LSTM to obtain the text features of the commodity description information. The specific process comprises the following steps:
3.1) Use the ResNet-18 convolutional neural network as the commodity image feature encoder: extract the output of the shallow layers of ResNet-18 as the low-level feature L of the image, the output of the middle layers as the middle-level feature M, and the output of the deep layers as the high-level feature H:
F = {L, M, H} = ResNet(m) (1)
where F is the set of low-level (L), middle-level (M) and high-level (H) image features of image m.
3.2 Inputting the text word vector obtained in the step 2.2 into an LSTM neural network for word coding, so as to obtain a feature vector T of the whole text.
Further, in step 4, a customs commodity image text feature multi-modal fusion module is constructed. The specific process comprises the following steps:
4.1) Expand the text feature T by copying so that its dimensions match those of the image features.
4.2) Multiply the low-, middle- and high-level image features L, M, H from step 3 element-wise with the expanded text feature to obtain joint representations of the image text features: LT, MT, HT.
4.3) Optimize the joint representations using a Sigmoid function, convolution and de-mean normalization (BN).
Further, in step 5, a customs commodity image text combined retrieval model is constructed, comprising the customs commodity image text data preprocessing module, the feature extraction module and the multi-modal fusion module; the image text data of the customs commodity image text database are input into the model to obtain the customs commodity fusion features, a training data set is obtained by random sampling, and the model is trained with a triplet loss function. The specific steps are as follows:
5.1) Input the image text data of the customs commodity image text database into the customs commodity image text combined retrieval model to obtain the customs commodity fusion features. Establish a customs commodity fusion feature index library and store the customs commodity fusion features persistently. Randomly sample the customs commodity fusion features to obtain a training data set.
5.2) Cluster the image text fusion features of the training data set with a K-means algorithm to obtain the cluster centers C_t.
5.3) First compute the similarity between the image text fusion feature x of a query commodity in the training data set and the cluster centers C_t to find the most similar cluster c_t, and then compute the similarity between x and each image text fusion feature y in that cluster to determine their relative positions in the feature space. The similarity function is the cosine similarity, where · denotes the dot product:

D(x, y) = (x · y) / (‖x‖ ‖y‖) (2)
5.4) Randomly select a positive sample y+ from the most similar cluster c_t and a negative sample y- from the other clusters, repeating n times, where n is an adjustable parameter. Train the customs commodity image text combined retrieval model with a triplet loss function so that the similarity between positive sample pairs is as large as possible and the similarity between negative sample pairs is as small as possible, thereby improving retrieval accuracy. The triplet loss function is:

Loss = max(0, D(x, y-) - D(x, y+) + t) (3)

where D(x, y-) denotes the cosine similarity between x and the negative sample y-, D(x, y+) denotes the cosine similarity between x and the positive sample y+, and t is the margin; the loss is 0 when the cosine similarity of the positive pair exceeds that of the negative pair by more than t.
Further, in step 6, the commodity to be retrieved is classified: its image and text are input into the customs commodity image text combined retrieval model to obtain the image text fusion features of the commodity to be retrieved, hierarchical matching and similarity calculation are performed between these features and the customs commodity fusion features, and an HS code candidate result set of the commodity to be retrieved is obtained according to the similarity matching result, completing the classification of the commodity. The specific steps are as follows:
6.1) Input the image and text of the commodity to be retrieved into the customs commodity image text combined retrieval model, and perform image text data preprocessing, feature extraction and multi-modal fusion.
6.2) As in step 5.2), cluster the customs commodity fusion features of the customs commodity fusion feature index library with the K-means clustering algorithm to obtain the customs commodity image text fusion feature cluster centers C_h.
6.3) As in step 5.3), first compute the similarity between the fusion features of the commodity to be retrieved and the cluster centers C_h to find the most similar cluster c_h, then compute the cosine similarity with each image text fusion feature in that cluster; according to the similarity matching result, obtain the HS code candidate result set of the commodity to be retrieved and display the top five candidates by similarity, each with its image information, commodity name and tariff code, thereby completing the customs classification of the commodity.
The invention has the beneficial effects that:
1) The invention applies the image text combination retrieval technology to the field of customs import and export commodity classification for the first time, and provides a new method for customs import and export commodity classification.
2) The invention fully utilizes the information of the two modalities, image and text, captures the intrinsic association between them, and uses their complementarity to integrate the information, thereby learning better feature representations and improving retrieval accuracy.
3) Addressing the fact that customs authorities lack an import and export commodity image data set, the method innovatively collects customs import and export commodity images and the corresponding description texts during the enterprise declaration process and the on-site customs inspection, and builds a customs commodity image text database to store them.
Drawings
FIG. 1 is a general flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of customs commodity image text data collection and related processing according to the present invention;
FIG. 3 is a schematic diagram of the customs commodity image text combined retrieval model;
FIG. 4 is a schematic diagram of the customs commodity image text feature extraction and hierarchical matching architecture according to the present invention.
Detailed Description
The method for classifying customs import and export commodities based on image text combination retrieval according to the present invention will be described in more detail with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention.
Referring to fig. 1, the method for classifying customs import and export commodities based on image text combination retrieval of the invention is further described in detail and comprises the following steps:
step 1: and constructing a customs commodity image text acquisition module, constructing a customs commodity image text database, and storing customs commodity images and corresponding commodity text description information. Referring to fig. 2, the specific steps are as follows:
1.1) During the declaration process, the enterprise's customs declarant must attach one front-view picture of the commodity; the picture can be photographed and uploaded with a mobile phone and must not exceed 1024 KB. After customs review, the commodity picture and the corresponding commodity name and HS code are stored in the customs commodity image text database, which adopts a Redis database.
1.2) During on-site customs inspection, a high-speed document camera is used to scan and check the commodity. If the customs commodity image text database does not contain the commodity, customs personnel perform a further inspection, and after it passes, the front-view image, commodity name and HS code of the commodity are stored in the customs commodity image text database.
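As an illustrative sketch (not part of the original disclosure), the snippet below shows one way such records could be persisted with the redis-py client; the key scheme, field names and sample values are assumptions.

```python
import redis

# Hypothetical helper for persisting one declared commodity record in Redis.
# The key scheme, field names and sample values are illustrative assumptions.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def store_commodity(image_id: str, image_path: str, name: str, hs_code: str) -> None:
    """Store one customs commodity image/text record as a Redis hash."""
    key = f"customs:commodity:{image_id}"
    r.hset(key, mapping={
        "image_path": image_path,  # front-view picture (<= 1024 KB)
        "name": name,              # declared commodity name
        "hs_code": hs_code,        # declared HS code
    })
    r.sadd("customs:commodity:ids", image_id)  # simple index of stored records

store_commodity("000001", "/data/images/000001.jpg", "vacuum flask", "9617001100")
```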
Step 2: constructing a customs commodity image text data preprocessing module, denoising and data enhancing the image, unifying the format and the size of the image, and performing word segmentation, stop word removal and vectorization on the text. Referring to fig. 3, the specific steps are as follows:
2.1) Denoise the images, convert them to a unified jpg format with a unified size of 256 × 256, and rename each image with the corresponding commodity name, tariff code and image ID.
2.2) Use the transforms module of the PyTorch framework to enhance the image data and improve its generality, specifically: randomly crop the image with transforms.RandomCrop; apply an affine transformation with transforms.RandomAffine, where the affine transformation is composed of five basic atoms, namely rotation, translation, scaling, shear and flip; and adjust the brightness, contrast, saturation and hue of the image with a color-jitter transform (transforms.ColorJitter).
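A minimal sketch of such an augmentation pipeline with torchvision is shown below; the crop size, affine ranges and jitter strengths are illustrative assumptions rather than values given in the patent.

```python
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline for the 256x256 commodity images described
# in steps 2.1-2.2. All numeric parameters are assumptions for demonstration.
augment = transforms.Compose([
    transforms.Resize((256, 256)),                      # unify image size
    transforms.RandomCrop(224),                         # random cropping
    transforms.RandomAffine(degrees=15,                 # rotation
                            translate=(0.1, 0.1),       # translation
                            scale=(0.9, 1.1),           # scaling
                            shear=10),                  # shear
    transforms.RandomHorizontalFlip(),                  # flip
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),   # brightness/contrast/saturation/hue
    transforms.ToTensor(),
])

img = Image.new("RGB", (640, 480), color=(200, 200, 200))  # stand-in for a commodity photo
tensor = augment(img)   # 3 x 224 x 224 float tensor ready for the feature encoder
```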
2.3) Use the NLPIR natural language word segmentation system to segment the commodity description text, decomposing each sentence into a word-level data structure. Afterwards, stop words (articles, prepositions, pronouns, conjunctions) are removed from the text to obtain a word dictionary in which the value of each word represents how frequently it appears across all sentences. The text is then converted to word vectors using the GloVe model: W = {w_1, …, w_n}, where w_n represents the nth word vector.
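The following sketch illustrates this text preprocessing path under stated assumptions: NLPIR is accessed through the pynlpir binding (which requires the NLPIR data files), the stop-word list is a placeholder, and the GloVe vectors are read from a standard GloVe-format text file whose path is hypothetical.

```python
import numpy as np
import pynlpir  # Python binding for the NLPIR/ICTCLAS segmenter

STOP_WORDS = {"的", "了", "和", "与", "及"}          # placeholder stop-word list
GLOVE_PATH = "glove.customs.300d.txt"               # hypothetical GloVe-format file

def load_glove(path: str) -> dict:
    """Load GloVe-format vectors ("word v1 v2 ... vd" per line) into a dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def text_to_vectors(text: str, glove: dict) -> np.ndarray:
    """Segment with NLPIR, drop stop words, and look up GloVe word vectors."""
    pynlpir.open()
    words = pynlpir.segment(text, pos_tagging=False)
    pynlpir.close()
    kept = [w for w in words if w not in STOP_WORDS and w in glove]
    return np.stack([glove[w] for w in kept])        # shape: (n_words, embedding_dim)

glove = load_glove(GLOVE_PATH)
word_vectors = text_to_vectors("不锈钢真空保温杯", glove)
```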
Step 3: and constructing a customs commodity image text feature extraction module, using a convolutional neural network as a commodity image feature encoder, extracting low, middle and high-layer features of the image, and encoding commodity text by using a long-term and short-term memory neural network to obtain text features of commodity description information. With reference to fig. 3 and 4, the specific steps are as follows:
3.1) Use ResNet-18 as the image feature extractor: extract the 5th-layer feature of ResNet-18 as the low-level feature L of the image, the 9th-layer feature as the middle-level feature M, and the 17th-layer feature as the high-level feature H, thereby capturing visual information of different granularities:
F_q = {L_q, M_q, H_q} = ResNet(m_q) (4)
F_h = {L_h, M_h, H_h} = ResNet(m_h) (5)
where F_q is the set of low-level (L_q), middle-level (M_q) and high-level (H_q) image features of the image m_q to be retrieved, and F_h is the set of low-level (L_h), middle-level (M_h) and high-level (H_h) image features of the customs image m_h.
3.2) Input the text word vectors w obtained in step 2.3 into an LSTM neural network for encoding to obtain a feature vector T of the whole text:
T_q = LSTM(w_q) (6)
T_h = LSTM(w_h) (7)
where T_q and w_q are the text feature vector and word vectors of the commodity to be retrieved, and T_h and w_h are the customs text feature vector and word vectors.
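As a non-authoritative sketch of this module, the PyTorch code below extracts three levels of ResNet-18 features and encodes the word vectors with an LSTM; mapping the patent's 5th/9th/17th layers onto torchvision's layer1/layer2/layer4 block outputs, as well as all dimensions, are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ImageTextEncoder(nn.Module):
    """Sketch of the feature-extraction module: ResNet-18 multi-level image
    features plus an LSTM text encoder. The choice of layer1/layer2/layer4 as
    the low/middle/high levels and all dimensions are assumptions."""

    def __init__(self, embed_dim: int = 300, text_dim: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.low = backbone.layer1      # shallow block  -> low-level feature L
        self.mid = backbone.layer2      # middle block   -> middle-level feature M
        self.high = nn.Sequential(backbone.layer3, backbone.layer4)  # deep -> H
        self.lstm = nn.LSTM(embed_dim, text_dim, batch_first=True)

    def forward(self, image: torch.Tensor, word_vecs: torch.Tensor):
        x = self.stem(image)
        L = self.low(x)                 # e.g. (B, 64,  56, 56)
        M = self.mid(L)                 # e.g. (B, 128, 28, 28)
        H = self.high(M)                # e.g. (B, 512, 7, 7)
        _, (h_n, _) = self.lstm(word_vecs)
        T = h_n[-1]                     # (B, text_dim) whole-text feature vector
        return {"L": L, "M": M, "H": H}, T

encoder = ImageTextEncoder()
feats, T = encoder(torch.randn(2, 3, 224, 224), torch.randn(2, 12, 300))
```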
Step 4: and (3) constructing a customs commodity image text feature multi-mode fusion module, and fusing the low, middle and high-level image features and the text features in the step (3) to obtain low, middle and high-level image text fusion features. With reference to fig. 3 and 4, the specific steps are as follows:
4.1) Expand the text feature T by copying so that it matches the image feature dimensions, and normalize the text and image feature vectors so that their values lie between 0 and 1, eliminating the scale differences between different feature vectors; this yields the retrieval text feature vector T'_q and the customs text feature vector T'_h.
4.2) Multiply T'_q element-wise with F_q from formula (4), and T'_h element-wise with F_h from formula (5), to obtain the image text joint representation C_q of the image to be retrieved and the image text joint representation C_h of the customs image:
C_q = F_q ⊙ T'_q (8)
C_h = F_h ⊙ T'_h (9)
where ⊙ denotes element-wise multiplication.
4.3) Optimize the joint representations with a Sigmoid function (σ), convolution (conv) and de-mean normalization (BN) to obtain the image text fusion feature α_q of the commodity to be retrieved and the customs commodity fusion feature α_h:
α_q = σ(conv(BN(C_q))) (10)
α_h = σ(conv(BN(C_h))) (11)
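A minimal sketch of one fusion block in PyTorch follows, reading formulas (8)-(11) as: broadcast the text feature over the spatial feature map, multiply element-wise, then apply BN, convolution and a sigmoid. The linear projection that matches the text dimension to the channel count, and all sizes, are added assumptions.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """One possible reading of formulas (8)-(11): broadcast the text feature
    over the spatial map, multiply element-wise, then BN -> conv -> sigmoid.
    The projection layer and channel sizes are illustrative assumptions."""

    def __init__(self, img_channels: int, text_dim: int):
        super().__init__()
        self.project = nn.Linear(text_dim, img_channels)  # match channel count (assumed)
        self.bn = nn.BatchNorm2d(img_channels)            # de-mean normalization
        self.conv = nn.Conv2d(img_channels, img_channels, kernel_size=3, padding=1)

    def forward(self, F_img: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
        B, C, H, W = F_img.shape
        T_exp = self.project(T).view(B, C, 1, 1).expand(B, C, H, W)  # copy/expand T
        C_joint = F_img * T_exp                       # element-wise product, eq. (8)/(9)
        alpha = torch.sigmoid(self.conv(self.bn(C_joint)))  # eq. (10)/(11)
        return alpha

# One fusion block per feature level (low/middle/high); dimensions assumed.
fuse_low = FusionBlock(img_channels=64, text_dim=256)
alpha_low = fuse_low(torch.randn(2, 64, 56, 56), torch.randn(2, 256))
```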
Step 5: the method comprises the steps of constructing a customs commodity image text combined retrieval model, comprising a customs commodity image text data preprocessing module, a feature extraction module and a multi-mode fusion module, inputting image text data of a customs commodity image text database into the model to obtain customs commodity fusion features, randomly sampling to obtain a training data set, and training the model by using a triplet loss function. With reference to fig. 3 and 4, the specific steps are as follows:
5.1) Input the image text data of the customs commodity image text database into the customs commodity image text combined retrieval model; the feature extraction module yields the customs commodity image feature set F_h and text feature vector T_h as in formulas (5) and (7), and the multi-modal fusion module yields the customs commodity fusion feature α_h as in formulas (9) and (11). Faiss (Facebook AI Similarity Search) is adopted to build the customs commodity fusion feature index library, and α_h is stored persistently as shown in fig. 2. Randomly sample α_h to obtain a training data set.
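The sketch below shows one way the Faiss index library could be built and persisted; using an inner-product flat index over L2-normalized vectors (equivalent to cosine similarity), the feature dimension and the file path are assumptions.

```python
import faiss
import numpy as np

# alpha_h: customs commodity fusion features, shape (N, d); toy values here.
alpha_h = np.random.rand(10000, 512).astype("float32")

faiss.normalize_L2(alpha_h)               # so inner product == cosine similarity
index = faiss.IndexFlatIP(alpha_h.shape[1])
index.add(alpha_h)                        # build the fusion-feature index library
faiss.write_index(index, "customs_fusion.index")    # persistent storage (path assumed)

# Later: reload and query
index = faiss.read_index("customs_fusion.index")
scores, ids = index.search(alpha_h[:1], k=5)        # top-5 most similar entries
```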
5.2) Cluster the image text fusion features α_ht of the training data set with the K-means algorithm to obtain the cluster centers C_t:
Center_k = (1/C_k) Σ_{α_ht^i ∈ cluster k} α_ht^i (12)
where Center_k is the center of the kth cluster, C_k denotes the number of data objects contained in the kth cluster, α_ht^i denotes the fusion feature of the ith training sample, dist denotes the Euclidean distance used to assign each sample to its nearest center, and K denotes the number of clusters, which is set to 100.
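A brief clustering sketch with faiss.Kmeans under the stated K = 100 follows; the number of iterations and the random seed are assumptions.

```python
import faiss
import numpy as np

# Training-set fusion features alpha_ht (random sample of the index library).
alpha_ht = np.random.rand(5000, 512).astype("float32")

K = 100                                                     # number of clusters, as in step 5.2
kmeans = faiss.Kmeans(alpha_ht.shape[1], K, niter=20, seed=42)  # niter/seed assumed
kmeans.train(alpha_ht)

centers = kmeans.centroids                                  # C_t: (100, 512) cluster centers
_, assignments = kmeans.index.search(alpha_ht, 1)           # nearest center per sample
```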
5.3) For the image text fusion feature x of a query commodity in the training data set, first compute its similarity to the cluster centers C_t to find the most similar cluster c_t, and then compute its similarity to each image text fusion feature y in that cluster to determine their relative positions in the feature space. The similarity function is the cosine similarity, where · denotes the dot product:
D(x, y) = (x · y) / (‖x‖ ‖y‖) (13)
5.4) Randomly select a positive sample y+ from the most similar cluster c_t and a negative sample y- from the other clusters, repeating n times, where n is an adjustable parameter. Train the customs commodity image text combined retrieval model with the triplet loss function so that the similarity between positive sample pairs is as large as possible and the similarity between negative sample pairs is as small as possible, thereby improving retrieval accuracy. The triplet loss function is:
Loss = max(0, D(x, y-) - D(x, y+) + t) (14)
where D(x, y-) denotes the cosine similarity between x and the negative sample y-, D(x, y+) denotes the cosine similarity between x and the positive sample y+, and t is the margin; the loss is 0 when the cosine similarity of the positive pair exceeds that of the negative pair by more than t.
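To illustrate steps 5.3-5.4, the following PyTorch sketch samples positives from the matched cluster and negatives from other clusters and computes the triplet loss of formula (14); the margin t, the number of triplets n and the toy tensors are assumptions.

```python
import random
import torch
import torch.nn.functional as F

def cosine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity D(x, y), as in formula (13)."""
    return F.cosine_similarity(a, b, dim=-1)

def triplet_loss(x, y_pos, y_neg, t: float = 0.3) -> torch.Tensor:
    """max(0, D(x, y-) - D(x, y+) + t), as in formula (14); margin t assumed."""
    return torch.clamp(cosine(x, y_neg) - cosine(x, y_pos) + t, min=0).mean()

def sample_triplets(x, features, assignments, cluster_id: int, n: int = 5):
    """Draw n positives from the query's cluster and n negatives from other clusters."""
    pos_pool = (assignments == cluster_id).nonzero(as_tuple=True)[0].tolist()
    neg_pool = (assignments != cluster_id).nonzero(as_tuple=True)[0].tolist()
    pos = features[random.sample(pos_pool, n)]
    neg = features[random.sample(neg_pool, n)]
    return x.expand(n, -1), pos, neg

# Toy tensors standing in for model outputs: 1000 fusion features in 10 clusters.
features = torch.randn(1000, 512)
assignments = torch.randint(0, 10, (1000,))
x = torch.randn(1, 512)                               # query fusion feature
xs, pos, neg = sample_triplets(x, features, assignments, cluster_id=3, n=5)
loss = triplet_loss(xs, pos, neg)                     # backpropagate in real training
```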
Step 6: classifying the commodities to be searched, inputting the images and texts of the commodities to be searched into a customs commodity image text combined search model to obtain image text fusion characteristics of the commodities to be searched, carrying out hierarchical matching and similarity calculation on the image text fusion characteristics of the commodities to be searched and the customs commodity fusion characteristics, obtaining an HS coding candidate result set of the commodities to be searched according to a similarity matching result, and completing classification of the commodities. The method comprises the following specific steps:
6.1) Input the image and text of the commodity to be retrieved into the customs commodity image text combined retrieval model; the feature extraction module yields the image feature set F_q and text feature vector T_q of the commodity to be retrieved as in formulas (4) and (6), and the multi-modal fusion module yields its image text fusion feature α_q as in formulas (8) and (10).
6.2) As in step 5.2, cluster the customs commodity fusion features α_h in the customs commodity fusion feature index library with the K-means algorithm to obtain the cluster centers C_h.
6.3) As in step 5.3, first compute the similarity between α_q and the cluster centers C_h to find the most similar cluster c_h, then compute the similarity between α_q and each image text fusion feature in that cluster. According to the similarity matching result, obtain the HS code candidate result set of the commodity to be retrieved and display the top five candidates by similarity, each with its image information, commodity name and tariff code, thereby completing the customs classification of the commodity.
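A compact NumPy sketch of this hierarchical matching step is given below; the array layouts, the toy data and the HS-code list are illustrative assumptions.

```python
import numpy as np

def classify(alpha_q, centers, features, assignments, hs_codes, top_k=5):
    """Hierarchical matching sketch: pick the nearest cluster center first, then
    rank by cosine similarity inside that cluster and return the top-k HS codes.
    Array layouts and the hs_codes list are illustrative assumptions."""
    def normalize(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

    q = normalize(alpha_q.reshape(1, -1))
    c = int(np.argmax(normalize(centers) @ q.T))             # most similar cluster c_h
    members = np.where(assignments == c)[0]                   # features inside that cluster
    sims = (normalize(features[members]) @ q.T).ravel()       # cosine similarity per member
    order = np.argsort(-sims)[:top_k]
    return [(hs_codes[members[i]], float(sims[i])) for i in order]

# Toy stand-ins for the customs fusion-feature index library (20 clusters here).
feats = np.random.rand(1000, 512).astype("float32")
assign = np.random.randint(0, 20, size=1000)
centers = np.stack([feats[assign == k].mean(axis=0) for k in range(20)])
codes = [f"HS{i:08d}" for i in range(1000)]
print(classify(np.random.rand(512).astype("float32"), centers, feats, assign, codes))
```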
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting thereof; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the previous embodiment can be modified or part of technical features can be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A customs import and export commodity classifying method based on image text combination retrieval is characterized by comprising the following steps:
step 1), establishing a customs commodity image text database, and storing customs commodity images and corresponding commodity text description information;
step 2) denoising and data enhancement operations are carried out on the images, the format and the size of the images are unified, and word segmentation, stop word removal and vectorization operations are carried out on texts;
step 3) using a convolutional neural network as the commodity image feature encoder to extract low, medium and high-level features of the image, encoding the commodity text with a long short-term memory neural network, and extracting the text features of the commodity description information;
step 4) fusing the low, middle and high-level image characteristics and text characteristics in the step 3) to obtain low, middle and high-level image text fusion characteristics;
step 5) inputting image text data of a customs commodity image text database into a customs commodity image text combination retrieval model to obtain customs commodity fusion characteristics, randomly sampling to obtain a training data set, and training the model by using a triplet loss function;
and 6) inputting the images and texts of the commodities to be searched into a customs commodity image text combined search model to obtain image text fusion characteristics of the commodities to be searched, carrying out hierarchical matching and similarity calculation on the image text fusion characteristics of the commodities to be searched and the customs commodity fusion characteristics, obtaining an HS coding candidate result set of the commodities to be searched according to a similarity matching result, and completing classification of the commodities.
2. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the specific steps of the step 1) are as follows:
step 1.1), automatically collecting commodity pictures, declared and filled commodity names and HS codes through a background, and storing the commodity pictures and the commodity names and HS codes into a customs commodity image text database;
and 1.2) comparing the commodity image in the customs commodity image text database with the commodity, if the commodity is not in the customs commodity image text database, acquiring the image, commodity name and HS code of the commodity after further checking by manpower, and storing the image, commodity name and HS code in the customs commodity image text database.
3. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 2) specifically comprises the following steps:
step 2.1) denoising the commodity image, unifying the format and the size of the image, and carrying out data enhancement on the denoised image by adopting the modes of random clipping, affine transformation and brightness adjustment;
step 2.2) using an NLPIR natural language word segmentation system to segment the text, and then removing stop words from the text to obtain a word dictionary; the value of each word in the dictionary represents the frequency level at which it appears in all sentences, and finally GloVe is used to convert the text into a word vector.
4. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 3) specifically comprises the following steps:
step 3.1) using a ResNet-18 convolutional neural network as a commodity image feature encoder, extracting low-level features as low-level features L of an image on a shallow network of the ResNet-18, extracting middle-level features as middle-level features M of the image on a middle-level network of the ResNet-18, and extracting high-level features as high-level features H of the image on a high-level network of the ResNet-18:
F = {L, M, H} = ResNet(m) (1)
wherein F is the set of low-level (L), medium-level (M) and high-level (H) image features of the image m;
step 3.2) inputting the text word vector obtained in the step 2.2) into an LSTM neural network for word coding, so as to obtain a feature vector T of the whole text.
5. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 4) specifically comprises the following steps:
step 4.1) expanding the feature vector T through copying to enable the feature vector T to be identical with the feature dimension of the image;
step 4.2) multiplying the low L, medium M and high H layer image features in the step 3) with text corresponding elements to obtain a joint representation of the image text features: LT, MT, HT;
step 4.3) optimizing using Sigmoid function, convolution and de-averaging normalization.
6. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 5) specifically comprises the following steps:
step 5.1), establishing a customs commodity fusion feature index library, and storing customs commodity fusion features in a lasting manner; randomly sampling the customs commodity fusion characteristics to obtain a training data set;
step 5.2) clustering the image text fusion features of the training data set with a K-means algorithm to obtain cluster centers C_t;
step 5.3) first computing the similarity between the image text fusion feature x of a query commodity in the training data set and the cluster centers C_t to find the most similar cluster c_t, and then computing the similarity between x and each image text fusion feature y in that cluster to determine their relative positions in the feature space; the similarity function uses the cosine similarity, where · denotes the dot product:
D(x, y) = (x · y) / (‖x‖ ‖y‖)
step 5.4) randomly selecting a positive sample y+ from the most similar cluster c_t and a negative sample y- from the other clusters, repeating n times, wherein n is an adjustable parameter; training the customs commodity image text combined retrieval model with a triplet loss function so that the similarity between positive sample pairs is as large as possible and the similarity between negative sample pairs is as small as possible, thereby improving retrieval accuracy; the triplet loss function is:
Loss = max(0, D(x, y-) - D(x, y+) + t)
wherein D(x, y-) denotes the cosine similarity between x and the negative sample y-, D(x, y+) denotes the cosine similarity between x and the positive sample y+, and t is the margin; the loss is 0 when the cosine similarity of the positive sample pair exceeds that of the negative sample pair by more than t.
7. The customs import and export commodity classifying method based on image text combination retrieval as claimed in claim 1, wherein the step 6) specifically comprises the following steps:
step 6.1), inputting images and texts of the commodities to be searched into a customs commodity image text combination search model, and carrying out image text data preprocessing, feature extraction and multi-mode fusion;
step 6.2) clustering the customs commodity fusion features of the customs commodity fusion feature index library with a K-means clustering algorithm to obtain the customs commodity image text fusion feature cluster centers C_h;
step 6.3) first computing the similarity between the fusion features of the commodity to be searched and the cluster centers C_h to find the most similar cluster c_h, then computing the cosine similarity with each image text fusion feature in that cluster, obtaining an HS code candidate result set of the commodity to be searched according to the similarity matching result, and displaying the top five candidates by similarity, each with its image information, commodity name and tariff code information, thereby completing the customs classification of the commodity.
CN202310894185.4A 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval Pending CN117609583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310894185.4A CN117609583A (en) 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310894185.4A CN117609583A (en) 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval

Publications (1)

Publication Number Publication Date
CN117609583A true CN117609583A (en) 2024-02-27

Family

ID=89952182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310894185.4A Pending CN117609583A (en) 2023-07-20 2023-07-20 Customs import and export commodity classification method based on image text combination retrieval

Country Status (1)

Country Link
CN (1) CN117609583A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117952507A (en) * 2024-03-26 2024-04-30 南京亿猫信息技术有限公司 Intelligent shopping cart commodity returning identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination