CN114049484A - Commodity image retrieval method and device, equipment, medium and product thereof

Info

Publication number: CN114049484A
Application number: CN202111288769.4A
Authority: CN
Inventors: 李保俊
Original assignee: Guangzhou Huaduo Network Technology Co Ltd
Current assignee: Guangzhou Huaduo Network Technology Co Ltd
Priority date: 2021-11-02
Filing date: 2021-11-02
Publication date: 2022-02-15

Abstract

The application discloses a commodity image retrieval method and a corresponding device, equipment, medium and product. The method comprises the following steps: extracting image feature information at a plurality of scales from a content image of a commodity image to be detected; performing expansion (dilated) convolution processing on the image feature information corresponding to at least some of the scales, and converting each piece of image feature information into expansion feature information of the corresponding scale; fusing all the expansion feature information into comprehensive feature information; and retrieving, from a preset commodity feature library, a target commodity object whose features are similar to the comprehensive feature information, wherein the commodity feature library stores the comprehensive feature information of a plurality of commodity objects. The application enables effective representation learning on commodity pictures and provides an accurate picture search service based on the comprehensive feature information so obtained, so that similar commodities can be matched efficiently in the e-commerce field.

Description

Commodity image retrieval method and device, equipment, medium and product thereof

Technical Field

The present application relates to the field of e-commerce information technologies, and in particular, to a method for retrieving a commodity image, and a corresponding apparatus, computer device, computer-readable storage medium, and computer program product.

Background

With the development of science and technology, there is growing demand for searching for related goods by photographing an object. For example, a consumer may purchase goods by retrieving them from buyer-show or seller-show pictures, and a merchant may stock goods by photographing popular items and searching for the corresponding goods sources. Commodity image retrieval is therefore an important step, and understanding and recognizing commodity images is its core link.

Because a deep convolutional neural network has strong semantic abstraction capability, it can not only map the image information of a commodity into a high-dimensional space to obtain a high-dimensional representation of the commodity, but also suppress background noise well. Image recognition techniques therefore usually use a convolutional neural network model to extract feature vectors and then perform image retrieval and matching on the basis of those feature vectors.

The prior art provides many convolutional neural network models capable of image feature extraction. In general, however, these models are obtained by pre-training an existing network architecture without considering the image characteristics of special scenes, so they often perform poorly.

In an e-commerce application scene, pictures are affected by sharpness, shooting angle, object size, background noise and the like, and often differ greatly from one another. How to perform representation learning on such images more effectively, and thereby extract their deep semantic information, therefore determines the practical effectiveness of the model. If the deep semantic information of a commodity picture cannot be extracted effectively, retrieval and matching easily become inaccurate, similar commodity objects cannot be obtained for a given commodity picture, and needs such as searching for commodities by photograph and matching goods sources cannot be met.

Disclosure of Invention

A primary object of the present application is to solve at least one of the above problems and provide a commodity image retrieval method and a corresponding apparatus, computer device, computer readable storage medium, and computer program product.

In order to meet various purposes of the application, the following technical scheme is adopted in the application:

a commodity image retrieval method adapted to one of the objects of the present application includes the steps of:

extracting image characteristic information of a plurality of scales of a content image of a commodity image to be detected;

respectively executing expansion convolution processing on image characteristic information corresponding to at least part of scales, and converting each image characteristic information into expansion characteristic information of the corresponding scale;

fusing all the expansion characteristic information into comprehensive characteristic information;

and retrieving, from a preset commodity characteristic library, a target commodity object whose characteristics are similar to the comprehensive characteristic information, wherein the commodity characteristic library stores the comprehensive characteristic information of a plurality of commodity objects.

In a further embodiment, extracting the image feature information at a plurality of scales from the content image of the commodity image to be detected comprises the following steps:

acquiring an image of a commodity to be detected of a target commodity object to be retrieved;

zooming the commodity image to be detected to a preset size to obtain a corresponding specification image;

carrying out content positioning on the specification image, and cutting out a content image according to the positioning;

and sequentially extracting the image characteristic information of the content image at different scales through a plurality of preset residual convolution networks.
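The scaling and cropping steps above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation of the application: nearest-neighbour sampling, the function names `resize_nearest` and `crop`, and the bounding box (standing in for the output of a content positioning model) are all hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Scale a 2-D image (list of rows) to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def crop(image, box):
    """Cut out the region box = (top, left, bottom, right); bottom/right are exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# A toy 2x4 "commodity image" scaled to an assumed preset size of 4x4.
raw = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
spec_image = resize_nearest(raw, 4, 4)

# Hypothetical box that a content positioning model might return.
content_image = crop(spec_image, (1, 1, 3, 3))
```

In practice the box would come from a trained detector and the content image would then be passed to the residual convolution networks; neither is modelled here.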

In a further embodiment, performing expansion convolution processing on the image feature information corresponding to at least some of the scales, and converting each piece of image feature information into expansion feature information of the corresponding scale, comprises the following steps, executed for the image feature information of each scale:

performing expansion convolution processing with different expansion rates on the image characteristic information of the current scale by adopting a convolution layer to correspondingly obtain a plurality of intermediate characteristic information;

splicing the plurality of intermediate characteristic information into expanded characteristic information;

performing feature fusion on the expansion feature information by adopting a convolution layer;

normalizing the expansion characteristic information after the characteristic fusion by adopting a batch normalization layer;

activating and outputting the normalized expansion characteristic information through an activation layer;

and performing down-sampling on the expansion characteristic information which is activated and output through global pooling to obtain expansion characteristic information of a corresponding scale, wherein the expansion characteristic information is a high-dimensional vector.
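The per-scale steps above can be sketched in plain Python as follows. This is a minimal single-channel illustration, not the implementation of the application. The 3x3 kernel, the sharing of one kernel across branches, ReLU as the activation, average pooling as the global pooling, per-map normalization as a stand-in for a trained batch normalization layer, and all function names are assumptions made for illustration.

```python
import math


def dilated_conv2d(fmap, kernel, rate):
    """Zero-padded ('same') 2-D convolution of one channel with a dilated k x k kernel."""
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + (i - c) * rate, x + (j - c) * rate
                    if 0 <= yy < h and 0 <= xx < w:  # outside the map counts as zero
                        acc += kernel[i][j] * fmap[yy][xx]
            out[y][x] = acc
    return out


def fuse_1x1(channels, weights):
    """1x1 convolution: each output channel is a weighted sum of the input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    fused = []
    for row_w in weights:  # one weight row per output channel
        fused.append([[sum(wt * ch[y][x] for wt, ch in zip(row_w, channels))
                       for x in range(w)] for y in range(h)])
    return fused


def normalize(ch, eps=1e-5):
    """Stand-in for batch normalization: zero mean, unit variance over one map."""
    vals = [v for row in ch for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in ch]


def relu(ch):
    return [[max(0.0, v) for v in row] for row in ch]


def global_avg_pool(ch):
    vals = [v for row in ch for v in row]
    return sum(vals) / len(vals)


def expansion_features(fmap, kernel, rates, fuse_weights):
    """Dilated branches -> splice -> 1x1 fusion -> normalize -> activate -> pool."""
    branches = [dilated_conv2d(fmap, kernel, r) for r in rates]  # spliced as channels
    fused = fuse_1x1(branches, fuse_weights)
    return [global_avg_pool(relu(normalize(ch))) for ch in fused]


# Toy input: a 7x7 map with a single active position at the centre.
fmap = [[0.0] * 7 for _ in range(7)]
fmap[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
vec = expansion_features(fmap, ones, rates=(1, 3, 5),
                         fuse_weights=[[1.0, 0.0, 0.0], [0.2, 0.3, 0.5]])
```

The result `vec` has one entry per fused output channel, which is how a feature map of any spatial size collapses into a fixed-length vector after global pooling.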

In a preferred embodiment, the comprehensive feature information is a high-dimensional vector formed by sequentially splicing all the expansion feature information of the same commodity image.

In a further embodiment, retrieving, from the preset commodity feature library, the target commodity object whose features are similar to the comprehensive feature information comprises the following steps:

calculating the similarity between the comprehensive characteristic information of the commodity image to be detected and the comprehensive characteristic information of each commodity object in a preset commodity characteristic library;

sorting the commodity objects in the commodity feature library in reverse (descending) order of similarity to obtain a reverse-ordered list;

determining, according to a preset quantity, a plurality of corresponding target commodity objects from the reverse-ordered list;

and constructing a similar commodity object list, wherein the list comprises the plurality of target commodity objects.
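The retrieval steps above can be sketched as follows. The source does not name a similarity measure, so cosine similarity is an assumption here, as are the function names, the dictionary layout of the feature library and the `sku-*` identifiers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_similar(query_vec, feature_library, top_n):
    """Return [(commodity_id, similarity)] for the top_n most similar objects, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in feature_library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending similarity
    return scored[:top_n]


# Hypothetical feature library: commodity id -> comprehensive feature vector.
library = {
    "sku-1": [1.0, 0.0, 0.0],
    "sku-2": [0.9, 0.1, 0.0],
    "sku-3": [0.0, 1.0, 0.0],
}
similar = retrieve_similar([1.0, 0.0, 0.0], library, top_n=2)
```

Here `similar` lists `sku-1` first and `sku-2` second. A linear scan is enough for a sketch; a large feature library would normally call for an approximate nearest-neighbour index, which the source does not discuss.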

In a preferred embodiment, a preset neural network model, trained to a convergence state in advance, is used to execute the corresponding steps of the method so as to extract the comprehensive feature information of a commodity image.

A commodity image retrieval device adapted to one of the objects of the present application includes a feature extraction module, an expansion conversion module, a feature synthesis module and a retrieval matching module. The feature extraction module is used for extracting image feature information at a plurality of scales from a content image of a commodity image to be detected; the expansion conversion module is used for performing expansion convolution processing on the image feature information corresponding to at least some of the scales and converting each piece of image feature information into expansion feature information of the corresponding scale; the feature synthesis module is used for fusing all the expansion feature information into comprehensive feature information; and the retrieval matching module is used for retrieving, from a preset commodity feature library, a target commodity object whose features are similar to the comprehensive feature information, the commodity feature library storing the comprehensive feature information of a plurality of commodity objects.

In a further embodiment, the feature extraction module includes: the image acquisition submodule is used for acquiring an image of the to-be-detected commodity of the target commodity object to be retrieved; the scaling processing submodule is used for scaling the to-be-detected commodity image to a preset size to obtain a corresponding specification image; the content positioning submodule is used for positioning the content of the specification image and cutting out the content image according to the positioning; and the residual convolution submodule is used for sequentially extracting the image characteristic information of the content image at different scales through a plurality of preset residual convolution networks.

In a further embodiment, the expansion conversion module includes the following submodules, which operate on the image feature information of each scale: an expansion convolution submodule, used for performing expansion convolution processing with different expansion rates on the image feature information of the current scale by means of convolution layers to obtain a corresponding plurality of pieces of intermediate feature information; a scale splicing submodule, used for splicing the pieces of intermediate feature information into expansion feature information; a feature fusion submodule, used for performing feature fusion on the expansion feature information by means of a convolution layer; a batch normalization submodule, used for normalizing the fused expansion feature information by means of a batch normalization layer; an activation output submodule, used for activating and outputting the normalized expansion feature information through an activation layer; and a pooling sampling submodule, used for down-sampling the activated expansion feature information through global pooling to obtain expansion feature information of the corresponding scale, the expansion feature information being a high-dimensional vector.

In a further embodiment, the retrieval matching module includes: a similarity calculation submodule, used for calculating the similarity between the comprehensive feature information of the commodity image to be detected and the comprehensive feature information of each commodity object in a preset commodity feature library; a reverse sorting submodule, used for sorting the commodity objects in the commodity feature library in reverse (descending) order of similarity to obtain a reverse-ordered list; a target selection submodule, used for determining, according to a preset quantity, a plurality of corresponding target commodity objects from the reverse-ordered list; and a list construction submodule, used for constructing a similar commodity object list containing the target commodity objects.

A computer device adapted to one of the objects of the present application comprises a central processing unit and a memory, wherein the central processing unit is used for calling and running a computer program stored in the memory to execute the steps of the commodity image retrieval method.

A computer-readable storage medium stores, in the form of computer-readable instructions, a computer program implementing the commodity image retrieval method; when the computer program is called and run by a computer, it executes the steps included in the method.

A computer program product, provided to adapt to another object of the present application, comprises computer programs/instructions which, when executed by a processor, implement the steps of the method described in any of the embodiments of the present application.

Compared with the prior art, the application has the following advantages:

First, the application extracts image feature information from the commodity image to be detected at a plurality of scales, so as to capture deep semantic information of the commodity image at different granularities, taking into account both the global and the local information in the image. It then performs dilated convolution with different expansion rates on the image feature information of each scale, expanding the features over different receptive fields without feature loss, to obtain expansion feature information. Finally, it fuses the expansion feature information of all scales into comprehensive feature information, completing the representation learning of the commodity image. The comprehensive feature information so obtained captures the deep semantic information of the commodity image comprehensively and forms the basis of commodity retrieval.

Further, the comprehensive feature information of the commodity image to be detected is compared for similarity with the comprehensive feature information of the commodity objects in the preset commodity feature library, the latter having been extracted from the commodity images of the corresponding commodity objects by the same means.

In addition, the application implements a basic service that can meet application needs such as searching for pictures by photograph, goods source matching and advertisement recommendation. Given the commodity image to be detected supplied by the corresponding scenario, the service obtains a corresponding similar commodity object list; sending this list to the client device is enough to meet the need, thereby improving the user experience of the e-commerce platform.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a functional block diagram of a neural network architecture for implementing the present application;

FIG. 2 is a flowchart illustrating an exemplary embodiment of a merchandise image retrieval method according to the present application;

FIG. 3 is a schematic flow chart illustrating a process of extracting image feature information corresponding to a plurality of scales on the basis of an image of a commodity to be inspected in the embodiment of the present application;

FIG. 4 is a flowchart illustrating a process of converting image feature information into expansion feature information according to an embodiment of the present disclosure;

FIG. 5 is a functional block diagram of the internal structure of a feature processing network in the neural network architecture of the present application;

FIG. 6 is a schematic flowchart of a process of matching a similar commodity object list according to a comprehensive feature vector in an embodiment of the present application;

FIG. 7 is a functional block diagram of the product image search device according to the present application;

FIG. 8 is a schematic structural diagram of a computer device used in the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As will be appreciated by those skilled in the art, "client," "terminal," and "terminal device" as used herein include both devices that are wireless signal receivers, which are devices having only wireless signal receivers without transmit capability, and devices that are receive and transmit hardware, which have receive and transmit hardware capable of two-way communication over a two-way communication link. Such a device may include: cellular or other communication devices such as personal computers, tablets, etc. having single or multi-line displays or cellular or other communication devices without multi-line displays; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "client," "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client", "terminal Device" used herein may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, and the like.

The hardware referred to by names such as "server", "client" and "service node" is essentially an electronic device with the capabilities of a personal computer: a hardware device having the necessary components of the von Neumann architecture, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device and an output device. A computer program is stored in the memory; the central processing unit loads the program from external memory into internal memory, executes its instructions, and interacts with the input and output devices, thereby completing specific functions.

It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.

One or more technical features of the present application, unless expressly specified otherwise, may be deployed on a server and accessed by a client that remotely invokes an online service interface provided by the server, or may be deployed and run directly on the client.

Unless expressly specified otherwise, a neural network model referred to, or possibly referred to, in the application may be deployed on a remote server and invoked remotely by a client, or may be deployed on a client with sufficient device capability and invoked directly.

Unless expressly specified otherwise, the various data referred to in the present application may be stored remotely on a server or locally on a terminal device, as long as the data can be called by the technical solution of the present application.

Those skilled in the art will understand that although the various methods of the present application are described on the basis of the same concept and are therefore common to one another, they may be performed independently unless otherwise specified. Likewise, the embodiments disclosed in the present application are all proposed on the basis of the same inventive concept; concepts expressed identically, and concepts whose expression differs only as a matter of convenience, should therefore be understood equally.

Unless a mutually exclusive relationship between the related technical features is expressly stated, the embodiments disclosed herein can be flexibly constructed by cross-combining the related technical features of the embodiments, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of, or remedy the deficiencies of, the prior art. Those skilled in the art will appreciate such variations.

The commodity image retrieval method can be programmed as a computer program product and deployed to run in a service cluster, so that once the computer program product is running, the method can be executed by accessing its open interface and interacting with it through a graphical user interface.

The first application scenario illustrated by way of example in the present application is searching for pictures by photograph in the e-commerce field. In this scenario, a consumer user of a terminal device uploads a commodity picture on an access page of the e-commerce platform to search for similar commodity objects. A server of the e-commerce platform takes the uploaded picture as the commodity image to be detected, invokes the technical solution implemented by the application to process it, obtains a similar commodity object list, and feeds the list back to the consumer user for display. The consumer user can then access the target commodity objects in the list to find one that meets their needs.

The second application scenario is a source matching application scenario in the e-commerce domain. In the scene, a merchant user uploads a commodity picture needing to find a goods source on a terminal device of the merchant user, a server of the e-commerce platform takes the commodity picture uploaded by the merchant user as a commodity image to be detected, the technical scheme realized by the application is called for processing, a similar commodity object list is obtained and fed back to the merchant user for display, and the merchant user can further access a target commodity object in the similar commodity object list and know the price, the purchase channel and other information of the target commodity object.

The third application scenario is an advertisement recommendation application scenario in the e-commerce domain. In the scene, after a terminal device of any user accesses a certain commodity object, an access record is correspondingly generated in a server of the E-commerce platform, the server acquires a commodity picture of the commodity object as a commodity image to be detected according to the access record, the technical scheme realized by the application is called for processing, a similar commodity object list is acquired and is pushed to the corresponding terminal device for advertisement display, and therefore the corresponding user is attracted to further access the target commodity object.

The technical solution of the application can therefore be applied to a variety of search-by-image application scenarios and is widely applicable.

In an exemplary application of the technical solution of the present application, as shown in FIG. 1, the representation learning of the commodity image is realized by means of a neural network architecture. The network architecture comprises a content positioning sub-network, a residual convolution sub-network, a feature processing sub-network, a feature fusion sub-network and a similar matching sub-network. The residual convolution sub-network extracts image feature information of the commodity image to be detected at different scales through a plurality of residual convolution networks. The feature processing sub-network performs dilated convolution (also called atrous convolution or expansion convolution) on the image feature information of each scale to obtain the expansion feature information corresponding to that scale. The feature fusion sub-network fuses the expansion feature information of all scales into comprehensive feature information expressed as a high-dimensional vector. Finally, the similar matching sub-network retrieves, from a preset commodity feature library, the target commodity objects similar to the comprehensive feature information, which are used to construct a similar commodity object list. The internal structure of each sub-network and the functions it implements are described further below and are set aside for now.

It can be seen that the functions implemented by the residual convolution sub-network, the feature processing sub-network and the feature fusion sub-network together constitute the representation learning of the commodity image, and they can therefore be implemented as a single convolutional neural network model. The content positioning sub-network may be a relatively independent convolutional neural network model. Both models can be trained to a convergence state in advance and then put to use in the technical solution of the application. The content positioning sub-network can be implemented with a neural network model of the YOLO series, the R-CNN series or the SSD series, and the convolutional neural network model responsible for representation learning may be implemented with a ResNet-series model. As for training such neural network models to a convergence state in advance, those skilled in the art can do so by attaching a classifier such as Softmax and applying conventional training practice, so a detailed description is omitted.

Referring to fig. 2, in an exemplary embodiment of the method for retrieving a commodity image according to the present application, the method includes the following steps:

Step S1100, extracting image feature information at a plurality of scales from the content image of the commodity image to be detected:

As described above, in different application scenarios the user may provide a commodity picture, used for retrieving similar commodity objects, as the commodity image to be detected. The commodity picture may be shot by the user in a real scene, or may be a picture that has been preprocessed to highlight its content. A commodity picture generally contains content, which is usually a particular commodity shown in the picture from a certain angle, such as a jacket, a mobile phone or a bathtub seen from a given viewpoint.

The content image can be extracted from the commodity image to be detected either by manual matting as preprocessing or by an automatic program, adjusted to a standard size, and then used for extracting image feature information. For example, as shown in FIG. 1, the content positioning sub-network may be used to matte the commodity image to be detected automatically, removing as much background noise as possible and yielding a content image of standardized specification.

In order to obtain the deep semantic information of the content image more comprehensively, as shown in FIG. 1, the content image is fed into the residual convolution sub-network to extract image feature information corresponding to different scales. The larger the scale, the richer the global information about the content image in the resulting image feature information and the sparser the local information; the smaller the scale, the richer the local information and the sparser the global information. Those skilled in the art can set the total number of scales flexibly, so that the features are neither over-generalized nor insufficient. The present application suggests a total number of scales between 2 and 8, with 3, 4 or 5 found preferable in testing.
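The idea of producing feature maps at several scales from one input can be illustrated with a short sketch. It uses repeated 2x2 average pooling purely as a stand-in: a real residual convolution network applies learned convolutions with skip connections at each stage, which this sketch does not attempt to model. The function names and the choice of three scales are assumptions.

```python
def avg_pool_2x2(fmap):
    """Halve the height and width of a 2-D map by averaging each 2x2 block."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [
        [(fmap[2 * y][2 * x] + fmap[2 * y][2 * x + 1]
          + fmap[2 * y + 1][2 * x] + fmap[2 * y + 1][2 * x + 1]) / 4.0
         for x in range(w)]
        for y in range(h)
    ]


def multi_scale(fmap, num_scales):
    """Return num_scales maps, each half the resolution of the previous one."""
    scales, current = [], fmap
    for _ in range(num_scales):
        current = avg_pool_2x2(current)
        scales.append(current)
    return scales


# Toy 8x8 "content image" with values 0..63, reduced to three scales.
image = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
pyramid = multi_scale(image, 3)
sizes = [(len(m), len(m[0])) for m in pyramid]
```

Each entry of a coarser map summarizes a larger region of the input, which is why maps at different scales carry different mixes of local and global information.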

The image feature information is a result of learning the representation of the content image on a corresponding scale, and represents deep semantic information of the content image. It is understood that each image feature information output by the residual convolution sub-network is represented as a three-dimensional matrix, and comprises a plurality of channels.

Step S1200, respectively performing expansion convolution processing on the image feature information corresponding to at least part of the scales, and converting each image feature information into expansion feature information of the corresponding scale:

After the image feature information corresponding to the different scales is output, it is passed to the feature processing sub-network. The feature processing sub-network performs expansion convolution with different expansion rates on the image feature information of each scale, thereby enlarging the receptive field while avoiding feature loss and synthesizing the image feature information of the corresponding scale. In this embodiment, expansion rates of 1, 3 and 5 are recommended for the respective expansion convolution operations.

After the image feature information of each scale has undergone expansion convolution processing, expansion feature information of the corresponding scale is obtained, so that a plurality of expansion feature information results. Each expansion feature information synthesizes the features of its corresponding image feature information, which completes the conversion from image feature information to expansion feature information.
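How the expansion rate enlarges the receptive field without adding weights can be illustrated with a short sketch. It assumes a kernel with three taps per axis, which the application does not specify, and the function name is hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis.

    A kernel with k taps and expansion (dilation) rate d leaves d - 1 gaps
    between neighbouring taps, so it covers k + (k - 1) * (d - 1) positions.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Kernel size 3 is an assumption; the rates 1, 3 and 5 come from the text.
spans = {rate: effective_kernel_size(3, rate) for rate in (1, 3, 5)}
```

Under this assumption the three rates cover 3, 7 and 11 positions per axis, while the number of kernel weights stays the same.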

The task of executing the expansion convolution is realized by the feature processing sub-network, in the feature processing sub-network, the same business logic can be adopted for the conversion processing of the image feature information of different scales, and finally, the expansion feature information can be unified into a form expressed by a high-dimensional vector for the subsequent feature fusion.

Step S1300, fusing all the expansion feature information into comprehensive feature information:

in order to complete the whole process of representing and learning the commodity image to be detected, feature fusion can be performed on all the expansion feature information corresponding to the commodity image to be detected. The splicing sequence of the expansion feature information may be preset by the feature fusion subnet illustrated in fig. 1.

In this application, the comprehensive feature information takes the form of a high-dimensional vector obtained by sequentially splicing all the expansion feature information of the same commodity image. This form applies both to the comprehensive feature information of the commodity image to be detected and to that of the commodity pictures of the commodity objects in the preset commodity feature library. In other words, the neural network architecture of this example is suitable not only for carrying out the method but also for pre-constructing the commodity feature library, in which representation learning is performed on the commodity picture of each commodity object to obtain its comprehensive feature information.
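The splicing described above amounts to concatenating the per-scale vectors in a fixed order. The following is a minimal sketch of that idea; the function name and the example vectors are hypothetical.

```python
from typing import List, Sequence


def fuse_features(expansion_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-scale expansion vectors in the given (preset) order."""
    fused: List[float] = []
    for vector in expansion_vectors:
        fused.extend(vector)
    return fused


# Three hypothetical per-scale vectors of different lengths.
comprehensive = fuse_features([[0.1, 0.2], [0.3], [0.4, 0.5]])
```

The same order has to be used for the commodity image to be detected and for every entry of the commodity feature library, otherwise the vector components would not be comparable.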

Step S1400, retrieving a target commodity object with characteristics similar to the comprehensive characteristic information from a preset commodity characteristic library, wherein the commodity characteristic library stores the comprehensive characteristic information of a plurality of commodity objects:

As described above, the present application constructs a commodity feature library in advance, in which comprehensive feature information corresponding to a large number of commodity objects is stored. The comprehensive feature information is extracted from the commodity pictures of the corresponding commodity objects; specifically, the residual convolution sub-network of the neural network architecture shown in fig. 1 may be used for representation learning, and, where necessary, the content image may first be located with the content positioning sub-network.

In addition to the comprehensive characteristic information of the commodity object, other commodity information corresponding to the commodity object can be stored in association with the corresponding commodity object in the commodity characteristic library, including but not limited to a commodity title, commodity details, commodity attributes, commodity pictures and the like.

In a simplified embodiment, the commodity feature library may contain, for each commodity object, only the comprehensive feature information and the unique feature information that points to the corresponding commodity information in the commodity database of the e-commerce platform. When a commodity object is determined to be similar to the commodity image to be detected in terms of comprehensive feature information, the commodity information of that commodity object can be fetched from the commodity database of the e-commerce platform through the unique feature information. The SPU (Standard Product Unit) or SKU (Stock Keeping Unit) of the commodity may be used directly as the unique feature information.
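The simplified embodiment can be pictured as a library of (unique feature information, comprehensive feature information) pairs plus a separate commodity database keyed by the same identifier. The sketch below is illustrative only: the class, the identifiers and the database contents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureEntry:
    unique_id: str        # e.g. an SPU or SKU code
    feature: List[float]  # comprehensive feature information


# Hypothetical stand-in for the commodity database of the e-commerce platform.
commodity_db: Dict[str, dict] = {
    "SKU-001": {"title": "Denim jacket"},
    "SKU-002": {"title": "Mobile phone"},
}

# The feature library holds only identifiers and feature vectors.
feature_library = [
    FeatureEntry("SKU-001", [0.1, 0.9]),
    FeatureEntry("SKU-002", [0.8, 0.2]),
]


def lookup_commodity(entry: FeatureEntry) -> dict:
    """Fetch the full commodity information through the unique identifier."""
    return commodity_db[entry.unique_id]
```

Keeping the library this small means the descriptive commodity information is stored once, in the commodity database, and is only fetched for the entries that turn out to be similar.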

The commodity feature library can be constructed in a way of adapting to different application scenes, and mainly establishes a mapping relation from the unique feature information of the commodity object in the commodity feature library to the commodity objects corresponding to the different application scenes. For example, in an application scenario of photo-searching for similar goods, the unique characteristic information may be mapped to an on-sale goods object in an on-sale goods database of a merchant instance of the e-commerce platform. As another example, in an application scenario of source matching, the unique characteristic information may be mapped to an on-sale commodity object in an on-sale commodity database of an e-commerce platform providing source procurement services. For another example, in an application scenario of advertisement recommendation, the unique feature information may be mapped to an on-sale commodity object of a commodity recommendation library of the e-commerce platform. And so on, may be flexibly implemented by those skilled in the art.

At this point, the similarity matching subnet shown in fig. 1 can compute the similarity between the comprehensive feature information of the commodity image to be detected and that of each commodity object in the commodity feature library. Any similarity measure may be used, such as Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, Jaccard similarity or the Pearson correlation coefficient, and may be chosen flexibly by those skilled in the art. Once the similarity between the commodity image to be detected and each commodity object in the commodity feature library is obtained, the commodity objects whose similarity exceeds a preset threshold are determined as target commodity objects corresponding to the commodity image to be detected, and all target commodity objects are assembled into a similar commodity object list to be pushed to the corresponding terminal device.
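As an illustration of distance-based matching with a threshold, the sketch below uses the Minkowski distance, which reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2. The conversion of a distance into a similarity score, 1 / (1 + d), is an assumption made here so that "exceeding a preset threshold" has a concrete meaning; the application does not prescribe it, and all names are hypothetical.

```python
from typing import Dict, List, Sequence


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def distance_to_similarity(distance: float) -> float:
    """Assumed mapping from a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


def select_above_threshold(query: Sequence[float],
                           library: Dict[str, Sequence[float]],
                           threshold: float,
                           p: float = 2.0) -> List[str]:
    """Return the identifiers whose similarity to the query exceeds the threshold."""
    return [uid for uid, feature in library.items()
            if distance_to_similarity(minkowski_distance(query, feature, p)) > threshold]
```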

Through this exemplary embodiment, it can be easily seen that this application can gain very rich advantages, for example:

Referring to fig. 3, in a further embodiment, the step S1100 of extracting the image feature information of multiple scales of the specification image of the to-be-detected commodity image includes the following steps:

step S1110, acquiring an image of the target commodity to be retrieved:

In any of the aforementioned exemplary application scenarios, for the basic image search service to be provided by the present application, the server of the e-commerce platform may receive a commodity picture as input and then use it as the commodity image to be detected of the present application.

Step S1120, scaling the to-be-detected product image to a predetermined size to obtain a corresponding specification image:

in order to make the content positioning sub-network shown in fig. 1 obtain standard input, the commodity image to be detected can be scaled to a predetermined size according to the input specification requirement of the content positioning sub-network, and a corresponding specification image is obtained.

Step S1130, content positioning is carried out on the specification images, and the content images are cut out according to the positioning:

referring back to fig. 1, for example, a content positioning subnet implemented by using a Yolo-5 model performs content positioning on the specification image to determine a candidate frame corresponding to the content therein, and then cuts out a content image corresponding to the candidate frame. Generally, the content images are unified into an output of standardized size.
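Steps S1120 and S1130 can be sketched as a resize followed by a crop. The example below uses nearest-neighbour resizing on a nested-list image and takes the candidate frame as an input; the detector itself is not modelled, and the function names, the interpolation method and the (x0, y0, x1, y1) box layout are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Scale an image to a predetermined size with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the region given by a candidate frame (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# A 2 x 2 image scaled to 4 x 4, then cropped with a hypothetical candidate frame.
specification_image = resize_nearest([[1, 2], [3, 4]], 4, 4)
content_image = crop(specification_image, (1, 1, 3, 3))
```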

Step S1140, sequentially extracting image feature information of the content image at different scales through a plurality of preset residual convolution networks:

as shown in fig. 1, a plurality of residual convolution networks may be preset, and each residual convolution network is responsible for performing image feature extraction on the content image corresponding to one scale, so as to obtain corresponding image feature information. And outputting the image characteristic information extracted by each residual convolution network to a residual convolution network corresponding to the next smaller scale for image characteristic extraction of the smaller scale so as to obtain the image characteristic information of the smaller scale until the last residual convolution network. Since four residual convolution networks are illustrated in fig. 1, image feature information corresponding to four scales can be obtained.
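The chaining of the residual convolution networks, where each network consumes the output of the previous one and every intermediate output is kept, can be sketched as follows. The residual networks themselves are not reproduced; a 2 × 2 average pooling is used purely as a stand-in stage so that the example runs, and all names are hypothetical.

```python
from typing import Callable, List

FeatureMap = List[List[float]]


def avg_pool_2x2(feature_map: FeatureMap) -> FeatureMap:
    """Stand-in stage: halve height and width by averaging 2 x 2 blocks."""
    h, w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [[(feature_map[2 * r][2 * c] + feature_map[2 * r][2 * c + 1]
              + feature_map[2 * r + 1][2 * c] + feature_map[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def extract_multi_scale(content_image: FeatureMap,
                        stages: List[Callable[[FeatureMap], FeatureMap]]) -> List[FeatureMap]:
    """Run the stages in sequence and collect the output of every stage."""
    features = []
    current = content_image
    for stage in stages:
        current = stage(current)   # each stage takes the previous stage's output
        features.append(current)   # one image feature information per stage
    return features


# Four stages, as illustrated in fig. 1, applied to a 16 x 16 input.
image = [[1.0] * 16 for _ in range(16)]
multi_scale = extract_multi_scale(image, [avg_pool_2x2] * 4)
```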

The image feature information extracted by the four residual convolution networks differs in scale, and each residual convolution network captures features of a different character: the larger the scale, the richer the global information in the image feature information and the sparser the local information; conversely, the smaller the scale, the richer the local information and the sparser the global information. Deep semantic information of different granularities corresponding to the commodity image to be detected is thus obtained through the image feature information of different scales, and because several pieces of image feature information are available, the deep semantic information of the commodity image to be detected can be captured comprehensively.

In this embodiment, the content image is obtained by content positioning on the specification image of the commodity image to be detected provided by the user. On the basis of the content image, its deep semantic information is captured comprehensively at a plurality of different scales and a plurality of image feature information is obtained accordingly, which realizes fine extraction of the deep semantic information of the content image and lays the foundation for a good representation learning effect.

Referring to fig. 4, in a further embodiment, step S1200, in which expansion convolution processing is respectively performed on the image feature information corresponding to at least part of the scales and each image feature information is converted into expansion feature information of the corresponding scale, includes the following steps performed for the image feature information of each scale:

step S1210, performing expansion convolution processing with different expansion rates on the image feature information of the current scale by adopting the convolution layer, and correspondingly obtaining a plurality of intermediate feature information:

In this embodiment, to facilitate subsequent feature fusion, expansion convolution processing is performed on the image feature information of the respective scales, so that the receptive field is enlarged and the convolution is completed while losing as little feature information as possible. The expansion convolution is performed independently for the image feature information of each scale. Since the image feature information of the largest scale has already captured the global information, expansion convolution need not be performed on it and may be performed only on the remaining image feature information.

When performing the dilation convolution, a corresponding one of the feature processing networks in the feature processing sub-network shown in fig. 1 is responsible for performing the feature processing operation. In this embodiment, the internal network structure of the feature processing network is shown in fig. 5. As can be seen from the structure of the feature processing network, for the current image feature information input into the feature processing network, the dilation convolution processing with different dilation rates is performed on the current image feature information, and the different dilation rates are set to 1, 3 and 5 in this embodiment, so that a plurality of intermediate feature information corresponding to different dilation rates are obtained accordingly.
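Step S1210 can be illustrated with a plain-Python expansion (dilated) convolution applied at the three rates. This is a sketch under several assumptions not stated in the application: a single-channel feature map, a 3 × 3 kernel, zero padding so that the output keeps the input size, and the cross-correlation form customary in neural networks. The function name is hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]


def dilated_conv2d(feature_map: FeatureMap, kernel: FeatureMap, dilation: int) -> FeatureMap:
    """Zero-padded, size-preserving dilated convolution on one channel."""
    h, w = len(feature_map), len(feature_map[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` positions apart.
                    rr = r + (i - half) * dilation
                    cc = c + (j - half) * dilation
                    if 0 <= rr < h and 0 <= cc < w:  # outside the map counts as zero
                        acc += kernel[i][j] * feature_map[rr][cc]
            out[r][c] = acc
    return out


# One intermediate feature map per expansion rate (1, 3 and 5, as in the text).
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
intermediate = [dilated_conv2d(impulse, ones, rate) for rate in (1, 3, 5)]
```

With a single non-zero input value in the centre, the response at rate 1 stays within the immediate neighbourhood, whereas at rate 3 it reaches the corners of the 7 × 7 map, which shows the enlarged receptive field.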

Step S1220, concatenating the plurality of intermediate feature information into expanded feature information:

Because convolution outputs obtained with different expansion rates have different characteristics, the features can be further fused by a convolution operation once the intermediate feature information has been spliced. At this step, therefore, the plurality of intermediate feature information is simply spliced into a single expansion feature information.

Step S1230, performing feature fusion on the expansion feature information by using a convolution layer:

On the basis of the expansion feature information thus obtained, a convolution layer is further used to perform feature fusion on it. For example, if the expansion feature information is a 30 × 24 feature map, the output after convolution is still 30 × 24, where 24 is the number of channels; each of the 24 output channels is obtained by convolving the preceding 24 channels with different weights, which realizes feature interaction within the expansion feature information and completes the feature fusion.
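The channel interaction described above can be sketched as a pointwise (1 × 1) convolution, in which every output channel is a weighted sum of all input channels at the same position. The 1 × 1 kernel size is an assumption, since the application does not state the kernel size of this convolution layer; the function name and the layout (a list of positions, each holding one value per channel) are likewise hypothetical.

```python
from typing import List, Optional


def pointwise_conv(features: List[List[float]],
                   weights: List[List[float]],
                   bias: Optional[List[float]] = None) -> List[List[float]]:
    """Mix channels at every position: out[p][o] = sum_c weights[o][c] * features[p][c] + bias[o]."""
    c_out = len(weights)
    if bias is None:
        bias = [0.0] * c_out
    return [[sum(w * x for w, x in zip(weights[o], position)) + bias[o]
             for o in range(c_out)]
            for position in features]


# Two positions with two channels each; the shape is preserved by the fusion.
fused = pointwise_conv([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, -1.0]])
```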

Step S1240, normalization processing is carried out on the expansion characteristic information after the characteristic fusion by adopting a batch normalization layer:

To improve gradient propagation during training, a Batch Normalization (BN) layer may further be used to normalize the expansion feature information after feature fusion.

Step S1250, activating and outputting the normalized expansion feature information through the activation layer:

and then, activating the normalized expansion characteristic information by using an activation layer so as to obtain activated expansion characteristic information.

Step S1260, down-sampling the activated and output expansion feature information through global pooling to obtain expansion feature information of corresponding scales, wherein the expansion feature information is a high-dimensional vector:

and finally, performing global pooling on the expansion characteristic information which is activated and output by adopting a pooling layer, realizing down-sampling of the expansion characteristic information, obtaining the expansion characteristic information corresponding to the current scale, and converting the expansion characteristic information into a high-dimensional vector after the global pooling.
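Steps S1240 to S1260 can be sketched together. The example assumes inference-time batch normalization with stored per-channel statistics, ReLU as the activation and average pooling as the global pooling; the application names none of these choices explicitly, and the function names and data layout (a list of positions, each holding one value per channel) are hypothetical.

```python
import math
from typing import List

Features = List[List[float]]  # one list of channel values per spatial position


def batch_norm(features: Features, mean: List[float], var: List[float],
               gamma: List[float], beta: List[float], eps: float = 1e-5) -> Features:
    """Per-channel normalization: gamma * (x - mean) / sqrt(var + eps) + beta."""
    return [[gamma[c] * (x - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
             for c, x in enumerate(position)]
            for position in features]


def relu(features: Features) -> Features:
    """Assumed activation: negative values are set to zero."""
    return [[max(0.0, x) for x in position] for position in features]


def global_avg_pool(features: Features) -> List[float]:
    """Average every channel over all positions, giving one value per channel."""
    n = len(features)
    channels = len(features[0])
    return [sum(position[c] for position in features) / n for c in range(channels)]


# Two positions, two channels; identity statistics keep the arithmetic easy to follow.
x = [[1.0, -2.0], [3.0, -4.0]]
normalized = batch_norm(x, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0], eps=0.0)
vector = global_avg_pool(relu(normalized))
```

The pooled result has one component per channel regardless of the spatial size of the input, which is what turns the feature map into a vector.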

It is easy to understand that after each image feature information is subjected to feature processing by a corresponding feature processing network, expansion feature information represented by a high-dimensional vector is obtained, the expansion feature information of multiple scales is subsequently subjected to multi-scale feature fusion as described above to obtain a comprehensive feature vector corresponding to the commodity image to be detected, and the comprehensive feature vector integrates the fine features with different scale changes, so that a good representation learning effect is achieved.

According to the method, the specific feature processing network is adopted, on the basis of feature expansion of the image feature information of each scale, factors such as feature interaction, gradient improvement, dimension conversion and the like are considered, the expansion feature information expressed by high-dimensional vectors is finally obtained, the comprehensive feature vectors are conveniently used for splicing, the learning effect can be optimized, and an important foundation is laid for matching similar commodity objects.

Referring to fig. 6, in a further embodiment, the step S1400 of retrieving the target product object similar to the composite characteristic information from the preset product characteristic library includes the following steps:

step S1410, calculating the similarity between the comprehensive characteristic information of the commodity image to be detected and the comprehensive characteristic information of each commodity object in a preset commodity characteristic library:

in this embodiment, it is recommended to calculate the similarity between the comprehensive feature information of the to-be-detected commodity image and the comprehensive feature information of each commodity object in the commodity feature library by using a cosine similarity algorithm, and normalize the similarity to a specific numerical space, so as to facilitate sorting and comparison. The cosine similarity algorithm has the advantage of quick operation, and can better measure the similarity between different comprehensive characteristic information.

After the similarity calculation, the similarity of the commodity image to be detected corresponding to each commodity object in the commodity feature library is determined, and a similarity sequence is formed.
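A minimal sketch of step S1410 follows. The cosine similarity is standard; the mapping of its value from [-1, 1] to [0, 1] is only one possible reading of "normalizing to a specific numerical space" and is an assumption, as are the function names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalized_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed normalization: map the cosine from [-1, 1] to [0, 1]."""
    return (cosine_similarity(a, b) + 1.0) / 2.0
```

Applying `normalized_similarity` between the query vector and every library vector yields the similarity sequence referred to above.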

Step S1420, reversely sorting the commodity objects in the commodity feature library according to the similarity to obtain a reverse sequence list:

for the similarity sequence, the similarity sequence can be reversely sorted according to the numerical value of the similarity, so that an inverse sequence table with the similarity arranged from large to small is obtained.

Step S1430, determining a plurality of corresponding target commodity objects from the reverse sequence table according to the preset quantity:

The number of commodity objects in the commodity feature library is generally large, so the commodity objects need to be screened. A Top-K algorithm may be used for screening, where K is a preset number that may be set freely by those skilled in the art; the algorithm selects from the reverse sequence table the K commodity objects with the greatest similarity as the target commodity objects.
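Steps S1420 and S1430 together amount to taking the K entries with the largest similarity in descending order. The sketch below uses `heapq.nlargest` from the Python standard library, which gives the same result as sorting in descending order and keeping the first K entries; the function name and the example scores are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(similarities: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k (identifier, similarity) pairs with the largest similarity, largest first."""
    return heapq.nlargest(k, similarities.items(), key=lambda item: item[1])


# Hypothetical similarity scores keyed by unique feature information.
targets = top_k({"SKU-001": 0.2, "SKU-002": 0.9, "SKU-003": 0.5}, 2)
```

For a large library this avoids sorting every entry when only a small K is needed.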

Step S1440, constructing a similar merchandise object list, where the list includes the plurality of target merchandise objects:

in this step, a corresponding similar commodity object list is constructed for the corresponding application scenario, which can meet the needs of the specific application scenario, and generally, the key information such as the unique feature information, the commodity picture, the commodity title, the commodity price, and the like of each commodity object is also included in the similar commodity object list as the data of the relevant field of the commodity object therein.

On the basis, the similar commodity object list can be pushed to the terminal equipment providing the commodity image to be detected to be displayed according to the business logic corresponding to the application scene so as to complete the response corresponding to the application scene, for example, in the application scene of searching the image by the image, the similar commodity object list is displayed in the terminal equipment as a commodity searching result list; in the application scene of goods source matching, the similar goods object list can be displayed in the form of a goods source list in the terminal equipment; in the advertisement recommendation application scenario, the similar commodity object list may be displayed in a terminal device in a rolling advertisement mode. And the like, as will be appreciated by those skilled in the art.

In this embodiment, a cosine similarity algorithm is used to compute the similarity between the comprehensive feature information of the commodity image to be detected and the comprehensive feature information of each commodity object in the commodity feature library. Because the comprehensive feature information of the commodity objects is extracted in advance with the neural network architecture of the present application, an effective similarity calculation between the two kinds of comprehensive feature information is possible, which realizes similarity matching. Target commodity objects with greater similarity are then selected according to the similarity, and a similar commodity object list is constructed as the response to the terminal device. The present application thereby provides an image-based search capability as a service, making it convenient to offer a standardized search-by-image service in different application scenarios and to match target results for the user quickly and efficiently.

Referring to fig. 7, a commodity image retrieval apparatus adapted to one of the objectives of the present application includes: the system comprises a feature extraction module 1100, an expansion conversion module 1200, a feature synthesis module 1300 and a retrieval matching module 1400, wherein the feature extraction module 1100 is used for extracting image feature information of a plurality of scales of a content image of an image of a commodity to be detected; the expansion conversion module 1200 is configured to perform expansion convolution processing on at least part of the image feature information corresponding to the scale, and convert each image feature information into expansion feature information of a corresponding scale; the feature integration module 1300 is configured to fuse all the expanded feature information into integrated feature information; the retrieval matching module 1400 is configured to retrieve a target commodity object with characteristics similar to the composite characteristic information from a preset commodity characteristic library, where the commodity characteristic library stores the composite characteristic information of a plurality of commodity objects.

In a further embodiment, the feature extraction module 1100 includes: the image acquisition submodule is used for acquiring an image of the to-be-detected commodity of the target commodity object to be retrieved; the scaling processing submodule is used for scaling the to-be-detected commodity image to a preset size to obtain a corresponding specification image; the content positioning submodule is used for positioning the content of the specification image and cutting out the content image according to the positioning; and the residual convolution submodule is used for sequentially extracting the image characteristic information of the content image at different scales through a plurality of preset residual convolution networks.

In a further embodiment, the dilation conversion module 1200 includes the following structure performed for each scale of image feature information: the expansion convolution submodule is used for executing expansion convolution processing with different expansion rates on the image characteristic information of the current scale by adopting the convolution layer and correspondingly obtaining a plurality of intermediate characteristic information; the scale splicing submodule is used for splicing the plurality of intermediate characteristic information into expansion characteristic information; the characteristic fusion submodule is used for carrying out characteristic fusion on the expansion characteristic information by adopting a convolution layer; the batch normalization submodule is used for normalizing the expansion characteristic information after the characteristics are fused by adopting a batch normalization layer; the activation output submodule is used for activating and outputting the normalized expansion characteristic information through an activation layer; and the pooling sampling submodule is used for performing down-sampling on the expansion characteristic information which is activated and output through global pooling to obtain expansion characteristic information of a corresponding scale, wherein the expansion characteristic information is a high-dimensional vector.

In a further embodiment, the retrieving matching module 1400 includes: the similarity calculation submodule is used for calculating the similarity between the comprehensive characteristic information of the commodity image to be detected and the comprehensive characteristic information of each commodity object in a preset commodity characteristic library; the reverse sequencing submodule is used for performing reverse sequencing on the commodity objects in the commodity feature library according to the similarity to obtain a reverse sequence table; the target optimization submodule is used for determining a plurality of corresponding target commodity objects from the reverse sequence table according to the preset quantity; and the list construction submodule is used for constructing a similar commodity object list, and the list contains the target commodity objects.

In order to solve the technical problem, an embodiment of the present application further provides a computer device. As shown in fig. 8, the internal structure of the computer device is schematically illustrated. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can enable the processor to realize a commodity image retrieval method when being executed by the processor. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may store computer readable instructions, and when the computer readable instructions are executed by the processor, the processor may execute the commodity image retrieval method of the present application. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In this embodiment, the processor is configured to execute the specific functions of each module and its sub-modules in fig. 7, and the memory stores the program codes and the various data required for executing the modules or sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program codes and data necessary for executing all modules and sub-modules of the commodity image retrieval apparatus of the present application, and the server can call these program codes and data to execute the functions of all the sub-modules.

The present application further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the commodity image retrieval method according to any of the embodiments of the present application.

The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the method as described in any of the embodiments of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods can be included. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

In conclusion, the method and the device can realize effective representation learning of the commodity pictures, and realize accurate picture searching service based on the comprehensive characteristic information obtained by the representation learning, so that an efficient similar commodity matching effect can be realized in the E-commerce field.

Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims

1. A commodity image retrieval method is characterized by comprising the following steps:

2. The commodity image retrieval method according to claim 1, wherein extracting image feature information of a plurality of scales of a specification image of a commodity image to be inspected comprises the steps of:

3. The commodity image retrieval method according to claim 1, wherein respectively performing expansion convolution processing on the image feature information corresponding to at least part of the scales, and converting each image feature information into expansion feature information of the corresponding scale, comprises the following steps performed for the image feature information of each scale:

4. The commodity image retrieval method according to claim 1, wherein the comprehensive feature information is a high-dimensional vector formed by sequentially concatenating all the expansion feature information of the same commodity image.

5. The commodity image retrieval method according to claim 1, wherein retrieving, from a preset commodity feature library, a target commodity object whose features are similar to the comprehensive feature information comprises the following steps:

6. The commodity image retrieval method according to any one of claims 1 to 5, wherein a preset neural network model is adopted for executing the corresponding steps of the method to extract the corresponding comprehensive feature information of the commodity image, and the neural network model is trained to a convergence state in advance.

7. A commodity image retrieval device, characterized by comprising:

the characteristic extraction module is used for extracting image characteristic information of a plurality of scales of the content image of the commodity image to be detected;

the expansion conversion module is used for respectively executing expansion convolution processing on the image characteristic information corresponding to at least part of scales and converting each image characteristic information into expansion characteristic information of the corresponding scale;

the characteristic integration module is used for fusing all the expansion characteristic information into comprehensive characteristic information;

and the retrieval matching module is used for retrieving a target commodity object with characteristics similar to the comprehensive characteristic information from a preset commodity characteristic library, and the commodity characteristic library stores the comprehensive characteristic information of a plurality of commodity objects.

8. A computer device comprising a central processor and a memory, characterized in that the central processor is adapted to invoke execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 6, which, when invoked by a computer, performs the steps comprised by the corresponding method.

10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 6.