CN115618900A - Method and device for recognizing picture and training neural network - Google Patents

Method and device for recognizing picture and training neural network

Info

Publication number
CN115618900A
Authority
CN
China
Prior art keywords
image
code
result
picture
feature
Prior art date
Legal status
Pending
Application number
CN202211081553.5A
Other languages
Chinese (zh)
Inventor
暨凯祥
刘家佳
王剑
陈景东
黄莹
黄星
廖群伟
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Ant Blockchain Technology Shanghai Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Ant Blockchain Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd, Ant Blockchain Technology Shanghai Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211081553.5A
Publication of CN115618900A
Legal status: Pending

Classifications

    • G06K 7/1417 — Methods for optical code recognition specifically adapted to the type of code: 2D bar codes
    • G06K 7/1413 — Methods for optical code recognition specifically adapted to the type of code: 1D bar codes
    • G06K 7/1482 — Methods for optical code recognition including quality enhancement steps using fuzzy logic or natural solvers, such as neural networks, genetic algorithms and simulated annealing
    • G06N 3/084 — Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06Q 30/0225 — Discounts or incentives, e.g. coupons or rebates: avoiding frauds


Abstract

The present disclosure provides a method and apparatus for recognizing pictures and training neural networks. The method comprises the following steps: receiving a code scanning picture, wherein the code scanning picture contains an activity code corresponding to a marketing campaign for a commodity; performing feature extraction on the code scanning picture by using an image encoding module to obtain a first image feature; processing the first image feature by using an attention module to obtain a weight of the first image feature; weighting the first image feature with that weight to obtain a second image feature; and recognizing the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity.

Description

Method and device for recognizing picture and training neural network
The present disclosure is a divisional application of the application entitled "Method and apparatus for recognizing pictures and training neural networks," application number 202111792.6, filed on September 22, 2021.
Technical Field
The present disclosure relates to the field of image recognition, and in particular, to a method and an apparatus for recognizing a picture and training a neural network.
Background
In marketing campaigns for physical commodities, a brand side usually prints an activity code on the physical commodity for users to scan and redeem prizes after purchase. However, with the rise of industries that profit by illegal means on the network (hereinafter referred to as the network illegal industry), activity codes are often stolen and resold by the network illegal industry. Once an activity code is redeemed by an illegal team in the network illegal industry (such redemption is referred to as fraud for short), huge losses are brought to the brand side.
Based on this, a scheme capable of accurately identifying fraud is needed to effectively prevent the network illegal industry from using activity codes to cash in prizes.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for recognizing a picture and training a neural network, so as to reduce the possibility that the activity code corresponding to a marketing campaign for a commodity is exploited by the network illegal industry.
In a first aspect, a method for recognizing a picture is provided, the method comprising: receiving a code scanning picture, wherein the code scanning picture contains an activity code corresponding to a marketing campaign for a commodity; performing feature extraction on the code scanning picture by using an image encoding module to obtain a first image feature; processing the first image feature by using an attention module to obtain a weight of the first image feature; weighting the first image feature with that weight to obtain a second image feature; and recognizing the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity.
In a second aspect, a method of training a neural network is provided, the neural network comprising an image encoding module and an attention module, the method comprising: receiving a code scanning picture, wherein the code scanning picture contains an activity code corresponding to a marketing campaign for a commodity; performing feature extraction on the code scanning picture by using the image encoding module to obtain a first image feature; processing the first image feature by using the attention module to obtain a weight of the first image feature; weighting the first image feature with that weight to obtain a second image feature; recognizing the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity; determining a training loss of the neural network according to the recognition result; and training the neural network according to the training loss.
In a third aspect, an apparatus for recognizing a picture is provided, the apparatus comprising: an image encoding module, configured to receive a code scanning picture containing an activity code corresponding to a marketing campaign for a commodity, and to perform feature extraction on the code scanning picture to obtain a first image feature; and an attention module, configured to process the first image feature to obtain a weight of the first image feature, and to weight the first image feature with that weight to obtain a second image feature. The image encoding module is further configured to recognize the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity.
In a fourth aspect, an apparatus for training a neural network is provided, comprising: an image encoding module, configured to receive a code scanning picture containing an activity code corresponding to a marketing campaign for a commodity, and to perform feature extraction on the code scanning picture to obtain a first image feature; an attention module, configured to process the first image feature to obtain a weight of the first image feature, and to weight the first image feature with that weight to obtain a second image feature, the image encoding module being further configured to recognize the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity; and a training module, configured to determine a training loss of the neural network according to the recognition result and to train the neural network according to the training loss.
In a fifth aspect, there is provided an apparatus comprising a memory having executable code stored therein and a processor configured to execute the executable code to implement the method of the first or second aspect.
In a sixth aspect, a computer-readable storage medium is provided, having stored thereon executable code which, when executed, is capable of implementing the method of the first or second aspect.
In a seventh aspect, there is provided a computer program product comprising executable code which, when executed, is capable of implementing a method as described in the first or second aspect.
In the above solutions, the code scanning picture is recognized to determine whether it was obtained by scanning the activity code on the physical commodity. This picture recognition approach employs an attention mechanism, offers high reliability, and can effectively reduce the possibility of activity codes being exploited by the network illegal industry.
Drawings
Fig. 1 is a schematic flow chart of a method for identifying a picture according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating an example of a neural network that may be employed by the method of FIG. 1.
Fig. 3 is an exemplary diagram of one possible implementation of step S130 in fig. 1.
Fig. 4 is an exemplary diagram of another possible implementation of step S130 in fig. 1.
Fig. 5 is a schematic flow chart diagram of a method for training a neural network provided by an embodiment of the present disclosure.
Fig. 6 is a schematic flow chart diagram of a method for training a neural network according to another embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of an apparatus for recognizing a picture according to an embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of an apparatus for training a neural network according to an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of an apparatus in hardware according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments.
For ease of understanding, a brief description of some concepts involved in the embodiments of the present disclosure will be given.
The commodity mentioned in the embodiments of the present disclosure refers to any commodity that has a physical form and is a genuine commodity of a brand side. For example, the commodity may be a drink, a daily necessity, a food item, or a toy.
A marketing campaign refers to a campaign that a brand side runs in order to promote the sale of commodities. The marketing campaign may be any promotional campaign that includes benefits, for example a free gift with purchase, a red envelope with purchase, various coupons with purchase, or accumulating multiple purchases for redemption of a prize. Typically, the brand side encodes the marketing campaign for the commodity into an activity code and prints the activity code on the commodity. After purchasing the commodity, the consumer can obtain the benefits contained in the marketing campaign by scanning the activity code.
The type of the activity code is not specifically limited in the embodiments of the present disclosure, as long as the activity code can be recognized and resolved to the marketing campaign it corresponds to. For example, the activity code may be a two-dimensional code (QR code) or a barcode.
The network illegal industry may also be described as an industry that profits on the network by irregular means. It refers to illegal activity that uses the internet as a medium and network technology as the main means, posing potential threats to the security of computer information systems and to social and economic stability, such as illicit cashing-in and rebate harvesting.
With the rise of the network illegal industry, activity codes used for marketing are often stolen and resold by the network illegal industry for cashing in prizes. Practitioners of the network illegal industry may collect activity codes from commodity packaging discarded by consumers at waste recycling stations, plastic crushing plants, crowded public places, cleaning workplaces, and so on, or may steal activity codes during the production, transmission and processing of the codes, and then sell them through online platforms. Once an activity code is redeemed by an illegal team in the network illegal industry (fraud for short), huge losses are brought to the brand side.
To prevent the losses caused by the network illegal industry redeeming activity-code prizes, one feasible approach in the prior art is to obtain the IP address, mobile phone number or device of the user who scans a code to redeem a prize, and to judge against an existing blacklist whether the user belongs to an illegal team. If the user is judged to be part of an illegal team, the user's prize redemption is restricted.
However, this approach has the following problems. 1. The list fails due to scene mismatch. For example, the existing blacklists are telecommunications or credit blacklists; a user on a telecommunications-fraud or credit blacklist may still buy the goods as a real consumer, so using such blacklists in a marketing anti-fraud scenario leads to mismatch. 2. The evolving form of network illegal industry attacks causes the blacklist to fail. For example, past attacks used proxy IPs, throwaway account numbers, verification codes outsourced to code-solving services, and device farms run from group-control workshops. As the attacks have evolved, they now use second-dial IPs that change rapidly, active account numbers in live use, automated verification-code solving, cloud mobile phones and device-bank rigs. This evolution means the network illegal industry now presents genuine and ever-changing IPs, mobile phone numbers and devices, and blacklist-style methods are defeated.
Another feasible approach in the prior art is to obtain the number of times a prize-redeeming user has scanned codes and, according to a preset code-scanning rule, judge whether that number exceeds a limit, so as to prevent large-scale redemption by illegal teams in the network illegal industry.
However, such methods cause the following problems. 1. Normal consumers may be hurt by accident. For example, a beverage's activity code is printed on two types of packaging, a carton and a beverage bottle, and one carton contains 24 bottles. The owner of a convenience store usually sells the beverage by the bottle, so the activity codes on the cartons are all scanned by the store owner himself to collect the benefits. If the rule limits each person to 3 scans, the store owner is hurt by accident, which dampens his enthusiasm for selling. 2. The network illegal industry can easily bypass the preset code-scanning rule. Because the network illegal industry holds large pools of accounts and IPs, limiting the number of scans per account or per IP merely raises its cost of evasion and does not stop it from illegally obtaining marketing funds. For example, the network illegal industry can rotate across 10 active accounts, one scan per account, to get around the rule.
Based on this, a scheme capable of accurately identifying fraud is needed to prevent the network illegal industry from using activity codes to cash in prizes.
In order to solve the above problem, the following describes an embodiment of the present disclosure in detail with reference to fig. 1 and 2. Fig. 1 illustrates a method for recognizing a picture according to an embodiment of the present disclosure. The method may be implemented based on a neural network 20 as shown in fig. 2.
Referring to fig. 1, in step S110, a code scanning picture is received. The code scanning picture contains an activity code (such as a two-dimensional code) corresponding to a marketing campaign for a commodity. The code scanning picture may be a picture obtained by scanning the activity code on the physical commodity; for example, after purchasing the commodity, the consumer scans the activity code printed on the commodity itself or on its packaging box. Alternatively, the code scanning picture may be a forgery, for example a fake picture generated by the network illegal industry from an electronic copy of the code.
In step S120, feature extraction is performed on the code scanning picture to obtain a first image feature. As shown in fig. 2, the code scanning picture can be input to an image encoding module 21 (or image encoder) of the neural network 20. The image encoding module 21 may employ a general image recognition or image classification network, for example a convolution-based network such as ResNet, VGG or DenseNet. The image encoding module may be configured to extract a first image feature from the code scanning picture; the first image feature may also be referred to as a high-order semantic feature of the picture. The image encoding module 21 may include a feature extraction module 211, by which the first image feature of the code scanning picture can be extracted.
In steps S130-S140, the first image feature is processed using an attention mechanism to obtain a weight of the first image feature, and the first image feature is weighted with that weight to obtain a second image feature.
The attention mechanism mentioned in step S130 may be implemented using the attention module 22 in fig. 2. The attention module 22 is capable of being connected to the image coding module 21 in a pluggable manner. Fig. 2 shows a possible arrangement of the attention module 22. The attention module 22 may also be arranged in other ways according to actual needs. Taking the image encoding module 21 as a convolutional neural network as an example, the attention module 22 may be disposed between any two adjacent layers of the convolutional neural network.
With continued reference to fig. 2, in some embodiments, the attention module 22 may further include a spatial attention module 22a and a channel attention module 22b. The spatial attention module 22a may weight features of the image space dimensions of the first image feature (e.g., both width and height dimensions of the feature map corresponding to the first image feature) to improve spatial saliency of the first image feature. The channel attention module 22b may weight features of the image channel dimensions of the first image feature to improve channel saliency of the first image feature. The addition of the attention module 22 enables the neural network 20 to focus on features that are helpful for the identification or classification of the code-scanned picture, thereby improving the identification or classification result of the code-scanned picture. The implementation of the attention module 22 will be described in more detail below in conjunction with fig. 3 and 4, and will not be described in detail here.
In step S150, the code scanning picture is recognized according to the second image feature to obtain a recognition result. The recognition result can be used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity. For example, the image encoding module in fig. 2 also includes a fully-connected layer 212. The second image feature obtained in step S140 is input into the fully-connected layer 212 (one or more hidden layers may process the second image feature beforehand) to obtain the recognition result (also called a prediction result).
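Purely as an illustration, the following minimal PyTorch sketch shows how the pieces named above can fit together: feature extraction module 211 as a ResNet-18 backbone, a pluggable attention stage standing in for module 22, and fully-connected layer 212 as the classification head. All class and variable names are ours, not the disclosure's, and a recent torchvision is assumed.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CodeScanRecognizer(nn.Module):
    """Sketch of neural network 20: backbone (module 21/211), pluggable
    attention stage (module 22), fully-connected head (layer 212)."""
    def __init__(self, num_labels: int = 2, attention: nn.Module = None):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Feature extraction module 211: everything before global pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Attention module 22 is pluggable; identity means "no attention".
        self.attention = attention if attention is not None else nn.Identity()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_labels)  # fully-connected layer 212

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.features(x)       # first image feature (S120)
        f2 = self.attention(f1)     # weighted second image feature (S130-S140)
        return self.fc(self.pool(f2).flatten(1))  # recognition result (S150)
```

For example, `CodeScanRecognizer()(torch.randn(1, 3, 224, 224))` returns a two-way logit vector; the attention modules sketched with steps S131a-S133b below can be wrapped and plugged in as `attention`.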
In some embodiments, the recognition result may include a first recognition result, used to indicate a label of the activity code in the code scanning picture. The label of the activity code may be either entity code or electronic code. If the label is entity code, it can be judged that the code scanning picture was obtained by scanning the activity code on the physical commodity; if the label is electronic code, it can be judged that the code scanning picture was not obtained by scanning the activity code on the physical commodity. In other words, the code scanning picture is then likely a forgery manufactured by the network illegal industry from an electronic copy of the activity code.
In some embodiments, the recognition result may include a second recognition result, used to indicate a label of the background area in the code scanning picture. The label of the background area comprises at least one of the following: a label indicating the color of the background area, a label indicating the type of the object in the background area, and a label indicating the material of the object in the background area. If the label of the background area belongs to a preset label set, it can be judged that the code scanning picture was obtained by scanning the activity code on the physical commodity; if not, it can be judged that the picture was not obtained by scanning the activity code on the physical commodity. In other words, the code scanning picture is then likely a forgery manufactured by the network illegal industry from an electronic copy of the activity code.
The labels of the background area can be specified according to the actual condition of the commodity. For example, if the activity code is printed on the bottle cap and/or packaging box of the commodity, the background labels may include the bottle cap and/or packaging box. As another example, if the color of the object bearing the activity code (which may be the body of the commodity or its packaging box) is red and/or blue, the background labels may include red and/or blue. Likewise, if the material of the object bearing the activity code is metal and/or paper, the background labels may include metal and/or paper.
In some embodiments, the recognition result may include both the first recognition result and the second recognition result. After both are obtained, they can be combined to judge whether the code scanning picture was obtained by scanning the activity code on the physical commodity. For example, if the first recognition result indicates that the activity code is an entity code, and the second recognition result indicates that the type of the background area in the code scanning picture is a preset type, it may be determined that the code scanning picture was obtained by scanning the activity code on the physical commodity.
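A minimal sketch of this combined decision is given below; the label strings and the preset label set are hypothetical values chosen only for illustration, since the actual labels depend on the commodity as described above.

```python
# Hypothetical preset label set; the real set depends on the commodity.
PRESET_BACKGROUND_LABELS = {"bottle_cap", "packaging_box", "red", "metal"}

def is_physical_scan(first_result: str, second_result: str) -> bool:
    """Combine the two recognition results: accept the code scanning picture
    as a scan of the activity code on the physical commodity only if the
    code label is entity code AND the background label is a preset label."""
    return (first_result == "entity_code"
            and second_result in PRESET_BACKGROUND_LABELS)
```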
The embodiments of the present disclosure thus provide a recognition framework for code scanning pictures that can effectively recognize or classify them (for example, effectively recognize the activity code in the picture as an entity code or an electronic code). In addition, the introduction of the attention mechanism makes it possible to locate the region of interest in the picture (the region most relevant to recognition) and to raise the saliency of the image features in that region, thereby improving recognition performance.
The implementation of step S130 differs according to the type of attention module used. The processing flows of the spatial attention module and the channel attention module are described below in turn.
As shown in FIG. 2, in some embodiments, the attention module 22 may include a spatial attention module 22a. The spatial attention module 22a may process the first image feature to obtain the weights of the image spatial features in the first image feature. An image spatial feature refers to a feature of the first image feature in the two spatial dimensions of the image, width w and height h. Weighting the first image feature with these weights is equivalent to letting the image encoding module 21 attend to the areas in the spatial dimensions that most influence the recognition result, so the spatial saliency of the first image feature can be improved.
As shown in fig. 3, when the attention module 22 includes the spatial attention module 22a, step S130 in fig. 1 may include step S131a and step S132a.
In step S131a, a first maximum pooling operation and a first average pooling operation are performed on the first image feature in the spatial dimension of the first image feature, respectively, to obtain a first maximum pooling result and a first average pooling result.
The first Max Pooling operation may be a Global Max Pooling (GMP) operation. The first Average Pooling operation may be a Global Average Pooling (GAP) operation.
In step S132a, a convolution operation is performed on the first maximum pooling result and the first average pooling result to determine a weight of the image spatial feature in the first image feature.
As shown in fig. 2, the convolution operation Conv may fuse the first maximum pooling result and the first average pooling result. The result of the convolution operation Conv is the weight of the image space feature in the first image feature. Further, in some embodiments, before weighting the weight of the image spatial feature to the first image feature, a sigmoid function may be further employed to compress the weight of the image spatial feature in the first image feature to be in a range of 0-1.
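The description leaves the exact pooling axes implicit; the sketch below follows the common CBAM-style reading of steps S131a-S132a, in which max and average pooling summarize each spatial position across channels and a convolution fuses the two resulting maps into spatial weights. The class name and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style sketch of steps S131a-S132a: max/average pooling summarize
    each (h, w) position across channels; a convolution fuses the two maps
    into one weight map, compressed into (0, 1) by a sigmoid."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        max_pool, _ = x.max(dim=1, keepdim=True)   # first max pooling result
        avg_pool = x.mean(dim=1, keepdim=True)     # first average pooling result
        fused = self.conv(torch.cat([max_pool, avg_pool], dim=1))  # Conv fusion
        return torch.sigmoid(fused)                # weights in (0, 1), (B,1,H,W)
```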
With continued reference to fig. 2, in some embodiments, the attention module 22 may include a channel attention module 22b. The channel attention module 22b may process the first image feature to obtain the weights of the image channel features in the first image feature. An image channel feature refers to a feature of the first image feature in the channel dimension of the image. Weighting the first image feature with the weights of its channel features lets the image encoding module attend to the channels that most influence the recognition result, so the channel saliency of the first image feature can be improved.
As shown in fig. 4, when the attention module includes the channel attention module, step S130 in fig. 1 may include step S131b, step S132b, and step S133b.
In step S131b, a second maximum pooling operation and a second average pooling operation are performed on the first image feature in the channel dimension of the first image feature, respectively, to obtain a second maximum pooling result and a second average pooling result.
The second max pooling operation may be a GMP operation. The second average pooling operation may be a GAP operation.
In step S132b, a full-join operation is performed on the second maximum pooling result and the second average pooling result, respectively, to obtain a first full-join result and a second full-join result.
In step S133b, the first full connection result and the second full connection result are added to determine a weight of the image channel feature in the first image feature.
Further, in some embodiments, before the weight of the image channel feature is applied to the first image feature, a sigmoid function may be employed to compress the weight of the image channel feature in the first image feature into the range of 0-1.
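A corresponding sketch of steps S131b-S133b follows. Using a single shared fully-connected bottleneck for both pooling branches is our assumption (two separate fully-connected stacks would be equally consistent with the description), and the reduction ratio is illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM-style sketch of steps S131b-S133b: global max/average pooling over
    the spatial dimensions give two C-dimensional descriptors; fully-connected
    layers produce the two full-connection results, which are added and
    compressed into (0, 1) by a sigmoid."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(                      # shared FC bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        max_desc = x.amax(dim=(2, 3))   # second max pooling result (GMP)
        avg_desc = x.mean(dim=(2, 3))   # second average pooling result (GAP)
        weights = torch.sigmoid(self.fc(max_desc) + self.fc(avg_desc))
        return weights.view(b, c, 1, 1)  # per-channel weights in (0, 1)
```

An attention stage for the recognizer sketched earlier can then multiply the first image feature by the channel weights and then by the spatial weights, mirroring one possible serial arrangement of modules 22a and 22b in fig. 2.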
The process described in fig. 1 can be understood as the inference process of a neural network. The training process is described below with reference to fig. 5. It should be understood that the training process is similar to the inference process, except that training also adjusts the model parameters of the neural network according to a training loss. The following therefore focuses on the differences between training and inference; for parts not described in detail, refer to the description of fig. 1.
Referring to fig. 5, in step S510, a code scanning picture is received. The code scanning picture may contain an activity code corresponding to a marketing campaign for a commodity. Unlike in inference, the code scanning pictures input during training are referred to as training samples. The training samples may include positive samples and negative samples: a positive sample is a picture obtained by scanning the activity code on the physical commodity, and a negative sample is a forged code scanning picture. The training samples may be collected by the brand side of the commodity.
In step S520, feature extraction is performed on the code-scanning picture by using the image coding module to obtain a first image feature.
In step S530, the attention module is used to process the first image feature to obtain a weight of the first image feature.
In step S540, the first image feature is weighted by using the weight of the first image feature, so as to obtain a second image feature.
In step S550, the code scanning picture is recognized according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity.
The description of steps S520 to S550 is similar to that of steps S120 to S150 in fig. 1, and reference may be made to the related description of fig. 1.
In step S560, a training loss of the neural network is determined according to the recognition result. In the training phase, the code-scanning picture may be pre-labeled, that is, the identification result (such as the type or the label of the code-scanning picture) of the code-scanning picture is labeled. The recognition result output based on the neural network may then be compared to the pre-labeled result, resulting in a training loss.
In step S570, the neural network is trained according to the training loss. For example, the model parameters of the neural network may be updated in a back-propagation manner based on the training loss such that the model parameters of the neural network converge toward a direction that reduces the training loss.
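A minimal sketch of one training step for steps S560-S570 is given below, assuming the `CodeScanRecognizer` sketched earlier and binary labels (1 = scanned from the physical commodity, 0 = forged). Cross-entropy loss and the Adam optimizer are illustrative choices not fixed by the disclosure.

```python
import torch
import torch.nn as nn

model = CodeScanRecognizer()           # sketched after step S150 above
criterion = nn.CrossEntropyLoss()      # illustrative training loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    logits = model(images)             # recognition result (S550)
    loss = criterion(logits, labels)   # training loss from the result (S560)
    optimizer.zero_grad()
    loss.backward()                    # back-propagation (S570)
    optimizer.step()                   # update parameters to reduce the loss
    return loss.item()
```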
The code scanning picture referred to in fig. 5 may be one of a plurality of training samples of the neural network. The neural network may first be trained to a substantially stable state on the full set of training samples and then fine-tuned. One method of fine-tuning the neural network is presented below in conjunction with fig. 6.
As shown in fig. 6, in step S610, target training samples are selected from a plurality of training samples according to the training losses of those samples. The training loss of a target training sample is greater than that of the remaining training samples. The greater the training loss of a sample, the harder it is to recognize; the target training samples can therefore be understood as the hard-to-recognize (e.g., hard-to-classify) samples, also called hard examples. They may be determined, for example, by sorting the training losses of the samples and taking those whose losses rank in the top N. Alternatively, training samples whose training loss exceeds a preset threshold may be selected as target training samples.
In step S620, the neural network is trained using the target training samples. Fine-tuning the neural network with hard examples reduces the amount of training data needed, makes the training more targeted, and improves the overall recognition performance of the neural network.
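The following is a minimal sketch of the hard-example selection in step S610; the function and parameter names are illustrative, top-N selection is shown, and the threshold variant noted above would simply replace the final slice with a loss comparison.

```python
import torch

def select_hard_examples(model, criterion, samples, top_n: int = 1000):
    """Score every (image, label) training sample with the trained model and
    return the indices of the top_n samples with the largest training loss."""
    model.eval()
    per_sample_loss = []
    with torch.no_grad():
        for idx, (image, label) in enumerate(samples):
            logits = model(image.unsqueeze(0))          # add batch dimension
            loss = criterion(logits, torch.tensor([label]))
            per_sample_loss.append((loss.item(), idx))
    per_sample_loss.sort(reverse=True)                  # largest loss first
    return [idx for _, idx in per_sample_loss[:top_n]]  # target samples
```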
Optionally, in some embodiments, the attention module may include a spatial attention module and a channel attention module. Step S530 may include: processing the first image features by using a spatial attention module to obtain the weight of the image spatial features in the first image features; and/or processing the first image characteristic by using a channel attention module to obtain the weight of the image channel characteristic in the first image characteristic.
Optionally, in some embodiments, step S530 may include: respectively performing a first maximum pooling operation and a first average pooling operation on the first image characteristic in the spatial dimension of the first image characteristic to obtain a first maximum pooling result and a first average pooling result; convolving the first maximum pooling result and the first average pooling result to determine weights of image spatial features in the first image feature.
Optionally, in some embodiments, step S530 may include: respectively performing a second maximum pooling operation and a second average pooling operation on the first image characteristic in the channel dimension of the first image characteristic to obtain a second maximum pooling result and a second average pooling result; performing full connection operation on the second maximum pooling result and the second average pooling result respectively to obtain a first full connection result and a second full connection result; the first full join result and the second full join result are added to determine a weight of the image channel feature in the first image feature.
Optionally, in some embodiments, the recognition result may include a first recognition result and/or a second recognition result, the first recognition result indicating a label of the activity code in the code scanning picture, the label comprising entity code and electronic code, and the second recognition result indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating the color of the background area, a label indicating the type of the object in the background area, and a label indicating the material of the object in the background area.
Method embodiments of the present disclosure are described in detail above in conjunction with fig. 1-6, and apparatus embodiments of the present disclosure are described in detail below in conjunction with fig. 7-9. It is to be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, and therefore reference may be made to the preceding method embodiments for parts not described in detail.
Fig. 7 is a schematic structural diagram of an apparatus for recognizing pictures according to an embodiment of the present disclosure. The apparatus 700 of fig. 7 includes an image encoding module 710 and an attention module 720.
The image encoding module 710 may be configured to receive a code-scanned picture, where the code-scanned picture includes a campaign code corresponding to a marketing campaign for a commodity; extracting the features of the code scanning image to obtain first image features;
attention module 720 may be configured to process the first image feature to obtain a weight of the first image feature; and weighting the first image characteristics by using the weight of the first image characteristics to obtain second image characteristics.
The image encoding module 710 may be further configured to recognize the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity.
Optionally, the attention module 720 may include a spatial attention module and/or a channel attention module, where the spatial attention module is configured to process the first image feature to obtain a weight of an image spatial feature in the first image feature; and/or the channel attention module is used for processing the first image feature to obtain the weight of the image channel feature in the first image feature.
Optionally, the spatial attention module is configured to perform a first maximum pooling operation and a first average pooling operation on the first image feature in a spatial dimension of the first image feature, respectively, to obtain a first maximum pooling result and a first average pooling result; performing a convolution operation on the first maximum pooling result and the first average pooling result to determine weights of image spatial features in the first image features.
Optionally, the channel attention module is configured to perform a second maximum pooling operation and a second average pooling operation on the first image feature in a channel dimension of the first image feature, respectively, to obtain a second maximum pooling result and a second average pooling result; performing full-connection operation on the second maximum pooling result and the second average pooling result respectively to obtain a first full-connection result and a second full-connection result; adding the first full join result and the second full join result to determine a weight of an image channel feature in the first image feature.
Optionally, the recognition result includes a first recognition result and/or a second recognition result, the first recognition result indicating a label of the activity code in the code scanning picture, the label of the activity code comprising entity code and electronic code, and the second recognition result indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating the color of the background area, a label indicating the type of the object in the background area, and a label indicating the material of the object in the background area.
Optionally, the apparatus 700 of fig. 7 may further include a decision module, configured to determine that the code scanning picture is a picture obtained by scanning the activity code on the physical commodity if the first recognition result indicates that the activity code is an entity code and/or the second recognition result indicates that the type of the background area in the code scanning picture is a preset type.
Fig. 8 is a schematic structural diagram of an apparatus for training a neural network according to an embodiment of the present disclosure. The apparatus 800 of FIG. 8 may include an image encoding module 810, an attention module 820, and a training module 830.
The image coding module 810 can be configured to receive a code-scanned picture, where the code-scanned picture includes a campaign code corresponding to a marketing campaign for a commodity; and extracting the features of the code scanning image to obtain first image features.
Attention module 820 may be configured to process the first image feature to obtain a weight of the first image feature; and weighting the first image characteristics by using the weight of the first image characteristics to obtain second image characteristics.
The image encoding module 810 may be further configured to recognize the code scanning picture according to the second image feature to obtain a recognition result, where the recognition result is used to determine whether the code scanning picture was obtained by scanning the activity code on the physical commodity.
The training module 830 may be configured to determine a training loss of the neural network according to the recognition result; training the neural network according to the training loss.
Optionally, the attention module 820 may include: the spatial attention module is used for processing the first image features to obtain weights of the image spatial features in the first image features; and/or the channel attention module is used for processing the first image characteristic to obtain the weight of the image channel characteristic in the first image characteristic.
Optionally, the spatial attention module is configured to perform a first maximum pooling operation and a first average pooling operation on the first image feature in a spatial dimension of the first image feature, respectively, to obtain a first maximum pooling result and a first average pooling result; performing a convolution operation on the first maximum pooling result and the first average pooling result to determine weights of image spatial features in the first image feature.
Optionally, the channel attention module is configured to perform a second maximum pooling operation and a second average pooling operation on the first image feature in a channel dimension of the first image feature, respectively, to obtain a second maximum pooling result and a second average pooling result; performing full-connection operation on the second maximum pooling result and the second average pooling result respectively to obtain a first full-connection result and a second full-connection result; adding the first full join result and the second full join result to determine a weight of an image channel feature in the first image feature.
Optionally, the code scanning picture is one of a plurality of training samples of the neural network model, and the training module is further configured to: selecting a target training sample from the plurality of training samples according to training losses corresponding to the plurality of training samples, wherein the training loss of the target training sample is greater than the training losses of the rest training samples except the target training sample; training the neural network using the target training samples.
Optionally, the recognition result includes a first recognition result and/or a second recognition result, the first recognition result indicating a label of the activity code in the code scanning picture, the label of the activity code comprising entity code and electronic code, and the second recognition result indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating the color of the background area, a label indicating the type of the object in the background area, and a label indicating the material of the object in the background area.
Fig. 9 is a schematic structural diagram of an apparatus for recognizing pictures or training a neural network according to yet another embodiment of the present disclosure. The apparatus 900 shown in fig. 9 can perform the method corresponding to any of the foregoing embodiments. The apparatus 900 may be, for example, a computing device having computing functionality. For example, the apparatus 900 may be a server. The apparatus 900 may include a memory 910 and a processor 920. Memory 910 may be used to store executable code. The processor 920 may be configured to execute the executable code stored in the memory 910 to implement the steps of the methods described above. In some embodiments, the apparatus 900 may further include a network interface 930, and data exchange between the processor 920 and an external device may be implemented through the network interface 930.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the disclosure are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another via wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (32)

1. A method of identifying a picture, the method comprising:
receiving a code scanning picture, wherein the code scanning picture comprises activity codes corresponding to marketing activities of commodities;
performing feature extraction on the code scanning picture by using an image coding module to obtain a first image feature;
processing the first image feature by using an attention module to obtain a second image feature;
recognizing the code scanning picture according to the second image feature to obtain a recognition result, wherein the recognition result is used for determining whether the code scanning picture is a picture obtained by scanning the activity code on the physical commodity; and
if the recognition result indicates that the type of the background area in the code scanning picture is a preset type, determining that the code scanning picture is a picture obtained by scanning the activity code on the physical commodity.
2. The method of claim 1, wherein the processing the first image feature with the attention module to obtain a second image feature comprises:
processing the first image feature by using the attention module to obtain the weight of the first image feature;
and weighting the first image characteristics by using the weight of the first image characteristics to obtain the second image characteristics.
3. The method of claim 2, the attention module comprising a spatial attention module and/or a channel attention module,
the processing the first image feature by using the attention module to obtain the weight of the first image feature includes:
processing the first image feature by using the spatial attention module to obtain the weight of the image spatial feature in the first image feature; and/or,
and processing the first image characteristic by using the channel attention module to obtain the weight of the image channel characteristic in the first image characteristic.
4. The method of claim 3, wherein processing the first image feature with the spatial attention module to obtain the weight of the image spatial feature in the first image feature comprises:
performing a first maximum pooling operation and a first average pooling operation, respectively, on the first image feature in the spatial dimension of the first image feature to obtain a first maximum pooling result and a first average pooling result; and
performing a convolution operation on the first maximum pooling result and the first average pooling result to determine the weight of the image spatial feature in the first image feature.
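Claim 4's pool-then-convolve recipe matches the spatial branch of CBAM-style attention, read here as max/average statistics taken across the channel axis so that the spatial layout is preserved. A sketch under that reading; the 7x7 kernel size and the sigmoid gating are assumptions:

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        """Sketch of claim 4: pooling results -> convolution -> spatial weights."""

        def __init__(self, kernel_size: int = 7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # First maximum / average pooling results: one H x W map each.
            max_map, _ = x.max(dim=1, keepdim=True)
            avg_map = x.mean(dim=1, keepdim=True)
            # Convolution over the pooled maps yields the spatial weights.
            weights = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
            return x * weights  # the weighting of claim 2, applied spatially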
5. The method of claim 3, wherein processing the first image feature with the channel attention module to obtain the weight of the image channel feature in the first image feature comprises:
performing a second maximum pooling operation and a second average pooling operation, respectively, on the first image feature in the channel dimension of the first image feature to obtain a second maximum pooling result and a second average pooling result;
performing a fully connected operation on each of the second maximum pooling result and the second average pooling result to obtain a first fully connected result and a second fully connected result; and
adding the first fully connected result and the second fully connected result to determine the weight of the image channel feature in the first image feature.
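Claim 5's pool, fully connect, then add sequence mirrors the channel branch of CBAM-style attention, read here as pooling the spatial grid down to one statistic per channel and passing both statistics through a shared two-layer network. The reduction ratio and the sigmoid gating are assumed:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Sketch of claim 5: pooling -> fully connected -> add -> channel weights."""

        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.fc = nn.Sequential(  # shared fully connected layers
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            # Second maximum / average pooling results: one value per channel.
            max_vec = x.amax(dim=(2, 3))
            avg_vec = x.mean(dim=(2, 3))
            # First and second fully connected results, added (claim 5).
            weights = torch.sigmoid(self.fc(max_vec) + self.fc(avg_vec))
            return x * weights.view(b, c, 1, 1)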
6. The method according to claim 1, wherein the identification result comprises a first identification result and/or a second identification result, the first identification result being used for indicating a label of the activity code in the code scanning picture, the label of the activity code comprising an entity code and an electronic code; and the second identification result being used for indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating a color of the background area, a label indicating a type of an object in the background area, and a label indicating a material of the object in the background area.
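In implementation terms, the two-part identification result of claim 6 is a small record with an optional code label and optional background-area labels. The field and enum names below are illustrative only; the claim fixes no vocabulary:

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class CodeLabel(Enum):
        """Label of the activity code (first identification result)."""
        ENTITY_CODE = "entity"
        ELECTRONIC_CODE = "electronic"

    @dataclass
    class IdentificationResult:
        code_label: Optional[CodeLabel] = None        # first identification result
        background_color: Optional[str] = None        # second identification result:
        background_object_type: Optional[str] = None  # color, object type, and
        background_material: Optional[str] = None     # material of the background area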
7. The method of claim 1, wherein the determining that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity if the identification result indicates that the type of the background area in the code scanning picture is a preset type comprises:
if the identification result indicates that the activity code is an entity code and the type of the background area in the code scanning picture is a preset type, determining that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity.
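The claim-7 decision is a conjunction of two checks. In this sketch the label strings and the preset background types ("shelf", "carton") are invented placeholders:

    def is_physical_scan(code_label: str, background_type: str) -> bool:
        """Claim-7 decision: an entity code plus a preset background-area type."""
        preset_types = {"shelf", "carton"}  # placeholder preset types
        return code_label == "entity" and background_type in preset_types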
8. A method of training a neural network, the neural network comprising an image coding module and an attention module, the method comprising:
receiving a code scanning picture, wherein the code scanning picture comprises an activity code corresponding to a marketing activity of a commodity;
performing feature extraction on the code scanning picture by using the image coding module to obtain a first image feature;
processing the first image feature by using the attention module to obtain a second image feature;
identifying the code scanning picture according to the second image feature to obtain an identification result, wherein the identification result is used for determining whether the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity;
if the identification result indicates that the type of a background area in the code scanning picture is a preset type, determining that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity;
determining a training loss of the neural network according to the identification result; and
training the neural network according to the training loss.
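Claim 8 appends a loss-and-update step to the claim-1 flow. A minimal sketch, assuming a cross-entropy loss on ground-truth labels and a standard optimizer; how the first and second identification results are jointly supervised is left open by the claim:

    import torch
    import torch.nn.functional as F

    def training_step(model, optimizer, scan_pictures, labels):
        """One claim-8 update: forward -> training loss -> backward -> step."""
        logits = model(scan_pictures)            # identification result (logits)
        loss = F.cross_entropy(logits, labels)   # training loss (assumed form)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.detach()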
9. The method of claim 8, wherein processing the first image feature with the attention module to obtain the second image feature comprises:
processing the first image feature by using the attention module to obtain a weight of the first image feature; and
weighting the first image feature by using the weight of the first image feature to obtain the second image feature.
10. The method of claim 9, wherein the attention module comprises a spatial attention module and/or a channel attention module, and
wherein processing the first image feature with the attention module to obtain the weight of the first image feature comprises:
processing the first image feature by using the spatial attention module to obtain a weight of an image spatial feature in the first image feature; and/or
processing the first image feature by using the channel attention module to obtain a weight of an image channel feature in the first image feature.
11. The method of claim 10, wherein processing the first image feature with the spatial attention module to obtain the weight of the image spatial feature in the first image feature comprises:
performing a first maximum pooling operation and a first average pooling operation, respectively, on the first image feature in the spatial dimension of the first image feature to obtain a first maximum pooling result and a first average pooling result; and
performing a convolution operation on the first maximum pooling result and the first average pooling result to determine the weight of the image spatial feature in the first image feature.
12. The method of claim 10, wherein processing the first image feature with the channel attention module to obtain the weight of the image channel feature in the first image feature comprises:
performing a second maximum pooling operation and a second average pooling operation, respectively, on the first image feature in the channel dimension of the first image feature to obtain a second maximum pooling result and a second average pooling result;
performing a fully connected operation on each of the second maximum pooling result and the second average pooling result to obtain a first fully connected result and a second fully connected result; and
adding the first fully connected result and the second fully connected result to determine the weight of the image channel feature in the first image feature.
13. The method of claim 8, wherein the code scanning picture is one of a plurality of training samples of the neural network,
the method further comprising:
selecting a target training sample from the plurality of training samples according to training losses corresponding to the plurality of training samples, wherein the training loss of the target training sample is greater than the training losses of the remaining training samples other than the target training sample; and
training the neural network using the target training sample.
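Claim 13 describes what practitioners usually call hard-example mining: keep the samples whose training loss exceeds that of the rest, and train on them again. A sketch, with the kept fraction as an assumed hyperparameter:

    import torch
    import torch.nn.functional as F

    def select_hard_samples(model, pictures, labels, keep_fraction=0.25):
        """Pick the target training samples with the largest loss (claim 13)."""
        with torch.no_grad():
            logits = model(pictures)
            per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
        k = max(1, int(keep_fraction * len(pictures)))
        hard = per_sample_loss.topk(k).indices  # losses above the remaining samples
        return pictures[hard], labels[hard]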
14. The method of claim 8, wherein the identification result comprises a first identification result and/or a second identification result, the first identification result being used for indicating a label of the activity code in the code scanning picture, the label of the activity code comprising an entity code and an electronic code, and the second identification result being used for indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating a color of the background area, a label indicating a type of an object in the background area, and a label indicating a material of the object in the background area.
15. The method of claim 8, wherein the determining that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity if the identification result indicates that the type of the background area in the code scanning picture is a preset type comprises:
if the identification result indicates that the activity code is an entity code and the type of the background area in the code scanning picture is a preset type, determining that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity.
16. An apparatus for identifying a picture, the apparatus comprising:
an image coding module configured to receive a code scanning picture, wherein the code scanning picture comprises an activity code corresponding to a marketing activity of a commodity, and to perform feature extraction on the code scanning picture to obtain a first image feature;
an attention module configured to process the first image feature to obtain a second image feature;
the image coding module being further configured to identify the code scanning picture according to the second image feature to obtain an identification result, wherein the identification result is used for determining whether the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity; and
a decision module configured to determine, if the identification result indicates that the type of a background area in the code scanning picture is a preset type, that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity.
17. The apparatus of claim 16, wherein the attention module is further configured to:
process the first image feature to obtain a weight of the first image feature; and
weight the first image feature by using the weight of the first image feature to obtain the second image feature.
18. The apparatus of claim 17, wherein the attention module comprises a spatial attention module and/or a channel attention module,
the spatial attention module being configured to process the first image feature to obtain a weight of an image spatial feature in the first image feature; and/or
the channel attention module being configured to process the first image feature to obtain a weight of an image channel feature in the first image feature.
19. The apparatus of claim 18, wherein the spatial attention module is configured to: perform a first maximum pooling operation and a first average pooling operation, respectively, on the first image feature in the spatial dimension of the first image feature to obtain a first maximum pooling result and a first average pooling result; and perform a convolution operation on the first maximum pooling result and the first average pooling result to determine the weight of the image spatial feature in the first image feature.
20. The apparatus of claim 18, wherein the channel attention module is configured to: perform a second maximum pooling operation and a second average pooling operation, respectively, on the first image feature in the channel dimension of the first image feature to obtain a second maximum pooling result and a second average pooling result; perform a fully connected operation on each of the second maximum pooling result and the second average pooling result to obtain a first fully connected result and a second fully connected result; and add the first fully connected result and the second fully connected result to determine the weight of the image channel feature in the first image feature.
21. The apparatus of claim 16, wherein the identification result comprises a first identification result and/or a second identification result, the first identification result being used for indicating a label of the activity code in the code scanning picture, the label of the activity code comprising an entity code and an electronic code, and the second identification result being used for indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating a color of the background area, a label indicating a type of an object in the background area, and a label indicating a material of the object in the background area.
22. The apparatus of claim 16, wherein the decision module is further configured to: if the identification result indicates that the activity code is an entity code and the type of the background area in the code scanning picture is a preset type, determine that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity.
23. An apparatus for training a neural network, comprising:
an image coding module configured to receive a code scanning picture, wherein the code scanning picture comprises an activity code corresponding to a marketing activity of a commodity, and to perform feature extraction on the code scanning picture to obtain a first image feature;
an attention module configured to process the first image feature to obtain a second image feature;
the image coding module being further configured to identify the code scanning picture according to the second image feature to obtain an identification result, wherein the identification result is used for determining whether the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity;
a decision module configured to determine, if the identification result indicates that the type of a background area in the code scanning picture is a preset type, that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity; and
a training module configured to determine a training loss of the neural network according to the identification result, and to train the neural network according to the training loss.
24. The apparatus of claim 23, wherein the attention module is further configured to:
process the first image feature to obtain a weight of the first image feature; and
weight the first image feature by using the weight of the first image feature to obtain the second image feature.
25. The apparatus of claim 24, wherein the attention module comprises:
a spatial attention module configured to process the first image feature to obtain a weight of an image spatial feature in the first image feature; and/or
a channel attention module configured to process the first image feature to obtain a weight of an image channel feature in the first image feature.
26. The apparatus of claim 25, wherein the spatial attention module is configured to: perform a first maximum pooling operation and a first average pooling operation, respectively, on the first image feature in the spatial dimension of the first image feature to obtain a first maximum pooling result and a first average pooling result; and perform a convolution operation on the first maximum pooling result and the first average pooling result to determine the weight of the image spatial feature in the first image feature.
27. The apparatus of claim 25, wherein the channel attention module is configured to: perform a second maximum pooling operation and a second average pooling operation, respectively, on the first image feature in the channel dimension of the first image feature to obtain a second maximum pooling result and a second average pooling result; perform a fully connected operation on each of the second maximum pooling result and the second average pooling result to obtain a first fully connected result and a second fully connected result; and add the first fully connected result and the second fully connected result to determine the weight of the image channel feature in the first image feature.
28. The apparatus of claim 23, wherein the code scanning picture is one of a plurality of training samples of the neural network, and the training module is further configured to: select a target training sample from the plurality of training samples according to training losses corresponding to the plurality of training samples, wherein the training loss of the target training sample is greater than the training losses of the remaining training samples other than the target training sample; and train the neural network using the target training sample.
29. The apparatus according to claim 23, wherein the identification result comprises a first identification result and/or a second identification result, the first identification result being used for indicating a label of the activity code in the code scanning picture, the label of the activity code comprising an entity code and an electronic code, and the second identification result being used for indicating a label of the background area in the code scanning picture, the label of the background area comprising at least one of the following: a label indicating a color of the background area, a label indicating a type of an object in the background area, and a label indicating a material of the object in the background area.
30. The apparatus of claim 23, wherein the decision module is further configured to: if the identification result indicates that the activity code is an entity code and the type of the background area in the code scanning picture is a preset type, determine that the code scanning picture is a picture obtained by scanning the activity code on the entity of the commodity.
31. An apparatus for identifying a picture, comprising a memory having executable code stored therein and a processor configured to execute the executable code to implement the method of any one of claims 1-7.
32. An apparatus for training a neural network, comprising a memory having executable code stored therein and a processor configured to execute the executable code to implement the method of any one of claims 8-15.
CN202211081553.5A 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network Pending CN115618900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211081553.5A CN115618900A (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211081553.5A CN115618900A (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network
CN202111111792.6A CN113963352B (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202111111792.6A Division CN113963352B (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network

Publications (1)

Publication Number Publication Date
CN115618900A true CN115618900A (en) 2023-01-17

Family

ID=79462205

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211081553.5A Pending CN115618900A (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network
CN202111111792.6A Active CN113963352B (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202111111792.6A Active CN113963352B (en) 2021-09-22 2021-09-22 Method and device for recognizing picture and training neural network

Country Status (1)

Country Link
CN (2) CN115618900A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688650B (en) * 2021-09-22 2022-06-17 支付宝(杭州)信息技术有限公司 Method and device for identifying picture

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252677A (en) * 2014-06-27 2014-12-31 北京信祥源科技有限公司 Two-dimension code anti-counterfeiting technology and two-dimension code anti-counterfeiting system-based platform system
WO2016157172A2 (en) * 2015-04-02 2016-10-06 Eyeconit Ltd. Machine-readable visual representation for authenticity
CN107231233A (en) * 2016-03-24 2017-10-03 卓望数码技术(深圳)有限公司 A kind of coding method of user identity and system
US20190149725A1 (en) * 2017-09-06 2019-05-16 Trax Technologies Solutions Pte Ltd. Using augmented reality for image capturing a retail unit
CN107798545A (en) * 2017-11-01 2018-03-13 深圳劲嘉集团股份有限公司 It is a kind of to realize false proof method and system using red packet or consumption rebating mechanism
EP3754546A1 (en) * 2018-01-10 2020-12-23 Trax Technology Solutions Pte Ltd. Automatically monitoring retail products based on captured images
CN110119967A (en) * 2018-02-07 2019-08-13 阿里巴巴集团控股有限公司 Processing method, system, mobile terminal and the product sharing method of image
CN108537085A (en) * 2018-03-07 2018-09-14 阿里巴巴集团控股有限公司 A kind of barcode scanning image-recognizing method, device and equipment
CN109271878B (en) * 2018-08-24 2020-04-21 北京地平线机器人技术研发有限公司 Image recognition method, image recognition device and electronic equipment
CN109871777B (en) * 2019-01-23 2021-10-01 广州智慧城市发展研究院 Behavior recognition system based on attention mechanism
CN110084794B (en) * 2019-04-22 2020-12-22 华南理工大学 Skin cancer image identification method based on attention convolution neural network
US11526808B2 (en) * 2019-05-29 2022-12-13 The Board Of Trustees Of The Leland Stanford Junior University Machine learning based generation of ontology for structural and functional mapping
CN110443031A (en) * 2019-07-16 2019-11-12 阿里巴巴集团控股有限公司 A kind of two dimensional code Risk Identification Method and system
CN110457511B (en) * 2019-08-16 2022-12-06 成都数之联科技股份有限公司 Image classification method and system based on attention mechanism and generation countermeasure network
CN113033249A (en) * 2019-12-09 2021-06-25 中兴通讯股份有限公司 Character recognition method, device, terminal and computer storage medium thereof
JP7415577B2 (en) * 2020-01-15 2024-01-17 Toppanホールディングス株式会社 Campaign management system and campaign management method
CN111612526A (en) * 2020-04-28 2020-09-01 杭州沃朴物联科技有限公司 Method, apparatus and medium for providing shopping guide reward
CN111461089B (en) * 2020-06-17 2020-09-15 腾讯科技(深圳)有限公司 Face detection method, and training method and device of face detection model
CN111814940A (en) * 2020-07-09 2020-10-23 张家港市爱上旅途网络科技有限公司 Multi-dimensional code generation method and system and verification method and system
CN111985572B (en) * 2020-08-27 2022-03-25 中国科学院自动化研究所 Fine-grained image identification method of channel attention mechanism based on feature comparison
CN112598045A (en) * 2020-12-17 2021-04-02 中国工商银行股份有限公司 Method for training neural network, image recognition method and image recognition device

Also Published As

Publication number Publication date
CN113963352B (en) 2022-08-02
CN113963352A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
US10691922B2 (en) Detection of counterfeit items based on machine learning and analysis of visual and textual data
US20130325567A1 (en) System and method for creating a virtual coupon
US20120205436A1 (en) System for enhanced barcode decoding and image recognition and method therefor
US20220207274A1 (en) Client Based Image Analysis
CN108647969A (en) A kind of method, apparatus, system and the storage medium of access block chain
CN107807941A (en) Information processing method and device
WO2013126894A1 (en) Method and system for requesting a coupon at a point-of-sale location
CN111612284B (en) Data processing method, device and equipment
CN113963352B (en) Method and device for recognizing picture and training neural network
KR102650139B1 (en) Artificial intelligence-based system and method for online counterfeit product crackdown
CN110913354A (en) Short message classification method and device and electronic equipment
CN109360044A (en) A kind of cross-border e-commerce sale management system and method
Manek et al. Detection of fraudulent and malicious websites by analysing user reviews for online shopping websites
CN116823428A (en) Anti-fraud detection method, device, equipment and storage medium
CN114841526A (en) Detection method of high-risk user, computing device and readable storage medium
Dong et al. Real-time Fraud Detection in e-Market Using Machine Learning Algorithms.
CN111582873B (en) Method and device for evaluating interaction event, electronic equipment and storage medium
CN113688650B (en) Method and device for identifying picture
CN116010707A (en) Commodity price anomaly identification method, device, equipment and storage medium
CN110533284B (en) Method and device for arranging pickup vehicle based on predicted commodity specification
Fan et al. Vanishing point detection using random forest and patch‐wise weighted soft voting
Mahmood et al. Detection of Phishing Sites Using Machine Learning Techniques
Wu et al. A Social Media Based Profiling Approach for Potential Churning Customers: An Example for Telecom Industry
CN116245546B (en) Data processing system and method based on device fingerprint
CN118096191B (en) Anti-fraud system and method for intelligent user information protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination