CN113536829A - Goods static identification method of unmanned retail container - Google Patents

Goods static identification method of unmanned retail container

Info

Publication number
CN113536829A
CN113536829A
Authority
CN
China
Prior art keywords
target detection
network
static
goods
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010286329.4A
Other languages
Chinese (zh)
Other versions
CN113536829B (en)
Inventor
Zhang Haijun (张海军)
Li Donghai (李东海)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology (Shenzhen); Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology (Shenzhen); Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology (Shenzhen) and Shenzhen Institute of Science and Technology Innovation, Harbin Institute of Technology
Priority to CN202010286329.4A
Publication of CN113536829A
Application granted
Publication of CN113536829B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07FCOIN-FREED OR LIKE APPARATUS
    • G07F9/00Details other than those peculiar to special kinds or types of apparatus
    • G07F9/006Details of the software used for the vending machines
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a static goods identification method for an unmanned retail container. The method constructs a static recognition dataset through manual image collection and manual annotation; builds a one-stage target detection model by introducing a deformable convolutional neural network and group normalization layers into the backbone network, selecting the focal loss function for classification in the sub-networks and the balanced L1 loss function for coordinate regression; trains the one-stage target detection model to obtain the network parameters; and inputs the network parameters into the unmanned retail container to identify the category and quantity of the goods. The static goods identification method provided by the invention solves the instability of edge-goods detection in traditional target detection models and, by improving the goods recognition rate, improves the user experience of unmanned vending.

Description

Goods static identification method of unmanned retail container
Technical Field
The invention belongs to the field of object recognition for unmanned retail, and in particular relates to a static goods identification method for an unmanned retail container.
Background
As a major class of unattended services, unmanned retail refers to retail consumption that takes place without on-site staff: a new retail service, realized with intelligent technology, that requires no shopping guide or cashier in attendance. Vending machines, which first appeared in the 1880s, are an early example of the unmanned retail model. Today, new technologies such as mobile payment and two-dimensional (QR) codes have produced a new generation of vending machines, and their use greatly improves selling efficiency and user experience compared with conventional vending machines. Typically, a consumer opens an application providing mobile payment services, such as Alipay or WeChat, and then enters the transaction settlement process by scanning a QR code. However, the business process still follows the traditional operational shopping flow: only one item can be selected at a time, so a user who wants to purchase several items must repeat the operation many times, which is inconvenient. In contrast, newly developed unmanned intelligent vending machines can greatly improve the shopping experience by employing advanced computer vision technology. At the 2018 Computer Vision Summit, Tencent YouTu Lab introduced an unmanned intelligent retail container equipped with artificial intelligence technology. It integrates deep learning, visual product recognition algorithms and WeChat online payment into a vision-based unmanned intelligent retail container, exploring a new shopping mode and greatly improving the purchasing experience over traditional vending machines. With the rapid development of computer vision, RFID, deep learning and Internet of Things technologies, the unmanned intelligent vending machine, as an important form of unmanned retail, is becoming increasingly popular in the e-commerce market.
The core of the static goods identification method for the unmanned retail container environment is the target detection algorithm. The development of target detection can be roughly divided into two main branches: two-stage detection methods and one-stage detection methods. In recent years, detection performance on multiple benchmark datasets has been continuously pushed forward by two-stage and one-stage algorithms based on convolutional neural networks. In 2014, Girshick et al. proposed the R-CNN detector, an important algorithm that introduced deep learning into the target detection field. In later work, Girshick et al. proposed the improved Fast R-CNN. Based on the idea of a multi-task loss function, Fast R-CNN combines the classification loss and the bounding-box regression loss into a unified end-to-end training framework. However, generating positive and negative candidate boxes still requires a selective-search algorithm to produce the object proposals, which separates this step from the training of the detector and is very time-consuming at test time. To address this problem, Ren et al. proposed Faster R-CNN, whose region proposal network module generates the candidate boxes. In addition, inspired by the regression-based one-stage OverFeat algorithm, Redmon et al. proposed a one-stage detection method named YOLO, which omits the proposal-extraction branch (the candidate-box suggestion stage) and integrates feature extraction, bounding-box position regression and classification into a single convolutional network.
When a traditional target detection algorithm is used to detect and analyze the goods in an unmanned retail container, items near the edge of the picture are detected unstably and their bounding boxes are frequently lost, which lowers the recall rate, degrades the user experience, and seriously hinders the market adoption of unmanned retail containers.
Disclosure of Invention
The invention aims to provide a static goods identification method for an unmanned retail container that introduces a deformable convolutional neural network into a one-stage target detection model, thereby improving the goods recognition rate and solving the problem of unstable detection of edge goods in the unmanned retail container.
To achieve this purpose, the invention adopts the following technical scheme. A static goods identification method for an unmanned retail container comprises: constructing a static dataset by manually collecting images and manually annotating each image's label, category and bounding-box coordinate information; constructing a one-stage target detection model comprising a backbone network and sub-networks, where a deformable convolutional neural network is introduced into the backbone network and a group normalization layer is selected as its normalization layer, and where, in the sub-networks, the focal loss function is selected for classification and the balanced L1 loss function is selected for coordinate regression on the bounding-box coordinates; training the one-stage target detection model, taking the pictures of the static dataset as input, extracting features through the backbone network, and taking the label, category and bounding-box coordinate information as output, to obtain the network parameters; and inputting the network parameters into the unmanned retail container to perform static goods identification.
Specifically, the backbone network employs a residual network.
Preferably, the deformable convolutional neural network is introduced into the last three convolutional layers of the backbone network.
Preferably, the one-stage target detection model is trained with a stochastic gradient descent algorithm with momentum.
Specifically, the one-stage target detection model is a DrtNet model constructed on the basis of the RetinaNet model.
The invention has the beneficial effects that a deformable convolutional neural network and group normalization layers are added on the basis of the RetinaNet model, and the focal loss and balanced L1 loss functions are selected for classification and regression. This improves the recall rate on the original dataset, prevents bounding boxes from being lost to a certain extent, and improves the goods recognition rate of the unmanned retail container.
Drawings
FIG. 1 is a flow chart of a method of static identification of goods for an unmanned retail container of the present invention;
FIG. 2 is a block diagram of a deformable convolutional neural network of the present invention;
FIG. 3 is a schematic view of an embodiment of an unmanned retail container of the present invention;
FIG. 4 is a flow chart of the operation of an unmanned retail container according to an embodiment of the present invention;
FIG. 5 shows the beverage goods detection results according to an embodiment of the present invention.
The reference numerals in the figures denote:
1. unmanned retail container; 2. cabinet door; 3. camera.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention are described in detail below with examples. It should be understood that the examples described herein are intended only to illustrate the present invention, not to limit its scope.
Referring to fig. 1, fig. 1 is a flow chart of the static goods identification method of the unmanned retail container of the present invention, which comprises the following steps.
Step S1: construct a static dataset, manually collecting images and manually annotating each image's label, category and bounding-box coordinate information.
In this embodiment, the goods categories are constructed by randomly selecting 10 beverages from the market: Jiaduobao (JDB), Mizone (MZ), Fanta (FT), Master Kong iced black tea (IBT), Nutri-Express (NE), Uni-President Assam milk green tea (JGMT), Minute Maid (MM), Baisuishan (GTEN), Uni-President Assam original milk tea (UAMT) and VVW. The entire static-recognition dataset is in the VOC2007 data format, with pictures mainly in two sizes, 1280 x 720 and 1920 x 1080. The dataset totals 34052 pictures, all labeled, covering 155153 beverage instances.
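Since the dataset uses the VOC2007 format, each picture is paired with an XML annotation file listing the object categories and bounding-box coordinates. The following is a minimal Python parsing sketch; the file path in the usage comment and the class abbreviations are illustrative assumptions, not values taken from the patent.

import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Parse one VOC2007-style XML file into (category, box) pairs."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text  # e.g. "JDB", "MZ", ...
        box = obj.find("bndbox")
        coords = tuple(int(box.find(k).text)
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects

# Hypothetical usage with one annotation file of the static dataset:
# print(parse_voc_annotation("annotations/000001.xml"))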
Step S2: constructing a one-stage target detection model, wherein the one-stage target detection model comprises a backbone network and a sub-network; introducing a deformable convolution neural network into a backbone network, wherein a group normalization layer is selected as a normalization layer of the backbone network; in the sub-network, a focusing loss function is selected to classify the coordinate information of the bounding box, and a balance L1 loss function is selected to perform coordinate regression on the coordinate information of the bounding box.
Step S3: train the one-stage target detection model of step S2, taking the pictures of the static dataset of step S1 as input, performing feature extraction through the backbone network, and taking the label, category and bounding-box coordinate information as output, to obtain the network parameters. Training specifically uses mini-batch stochastic gradient descent with momentum.
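As a rough sketch of this training setup, the snippet below runs mini-batch stochastic gradient descent with momentum in PyTorch. The tiny stand-in model, the dummy data, the placeholder loss and the hyperparameter values (learning rate, momentum) are all assumptions for illustration; the patent does not specify them.

import torch
import torch.nn as nn

model = nn.Conv2d(3, 10, kernel_size=3, padding=1)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for step in range(3):                       # stand-in for the dataset loop
    images = torch.randn(2, 3, 224, 224)    # dummy mini-batch of pictures
    targets = torch.randn(2, 10, 224, 224)  # dummy training targets
    loss = nn.functional.mse_loss(model(images), targets)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In the real model, the placeholder loss would be the sum of the focal classification loss and the balanced L1 regression loss described below.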
Step S4: and inputting the grid parameters into the unmanned retail container to perform static goods identification.
In step S2, the one-stage target detection model, named DrtNet, is constructed on the basis of the RetinaNet model; the backbone network adopts a residual network, and the deformable convolutional neural network is introduced into its last three convolutional layers. In this embodiment, please refer to fig. 2, which is a schematic diagram of the deformable convolutional neural network. The deformable convolution operation introduces adaptively learned offset variables without changing the rule of the conventional convolution-kernel operation. For each output y(p0), 9 positions are still sampled from the input feature map, spread around the center position x(p0); adding the offset Δpn, however, allows the sampling points to spread into a non-grid shape. See formula (1):
y(p0) = Σ_{pn ∈ R} w(pn) · x(p0 + pn + Δpn)   (1)

where R is the regular 3 x 3 sampling grid, w(pn) are the convolution weights, and Δpn is the learned offset of each sampling position.
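A sketch of formula (1) in PyTorch, using torchvision's deformable convolution: a small ordinary convolution predicts the offsets Δpn, which let the 3 x 3 = 9 sampling points drift off the regular grid. The channel sizes are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 2 offsets (dx, dy) for each of the 3*3 sampling points
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        offsets = self.offset_conv(x)  # learned, input-dependent offsets
        return self.deform_conv(x, offsets)

feat = torch.randn(1, 256, 32, 32)            # dummy feature map
print(DeformableBlock(256, 256)(feat).shape)  # torch.Size([1, 256, 32, 32])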
Group normalization (GN) divides the feature map into several groups along the channel dimension and normalizes each group, reshaping the feature map from [N, C, H, W] to [N, G, C//G, H, W] and normalizing over the [C//G, H, W] dimensions. Because its statistics do not involve the batch dimension, GN avoids the influence of batch size on the model and resolves the small-batch problem mentioned above, so the designed one-stage target detection network replaces every batch normalization layer with a group normalization layer.
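In PyTorch this replacement is a one-line change, as the sketch below shows; the group count G = 32 is a common default and an assumption here, not a value stated in the patent.

import torch
import torch.nn as nn

x = torch.randn(2, 256, 32, 32)                     # [N, C, H, W] feature map
gn = nn.GroupNorm(num_groups=32, num_channels=256)  # G groups of C // G channels
print(gn(x).shape)  # statistics are computed per sample and per group,
                    # so the output does not depend on the batch size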
The classification loss uses the focal loss function, an improved version of the cross-entropy loss whose adjusted formulation enables a one-stage target detection network to reach the same accuracy as Fast R-CNN. A modulating factor (1 - pt)^γ, with γ ≥ 0, is added in front of the cross entropy, as shown in formula (2):
FL(pt) = -(1 - pt)^γ · log(pt)   (2)
As γ grows, the loss function is almost zero on the easily classified part, while on the part where pt is small (the hard-to-distinguish samples) the loss value remains large. Thus, when the class imbalance is severe, accumulating the loss over all samples lets the hard samples contribute most of the loss value.
In practical use, a slight accuracy improvement can be obtained by adding an α balance factor on the basis of formula (2), see formula (3):
FL(pt) = -αt · (1 - pt)^γ · log(pt)   (3)
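A minimal sketch of formula (3) for binary (per-anchor, per-class) classification; the defaults α = 0.25 and γ = 2.0 are the commonly published values and are assumed here, since the patent does not state them.

import torch

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """FL(pt) = -alpha_t * (1 - pt)^gamma * log(pt), formula (3)."""
    p = torch.sigmoid(logits)
    # pt is the predicted probability of the true class (targets are 0/1 floats)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))).mean()

print(focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float()))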
For coordinate regression, the balanced L1 loss function is used, in which samples with a loss of 1.0 or more are treated as outliers and samples with a loss below 1.0 as accurate (inlier) samples. The key idea of the balanced L1 loss is to promote the crucial regression gradients, i.e. the gradients from accurate samples, to rebalance the samples and tasks involved, thus enabling more balanced training across classification, coarse localization and accurate localization. The promoted gradient is designed as in formula (4):
∂Lb/∂x = α · ln(b|x| + 1),  if |x| < 1;  ∂Lb/∂x = γ,  otherwise   (4)
Here α controls the promotion of the gradient of accurate samples: setting a small α increases their gradient without affecting the values of the outlier samples, while γ controls and adjusts the upper bound of the regression error so that the different tasks become more balanced. α and γ control the balance from the sample level and the task level respectively, and these two factors, controlling different aspects, reinforce each other to achieve more balanced training. Integrating the gradient formula yields the balanced L1 loss function shown in formula (5):
Lb(x) = (α/b) · (b|x| + 1) · ln(b|x| + 1) - α|x|,  if |x| < 1;  Lb(x) = γ|x| + C,  otherwise   (5)

where b satisfies α · ln(b + 1) = γ, so that the two branches are continuous at |x| = 1, and C is the constant implied by that continuity.
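A sketch of formula (5); α = 0.5 and γ = 1.5 follow the published defaults of the balanced L1 loss and are assumptions here, with b derived from the continuity condition α · ln(b + 1) = γ.

import math
import torch

def balanced_l1_loss(pred, target, alpha=0.5, gamma=1.5):
    b = math.exp(gamma / alpha) - 1          # from alpha * ln(b + 1) = gamma
    x = (pred - target).abs()
    inlier = (alpha / b) * (b * x + 1) * torch.log(b * x + 1) - alpha * x
    outlier = gamma * x + gamma / b - alpha  # C = gamma/b - alpha keeps (5) continuous
    return torch.where(x < 1, inlier, outlier).mean()

print(balanced_l1_loss(torch.randn(4, 4), torch.randn(4, 4)))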
referring to fig. 3 and fig. 4, fig. 3 is a schematic structural diagram of an unmanned retail container according to an embodiment of the present invention, and fig. 4 is a flowchart of an operation of the unmanned retail container according to the embodiment of the present invention.
When the user opens the cabinet door 2 of the unmanned retail container 1, a sensor is triggered and the camera 3 takes a first picture. After the user selects the goods and withdraws their hand from the unmanned retail container 1, the infrared sensor detects the hand leaving and triggers the camera 3 to take a second picture. The category and quantity of the goods taken by the user are determined from these before-and-after pictures of the goods placed in the unmanned retail container 1.
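Once the detector returns the labels found in each picture, the purchased goods follow from a simple multiset difference of the two detection results. A small sketch, where the example label lists are hypothetical detector outputs:

from collections import Counter

def items_taken(before_labels, after_labels):
    """Difference the labels detected in the first and second pictures."""
    return dict(Counter(before_labels) - Counter(after_labels))

before = ["JDB", "JDB", "MZ", "IBT"]  # detections when the door opened
after = ["JDB", "IBT"]                # detections when the hand left
print(items_taken(before, after))     # {'JDB': 1, 'MZ': 1}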
Referring to fig. 5, the beverage goods recognition results of this embodiment are obtained with the static goods identification method of the unmanned retail container described above. As shown in fig. 5, the method of the present invention improves the recall rate on the original dataset, and in this embodiment no bounding box of a beverage item is lost.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and such modifications are intended to be covered by the claims of the present invention.

Claims (5)

1. A static goods identification method for an unmanned retail container is characterized by comprising the following steps:
constructing a static dataset by manually collecting images and manually annotating each image's label, category and bounding-box coordinate information;
constructing a one-stage target detection model, wherein the one-stage target detection model comprises a backbone network and a sub-network; introducing a deformable convolutional neural network into the backbone network, wherein a group normalization layer is selected as the normalization layer of the backbone network; in the sub-network, selecting a focal loss function for classification of the bounding-box coordinate information and a balanced L1 loss function for coordinate regression of the bounding-box coordinate information;
training the one-stage target detection model, taking the pictures of the static dataset as input, extracting features through the backbone network, and taking the label, category and bounding-box coordinate information as output, to obtain network parameters; and
inputting the network parameters into the unmanned retail container to perform static goods identification.
2. The static goods identification method of claim 1, wherein the backbone network employs a residual network.
3. The static goods identification method of claim 1, wherein the deformable convolutional neural network is introduced into the last three convolutional layers of the backbone network.
4. The static goods identification method of claim 1, wherein the one-stage target detection model is trained with a stochastic gradient descent algorithm with momentum.
5. The static goods identification method of claim 1, wherein the one-stage target detection model is a DrtNet model constructed on the basis of a RetinaNet model.
CN202010286329.4A 2020-04-13 2020-04-13 Goods static identification method for unmanned retail container Active CN113536829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010286329.4A CN113536829B (en) 2020-04-13 2020-04-13 Goods static identification method for unmanned retail container

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010286329.4A CN113536829B (en) 2020-04-13 2020-04-13 Goods static identification method for unmanned retail container

Publications (2)

Publication Number Publication Date
CN113536829A (en) 2021-10-22
CN113536829B CN113536829B (en) 2024-06-11

Family

ID=78119881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010286329.4A Active CN113536829B (en) 2020-04-13 2020-04-13 Goods static identification method for unmanned retail container

Country Status (1)

Country Link
CN (1) CN113536829B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116596012A (en) * 2023-05-09 2023-08-15 上海银满仓数字科技有限公司 Commodity information transmission method and system based on RFID

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN108764164A (en) * 2018-05-30 2018-11-06 华中科技大学 A kind of method for detecting human face and system based on deformable convolutional network
CN109409443A (en) * 2018-11-28 2019-03-01 北方工业大学 Multi-scale deformable convolution network target detection method based on deep learning
CN109711427A (en) * 2018-11-19 2019-05-03 深圳市华尊科技股份有限公司 Object detection method and Related product
AU2019101133A4 (en) * 2019-09-30 2019-10-31 Bo, Yaxin MISS Fast vehicle detection using augmented dataset based on RetinaNet
CN110414559A (en) * 2019-06-26 2019-11-05 武汉大学 The construction method and commodity recognition method of intelligence retail cabinet commodity target detection Unified frame

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN108764164A (en) * 2018-05-30 2018-11-06 华中科技大学 A kind of method for detecting human face and system based on deformable convolutional network
CN109711427A (en) * 2018-11-19 2019-05-03 深圳市华尊科技股份有限公司 Object detection method and Related product
CN109409443A (en) * 2018-11-28 2019-03-01 北方工业大学 Multi-scale deformable convolution network target detection method based on deep learning
CN110414559A (en) * 2019-06-26 2019-11-05 武汉大学 The construction method and commodity recognition method of intelligence retail cabinet commodity target detection Unified frame
AU2019101133A4 (en) * 2019-09-30 2019-10-31 Bo, Yaxin MISS Fast vehicle detection using augmented dataset based on RetinaNet

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG HAIJUN, LI HAIDONG: "Deep Learning-based Beverage Recognition for Unmanned Vending Machines: An Empirical Study", 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), pages 1464-1467 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116596012A (en) * 2023-05-09 2023-08-15 上海银满仓数字科技有限公司 Commodity information transmission method and system based on RFID
CN116596012B (en) * 2023-05-09 2024-05-07 上海银满仓数字科技有限公司 Commodity information transmission method and system based on RFID

Also Published As

Publication number Publication date
CN113536829B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
US20220375193A1 (en) Saliency-based object counting and localization
US10817749B2 (en) Dynamically identifying object attributes via image analysis
US11657602B2 (en) Font identification from imagery
CN108520285A (en) Article discrimination method, system, equipment and storage medium
US11887217B2 (en) Text editing of digital images
US20140324836A1 (en) Finding similar items using windows of computation
US20110314031A1 (en) Product category optimization for image similarity searching of image-based listings in a network-based publication system
Xu et al. Design of smart unstaffed retail shop based on IoT and artificial intelligence
CN107592839A (en) Fine grit classification
Wu et al. An intelligent self-checkout system for smart retail
CN107683469A (en) A kind of product classification method and device based on deep learning
KR20190095333A (en) Anchor search
CN107133854A (en) Information recommendation method and device
CN110363206B (en) Clustering of data objects, data processing and data identification method
CN111868709A (en) Automatic batch sorting
CN112651340A (en) Character recognition method, system, terminal device and storage medium for shopping receipt
CN106997350A (en) A kind of method and device of data processing
CN113536829B (en) Goods static identification method for unmanned retail container
CN114255377A (en) Differential commodity detection and classification method for intelligent container
KR102155905B1 (en) Smart mediation apparatus and methods for sharing travel goods
CN112883719A (en) Class word recognition method, model training method, device and system
CN112541055A (en) Method and device for determining text label
KR101498944B1 (en) Method and apparatus for deciding product seller related document
Wang et al. An IoT Based Fruit and Vegetable Sales System: A whole system including IoT based integrated intelligent scale and online shop
Fanca et al. Romanian coins recognition and sum counting system from image using TensorFlow and Keras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant