CN111931674B

CN111931674B - Article identification management method, device, server and readable storage medium

Info

Publication number: CN111931674B
Application number: CN202010829881.3A
Authority: CN
Inventors: 秦永强; 高达辉
Original assignee: Innovation Qizhi Chengdu Technology Co ltd
Current assignee: Innovation Qizhi Chengdu Technology Co ltd
Priority date: 2020-08-18
Filing date: 2020-08-18
Publication date: 2024-04-02
Anticipated expiration: 2040-08-18
Also published as: CN111931674A

Abstract

The application provides an article identification management method, an article identification management device, a server and a readable storage medium. The method comprises the following steps: acquiring a first image obtained by a camera shooting articles on a shelf, and acquiring attitude parameters of the camera when the first image is shot; performing spatial transformation on the first image, based on the attitude parameters and a target transformation matrix corresponding to the attitude parameters, to obtain a second image whose degree of perspective distortion is smaller than that of the first image; identifying the article category and the price tag regions of the articles in the second image through a trained deep learning model; determining the price tag region corresponding to an article as a target price tag region, based on the position of the articles of the same article category in the second image and the positions of the price tag regions in the second image; and determining price tag information according to the target price tag region, and establishing an association relation between the price tag information and the article. The problem of low efficiency in counting articles and their corresponding price tags can thereby be alleviated.

Description

Article identification management method, device, server and readable storage medium

Technical Field

The present invention relates to the field of computer image processing technologies, and in particular, to an article identification management method, an apparatus, a server, and a readable storage medium.

Background

Articles of the same type are usually placed in the same area of a supermarket shelf, with price tags arranged so that shoppers can check the selling price of each article. There is currently a demand for statistics on the articles sold in the market and their prices, so that suppliers can adjust wholesale prices, production scale and the like according to the data. Although the business system of a supermarket stores information such as the selling prices of its articles, this data is generally not disclosed to outside parties. At present, staff must manually record the prices of the articles and the correspondence between prices and articles, which is inefficient.

Disclosure of Invention

The application provides an article identification management method, an article identification management device, a server and a readable storage medium, which can alleviate the problem of low efficiency in counting articles and their corresponding price tags.

In order to achieve the above objective, the technical solution provided in the embodiments of the present application is as follows:

in a first aspect, an embodiment of the present application provides an article identification management method, where the method includes:

acquiring a first image obtained by shooting an object on a goods shelf by a camera, and acquiring attitude parameters of the camera when the first image is shot;

based on the attitude parameters and a target transformation matrix corresponding to the attitude parameters, performing spatial transformation on the first image to obtain a second image, so that the perspective distortion degree of the second image is smaller than that of the first image;

identifying an article category and a price tag region of the article in the second image through a trained deep learning model;

determining a price tag region corresponding to an article as a target price tag region based on the position of the article in the second image and the position of the price tag region in the second image;

and determining price tag information according to the target price tag region, and establishing an association relation between the price tag information and the article.

In the embodiment, the captured image is transformed and corrected, and then the transformed and corrected image is identified, so that the accuracy of identifying the article and the price tag is improved, the identification and the association of the article and the article price tag can be automatically realized, and the problem of low efficiency of counting the article and the corresponding article price tag is solved.

With reference to the first aspect, in some optional embodiments, before spatially transforming the first image to obtain a second image, the method further includes:

and determining the target transformation matrix corresponding to the attitude parameter through the attitude parameter of the camera when shooting the first image based on the corresponding relation between the attitude parameter and the transformation matrix.

In the above embodiment, the current target transformation matrix may be quickly determined based on the correspondence.

With reference to the first aspect, in some optional embodiments, before spatially transforming the first image to obtain a second image, the method further includes:

inputting the attitude parameters and the first image into a trained camera pose estimation network to obtain a target transformation matrix for transforming the first image into the second image.

In the above embodiment, the target transformation matrix of the first image may be quickly determined through the camera pose estimation network.

With reference to the first aspect, in some optional embodiments, before inputting the pose parameters and the first image into a trained camera pose estimation network, the method further comprises:

determining, through a camera pose estimation network, a current transformation matrix for each group of a distorted image and a front view obtained by shooting with different attitude parameters;

transforming each distorted image through the current transformation matrix to obtain a transformed image;

when the similarity between the transformed image and the front view is smaller than a threshold value, adjusting the camera pose estimation network based on the parallax between the transformed image and the front view to obtain an adjusted current transformation matrix, and transforming the distorted image through the adjusted current transformation matrix until the similarity between the transformed image and the front view is larger than or equal to the threshold value;

and determining that the trained camera pose estimation network is obtained when the similarity between the transformed image and the front view is greater than or equal to the threshold value.

In the above embodiment, training learning is performed on the camera pose estimation network, so that the camera pose estimation network can quickly and accurately determine the target transformation matrix of the first image based on the pose parameters.

With reference to the first aspect, in some optional embodiments, before identifying the item category and the price tag region of the item in the second image by the trained deep learning model, the method further comprises:

acquiring a training image set, wherein the training image set comprises a plurality of training images with an article region and a price tag region, and each training image is provided with a label for representing the article category in the article region and a label for representing the price tag region;

and training the deep learning model through the training image set to obtain the trained deep learning model.

In the above embodiment, the deep learning model is trained so that it can identify the article category and the price tag region of the articles in an image.

With reference to the first aspect, in some optional embodiments, determining price tag information according to the target price tag region, and establishing an association relationship between the price tag information and the article, includes:

identifying the target price tag region through a character identification algorithm to obtain text information serving as the price tag information;

and establishing an association relation between the price tag information and the article, wherein the price tag information comprises price information of the article.

In the embodiment, the price tag information can be obtained by identifying the price tag region, and then the association relationship between the article and the price tag information is established so as to be convenient for viewing.

With reference to the first aspect, in some optional embodiments, the method further includes:

and sending the price tag information of the article to a designated terminal, wherein the price tag information comprises the article category and the price of the article.

In the above embodiment, by sending the price tag information to the designated terminal, the user of the designated terminal can conveniently view the article and the price of the article.

In a second aspect, embodiments of the present application further provide an article identification management device, where the device includes:

the acquisition unit is used for acquiring a first image obtained by shooting an article on a goods shelf by a camera and attitude parameters of the camera when the first image is shot;

the transformation unit is used for carrying out spatial transformation on the first image based on the attitude parameters and the target transformation matrix corresponding to the attitude parameters to obtain a second image, so that the perspective distortion degree of the second image is smaller than that of the first image;

the identification unit is used for identifying the article category and the price tag region of the article in the second image through the trained deep learning model;

a region determining unit, configured to determine, based on a position of an item of a same item category in the second image and a position of the price tag region in the second image, that a price tag region corresponding to the item is a target price tag region;

and the price tag processing unit is used for determining price tag information according to the target price tag region and establishing an association relation between the price tag information and the article.

In a third aspect, embodiments of the present application further provide a server, where the server includes a memory and a processor coupled to each other, and the memory stores a computer program, where the computer program, when executed by the processor, causes the server to perform the method described above.

In a fourth aspect, embodiments of the present application further provide a computer readable storage medium, where a computer program is stored, which when run on a computer causes the computer to perform the above-mentioned method.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below. It is to be understood that the following drawings illustrate only certain embodiments of the present application and are therefore not to be considered limiting of its scope, for the person of ordinary skill in the art may derive other relevant drawings from the drawings without inventive effort.

Fig. 1 is a schematic communication diagram of a server and a user terminal provided in an embodiment of the present application.

Fig. 2 is a block schematic diagram of a server according to an embodiment of the present application.

Fig. 3 is a flow chart of an article identification management method according to an embodiment of the present application.

Fig. 4a is a schematic diagram of a first image according to an embodiment of the present application.

Fig. 4b is a schematic diagram of a second image provided in an embodiment of the present application.

Fig. 5 is a functional block diagram of an article identification management device according to an embodiment of the present application.

Reference numerals: 10 - server; 11 - processing module; 12 - storage module; 13 - communication module; 20 - user terminal; 100 - article identification management device; 110 - acquisition unit; 120 - transformation unit; 130 - identification unit; 140 - region determining unit; 150 - price tag processing unit.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that the terms "first," "second," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.

Embodiments of the present application will be described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Referring to fig. 1 and fig. 2 in combination, the server 10 provided in the embodiments of the present application may establish a communication connection with the user terminal 20 and exchange data with it through a network. The user may shoot the articles on a shelf with a camera to obtain a first image and send the first image to the server 10 through the user terminal 20. The server 10 then analyzes and processes the first image to obtain the association relationship between the articles and the price tags in the first image.

The server 10 includes a processing module 11 and a storage module 12 coupled to each other, and the storage module 12 stores a computer program that, when executed by the processing module 11, enables the server 10 to perform each step in the article identification management method described below.

In this embodiment, the server 10 may further include other modules. Referring to fig. 2, in the present embodiment, the server 10 may further include a communication module 13. The processing module 11, the storage module 12, the communication module 13 and the various elements of the article identification management device 100 are electrically connected directly or indirectly to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.

The user terminal 20 may be, but is not limited to, a smart phone, a personal computer (Personal Computer, PC), a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), a mobile internet device (Mobile Internet Device, MID), etc. The network may be, but is not limited to, a wired network or a wireless network.

Referring to fig. 3, the embodiment of the present application further provides an article identification management method, which may be applied to the above-mentioned server 10, with each step of the method executed or implemented by the server 10. The method may include steps S210 to S250 as follows:

step S210, acquiring a first image obtained by shooting an article on a shelf by a camera and attitude parameters of the camera when the first image is shot;

step S220, based on the attitude parameters and a target transformation matrix corresponding to the attitude parameters, performing spatial transformation on the first image to obtain a second image, so that the perspective distortion degree of the second image is smaller than that of the first image;

step S230, identifying the article category and the price tag region of the article in the second image through the trained deep learning model;

step S240, determining a price tag region corresponding to the article as a target price tag region based on the position of the article of the same article category in the second image and the position of the price tag region in the second image;

step S250, determining price tag information according to the target price tag region, and establishing an association relation between the price tag information and the article.

In the embodiment, the photographed image is transformed and corrected, and then the transformed and corrected image is identified, so that the accuracy of identifying the article and the price tag is improved. In addition, the identification and association of the articles and the article price tags can be automatically realized based on the deep learning model, so that the problem of low efficiency of counting the articles and the corresponding article price tags is solved.
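
The flow of steps S210 to S250 can be pictured as a simple composition of stages. The following Python sketch is illustrative only: the function name `manage_articles` and the five callables passed to it (`lookup_matrix`, `warp`, `detect`, `associate`, `read_tag`) are hypothetical placeholders for the components described in this embodiment, not interfaces defined by the application.

```python
from typing import Any, Callable, Dict, List, Tuple


def manage_articles(
    first_image: Any,
    attitude: Any,
    lookup_matrix: Callable[[Any], Any],
    warp: Callable[[Any, Any], Any],
    detect: Callable[[Any], Tuple[List[Any], List[Any]]],
    associate: Callable[[List[Any], List[Any]], Dict[Any, Any]],
    read_tag: Callable[[Any, Any], str],
) -> Dict[Any, str]:
    """Run steps S220 to S250 on an image and attitude acquired in step S210."""
    # Determine the target transformation matrix for the attitude parameters.
    matrix = lookup_matrix(attitude)
    # S220: spatial transformation that reduces perspective distortion.
    second_image = warp(first_image, matrix)
    # S230: article categories and price tag regions in the second image.
    articles, tag_regions = detect(second_image)
    # S240: map each article to its target price tag region.
    targets = associate(articles, tag_regions)
    # S250: read each target region and associate the text with the article.
    return {
        article: read_tag(second_image, region)
        for article, region in targets.items()
    }
```

Passing the stages in as callables keeps the sketch independent of any particular detector, warping routine or character recognition library.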

The steps in the method will be described in detail as follows:

In step S210, the server 10 may acquire an image obtained by a camera shooting articles on a shelf; this image is the first image. The camera may be a camera module on the user terminal 20 or a camera device independent of the user terminal 20. For example, when the user terminal 20 is a smart phone, the camera may be the camera module of the smart phone. The articles on the shelf are usually commodities for sale, and their types can be determined according to the actual situation. For example, the commodities include, but are not limited to, bottled beverages, bagged bread, canned milk powder and the like, and are not particularly limited herein.

The user is usually a staff member who needs to collect the association relation of the articles and the price tags of the articles. When a user needs to collect an item and a corresponding price tag, the user needs to go to a supermarket, and the user shoots the item on the shelf through a camera, so that a first image is obtained. If the camera is an image capturing module on the user terminal 20, after capturing the first image from the camera, the user may upload the first image to the server 10 through a corresponding application program on the user terminal 20, so that the server 10 obtains the first image. If the camera is a device independent of the user terminal 20, the user may transmit the first image photographed by the camera to the user terminal 20 and then the user terminal 20 transmits the first image to the server 10.

Wherein, the user terminal 20 may immediately send the first image to the server 10 after obtaining the first image. Alternatively, the first image is automatically transmitted to the server 10 at intervals of a specified duration after the first image is obtained. Alternatively, the server 10 may detect in real time whether or not a new first image exists in a designated folder (the designated folder may be set according to actual conditions, and is not specifically limited herein) of the user terminal 20, and immediately acquire the new first image from the user terminal 20 to store in the storage module 12 of the server 10 itself when the new first image exists in the designated folder.
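
As a rough illustration of detecting new first images in a designated folder, the sketch below lists image files that have not been reported before. It assumes the designated folder is reachable as an ordinary file system path; the function name `find_new_images` and the set of accepted file suffixes are assumptions made for this example, and how the folder is shared between the user terminal 20 and the server 10 is left open.

```python
from pathlib import Path
from typing import List, Set

# Assumed image formats; the application does not restrict the format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_new_images(folder: Path, seen: Set[str]) -> List[Path]:
    """Return image files in `folder` not reported before, and mark them as seen."""
    new_files = []
    for path in sorted(folder.iterdir()):
        is_image = path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        if is_image and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

Calling `find_new_images` repeatedly with the same `seen` set (for example from a timer) returns each image only once.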

After acquiring the first image, the server 10 may estimate the attitude parameters of the camera when capturing the first image based on the distorted shape of the articles in the first image. The attitude parameters include, but are not limited to, the spatial position of the camera relative to the reference plane in which the photographed article is located, the included angle between the shooting direction and the reference plane, and the like. For example, if an article that is actually a rectangular frame appears as a trapezoidal frame in the first image, the server 10 may estimate the attitude parameters of the camera when capturing the first image based on the difference between the rectangular frame and the trapezoidal frame.

The manner of calculating the pose parameter of the camera when the first image is captured according to the first image may be determined according to the actual situation, and is not specifically limited herein. For example, the server 10 stores an unsupervised learning network based on monocular depth and pose estimation, and then the first image is identified and detected by the unsupervised learning network, so that the pose parameters of the camera can be obtained. The identification of the first image by the unsupervised learning network to obtain the pose parameters of the camera is well known to those skilled in the art and will not be described here.

As an alternative embodiment, the camera may be provided with a sensor such as a gyroscope, a gravity sensor, a distance measuring sensor, etc. for detecting the pose parameters of the camera. Alternatively, when the camera is a camera module on the user terminal 20, the user terminal 20 is provided with a gyroscope, a gravity sensor, a ranging sensor, etc. for detecting an attitude parameter when the camera captures an image. When acquiring the first image, the server 10 may acquire the pose parameter of the camera when capturing the first image.

In step S220, after obtaining the pose parameters of the camera, the server 10 may determine a target transformation matrix suitable for the current first image based on the pose parameters, and then spatially transform the first image by using the target transformation matrix to obtain the second image.

The transformation of the first image into the second image by the server 10 can be implemented by the following formula:

$$p_s \sim K \, T \, D(p_t) \, K^{-1} \, p_t$$

In the formula, $K$ is the camera intrinsic matrix, which can be obtained through camera calibration in a manner well known to those skilled in the art and is not described here. $T$ is the spatial transformation matrix from the first image to the second image estimated by the camera pose estimation network. $D(p_t)$ is the inverse of the disparity estimated by the camera pose estimation network, that is, the depth value corresponding to the pixel. $p_t$ is a pixel on the distorted image; $K^{-1}$ is the inverse of the matrix $K$; $p_s$ is the pixel on the transformed image corresponding to $p_t$.
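
To make the roles of these quantities concrete, the following Python sketch maps one pixel of the distorted image to the transformed image. It reads the definitions above as the usual sequence: back-project the pixel with the inverse intrinsic matrix, scale by the depth value, apply the spatial transformation, and project again with the intrinsic matrix. The sketch makes several assumptions that are not stated in the application: a pinhole intrinsic matrix with a single focal length `focal` and principal point (`cx`, `cy`), and a spatial transformation given as a 3x3 `rotation` plus a `translation` vector. The function name `reproject_pixel` is hypothetical.

```python
from typing import Sequence, Tuple


def reproject_pixel(
    p_t: Tuple[float, float],
    depth: float,
    focal: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> Tuple[float, float]:
    """Map pixel p_t of the distorted image to pixel p_s of the transformed image."""
    u, v = p_t
    # Inverse intrinsics applied to p_t, scaled by the depth value:
    # a 3D point in the camera frame of the distorted image.
    point = ((u - cx) / focal * depth, (v - cy) / focal * depth, depth)
    # Apply the spatial transformation (rotation followed by translation).
    x, y, z = (
        sum(r * c for r, c in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    # Project with the intrinsics and normalize by the third coordinate.
    return (focal * x / z + cx, focal * y / z + cy)
```

With an identity rotation and zero translation the pixel maps to itself. With `focal = 500`, a depth of 2 and a translation of 0.1 along x, the pixel at the principal point moves by 500 × 0.1 / 2 = 25 pixels horizontally.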

Referring to fig. 4a and fig. 4b in combination, fig. 4a is a schematic diagram of a first image obtained by capturing a commodity on a supermarket shelf. Fig. 4b shows a schematic diagram of a second image that may be obtained by spatial transformation of the first image shown in fig. 4 a. Of course, in the actual application scenario, the content of the first image and the second image may be determined according to the actual situation, and is not limited to those shown in fig. 4a and fig. 4 b.

Understandably, the server 10 may store correspondence between the pose parameters and the transformation matrix. After the server 10 obtains the first image shown in fig. 4a and the pose parameter of the camera when capturing the first image, the transformation matrix corresponding to the pose parameter can be determined as the target transformation matrix of the pose parameter based on the correspondence between the pose parameter and the transformation matrix. The server 10 then spatially transforms the first image based on the target transformation matrix to obtain a second image as shown in fig. 4 b.

It will be appreciated that the first image usually exhibits perspective distortion, that is, distortion in which nearer objects appear larger and farther objects appear smaller because the shooting direction does not directly face the object. For example, a rectangular frame in the actual environment generally appears in the first image as a non-rectangular quadrilateral. The second image obtained after the spatial transformation has no perspective distortion, or reduced perspective distortion. The second image therefore facilitates accurate identification by the deep learning model, improving the accuracy and reliability of identification.
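
One common way to express such a correction, shown here only as an illustration and not as the method of this application, is a 3x3 planar homography computed from four point correspondences: the corners of the distorted quadrilateral and the corners of the rectangle it should become. The Python sketch below solves the resulting 8x8 linear system with plain Gauss-Jordan elimination. The function names are hypothetical, and applying the matrix to every pixel of an image (warping) is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], point: Point) -> Point:
    """Map one point through H, dividing by the homogeneous coordinate."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )
```

For instance, a trapezoid with corners (20, 0), (80, 0), (100, 100), (0, 100) can be mapped onto the square with corners (0, 0), (100, 0), (100, 100), (0, 100); the four corners then land on the square's corners up to floating-point error.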

In step S230, the trained deep learning model may be used to identify the category of the articles in an image, and may also identify the price tag regions in the image. After obtaining the second image, the server 10 may input the second image into the trained deep learning model, which then identifies the article category of the articles and each price tag region in the second image. A price tag region is a region in which a commodity price tag is placed. Based on this, the server 10 may obtain the article category of the articles in the second image, the positions of the articles in the second image, and the positions of the price tag regions in the second image.

In step S240, understandably, the second image generally includes articles of a plurality of categories and a plurality of price tag regions. The server 10 may determine the price tag region corresponding to an article as the target price tag region of the article based on the position of the article in the second image and the positions of the price tag regions in the second image. The target price tag region of an article can be determined as follows: for articles of the same category, the price tag region closest to the articles among all the price tag regions is determined as the target price tag region of the articles.

For example, in fig. 4b, there are three categories of articles and three price tags. The region where each price tag is located is a price tag region. When determining the target price tag region for article A, the server 10 may determine, based on the position of article A and the position of each price tag region, that price tag A is the price tag closest to article A; the region of price tag A is then the target price tag region of article A.
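
The nearest-region rule can be sketched in a few lines of Python. The sketch assumes that each detected article and price tag region is an axis-aligned rectangle and that "closest" means the smallest Euclidean distance between rectangle centers; the application does not fix the distance measure, and the names `Region`, `nearest_tag` and `associate` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    """An axis-aligned rectangle in second-image pixel coordinates."""

    name: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def nearest_tag(article: Region, tags: List[Region]) -> Region:
    """Return the price tag region whose center is closest to the article's center."""
    return min(tags, key=lambda tag: math.dist(article.center, tag.center))


def associate(articles: List[Region], tags: List[Region]) -> Dict[str, str]:
    """Map each article name to the name of its target price tag region."""
    return {article.name: nearest_tag(article, tags).name for article in articles}
```

A center-to-center distance is the simplest choice; a deployment might instead restrict candidates to tags on the shelf edge directly below the article.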

In this embodiment, the first image is spatially transformed and the resulting second image is identified, instead of identifying the article category and the price tag directly on the first image, which improves the accuracy and reliability of identifying the article category and the price tag. In addition, this avoids the accuracy of matching a detected article with its target price tag region being degraded by large distortion between the article and the price tag in the first image. For example, in fig. 4a, because of the large degree of distortion, a server identifying the first image directly is likely to erroneously determine that the target price tag of article C is price tag A, whereas the target price tag of article C is actually price tag C.

In step S250, after obtaining the target price tag region of the article, the server 10 may recognize the characters in the target price tag region, converting the character information in picture format in the target price tag region into character information in text format. The recognized character information is then taken as the price tag information of the article and associated with the article. By storing the article information, the price tag information and the association relation, the server 10 can collect and count the article information and the price tag information, thereby alleviating the problems of low efficiency in counting articles and their corresponding price tags and low accuracy in identifying article categories. The article information includes, but is not limited to, the brand, name, category, etc. of the article.

In this embodiment, step S250 may include: identifying the target price tag region through a character identification algorithm to obtain text information serving as the price tag information; and establishing an association relation between the price tag information and the article, wherein the price tag information comprises price information of the article. The character recognition algorithm may be selected according to the actual situation, for example, the server 10 may convert character information in a picture format in the target price tag region into character information in a text format through an optical character recognition (Optical Character Recognition, OCR) algorithm.

Understandably, the corresponding text information can be quickly and accurately extracted from the target price tag region through a character recognition algorithm, and the text information is price tag information of the corresponding article. Price tag information includes, but is not limited to, the selling price of the item, the item category, the item name, etc. The item category may be identified by the server 10 from the target price tag region, or may be identified and detected by the server 10 through a deep learning model on an image of the item.
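
Once the character recognition algorithm has produced text for the target price tag region, the selling price still has to be picked out of that text. The Python sketch below shows one possible way using a regular expression. It assumes, purely for illustration, that the price is written either after a currency symbol (¥, ￥ or $) or before the character 元; the pattern, the function names and the record layout are hypothetical and would need adapting to the price tag layouts actually encountered. The character recognition step itself is not shown.

```python
import re
from decimal import Decimal
from typing import Dict, Optional

# Assumed formats: "¥3.50", "$ 3.5", "12 元". Other numbers such as "500ml" are ignored.
PRICE_PATTERN = re.compile(
    r"[¥￥$]\s*(\d+(?:\.\d{1,2})?)|(\d+(?:\.\d{1,2})?)\s*元"
)


def extract_price(tag_text: str) -> Optional[Decimal]:
    """Return the first price found in recognized price tag text, or None."""
    match = PRICE_PATTERN.search(tag_text)
    if match is None:
        return None
    return Decimal(match.group(1) or match.group(2))


def build_association(article_category: str, tag_text: str) -> Dict[str, object]:
    """Associate an article category with its price tag information."""
    return {
        "article_category": article_category,
        "price_tag_text": tag_text,
        "price": extract_price(tag_text),
    }
```

`Decimal` is used rather than `float` so that prices keep their written precision when stored or compared.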

As an alternative embodiment, before step S220, the method may further include: determining the target transformation matrix corresponding to the attitude parameters of the camera when shooting the first image, based on the correspondence between attitude parameters and transformation matrices.

In this embodiment, the server 10 may store the correspondence between a plurality of continuous attitude parameter ranges and transformation matrices, with different attitude parameter ranges corresponding to different transformation matrices. When the attitude parameter of the camera falls within one of the attitude parameter ranges, the transformation matrix corresponding to that range is determined as the target transformation matrix corresponding to the attitude parameter. For example, the attitude parameter may be the included angle between the shooting direction of the camera and the vertical plane of the shelf on which the articles are placed, which is typically between 0° and 90°.

To ensure that the articles as a whole are sharp in the captured image, the included angle usually needs to be greater than a specified angle. The specified angle may be set according to the actual situation, for example 45°. Between 45° and 90°, one parameter range may be set per preset angular interval (the preset angle may be set according to the actual situation and may be a small angle such as 1°, 2° or 5°), and each parameter range corresponds to one spatial transformation matrix. After acquiring the attitude parameters when the first image was shot, the server 10 can determine from them the included angle between the shooting direction of the camera and the vertical plane of the shelf on which the articles are placed, and then determine the transformation matrix corresponding to the parameter range in which the included angle lies as the target transformation matrix for the attitude parameters. Based on this correspondence, the target transformation matrix corresponding to the current attitude parameters can be determined quickly and accurately.
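
The range-based correspondence can be sketched as a small lookup table. In the Python example below, the class name `AngleRangeTable` is hypothetical, the matrices are treated as opaque objects supplied by the caller, and a binary search over the lower bounds of the ranges selects the entry. How the matrices themselves are obtained is outside the scope of the sketch.

```python
import bisect
from typing import Any, List


class AngleRangeTable:
    """Maps an included angle (in degrees) to the matrix of the range containing it."""

    def __init__(self, min_deg: float, max_deg: float, step_deg: float, matrices: List[Any]):
        count = round((max_deg - min_deg) / step_deg)
        if count != len(matrices):
            raise ValueError("one matrix is required per angle range")
        # Lower bound of each range, e.g. 45, 50, ..., 85 for a 5 degree step.
        self._lower_bounds = [min_deg + i * step_deg for i in range(count)]
        self._max_deg = max_deg
        self._matrices = matrices

    def lookup(self, angle_deg: float) -> Any:
        """Return the matrix whose range contains angle_deg."""
        if not self._lower_bounds[0] <= angle_deg <= self._max_deg:
            raise ValueError(f"angle {angle_deg} is outside the supported range")
        # Index of the last lower bound that does not exceed the angle.
        index = bisect.bisect_right(self._lower_bounds, angle_deg) - 1
        return self._matrices[index]
```

With ranges of 5° between 45° and 90°, an angle of 47° selects the matrix of the first range, and an angle of exactly 90° selects the matrix of the last range.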

As an alternative embodiment, before step S220, the method may further include: inputting the attitude parameters and the first image into a trained camera pose estimation network to obtain a target transformation matrix for transforming the first image into the second image.

In this embodiment, the camera pose estimation network may be selected according to the actual situation, and may be, but is not limited to, a convolutional neural network (Convolutional Neural Networks, CNN). The trained camera pose estimation network is able to determine, from the attitude parameters, the target transformation matrix corresponding to those attitude parameters. Based on this, the target transformation matrix of the first image can be quickly determined by the camera pose estimation network.

In this embodiment, before inputting the pose parameters and the first image into the trained camera pose estimation network, the method may further include a step of training the camera pose estimation network. For example, the method further comprises:

determining, through the camera pose estimation network, a current transformation matrix for each group of a distorted image and a front view obtained by shooting with different attitude parameters;

transforming each distorted image through the current transformation matrix to obtain a transformed image;

when the similarity between the transformed image and the front view is smaller than a threshold value, adjusting the camera pose estimation network based on the parallax between the transformed image and the front view to obtain an adjusted current transformation matrix, and transforming the distorted image through the adjusted current transformation matrix until the similarity between the transformed image and the front view is larger than or equal to the threshold value;

and determining that the trained camera pose estimation network is obtained when the similarity between the transformed image and the front view is greater than or equal to the threshold value.

The number of groups of distorted images and front views shot with different attitude parameters may be determined according to the actual situation and is not specifically limited here. In each group, the distorted image is an image obtained when the camera shoots the articles on the shelf from an oblique direction (the shooting direction of the camera is not at 90° to the vertical plane in which the articles are located), and the front view is an image obtained when the camera shoots the articles on the shelf head-on (the shooting direction is at 90°, or nearly 90°, to that vertical plane). Within each group, the distorted image and the front view contain the same articles, or at least some of the same articles.

During training, for each group of a distorted image and a front view, the camera pose estimation network may take the front view as the target image that the distorted image should become after transformation and, using the target image together with the pose parameters, estimate an initial transformation matrix for transforming the distorted image into the front view. The distorted image is then transformed by the initial transformation matrix to obtain a transformed image, and the similarity between the transformed image and the front view is evaluated. If the similarity is smaller than a threshold (the threshold may be set according to the actual situation, for example 99%), the transformation is not yet up to standard. In that case the camera pose estimation network is adjusted according to the parallax between the transformed image and the front view (for example, the viewing-angle difference in image depth), and the adjusted network then adjusts the initial transformation matrix based on the transformed image and the front view to obtain an adjusted transformation matrix. The distorted image is transformed again with the adjusted transformation matrix, and this is repeated until the similarity between the transformed image and the front view is greater than or equal to the threshold.

When the similarity between the transformed image and the front view is greater than or equal to the threshold, the transformed image is the same as or close to the front view, which also indicates that the network has converged. The network is trained on multiple groups of distorted images and front views at different viewing angles until it converges, so that each spatially transformed distorted image is close to its corresponding front view. The camera pose estimation network at that point is the trained camera pose estimation network, which can automatically determine the corresponding target transformation matrix from the pose parameters and the first image.
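The control flow of this training procedure (estimate, transform, compare against a threshold, adjust, repeat) can be sketched with a deliberately simplified stand-in. Everything concrete here is an assumption made for illustration: images are short lists of numbers, a single scale factor plays the role of the transformation matrix, and `ToyPoseNetwork`, `warp`, `similarity`, and `train` are hypothetical names rather than parts of the described network.

```python
def warp(image, scale):
    """Stand-in for the spatial transformation: scale every pixel value."""
    return [p * scale for p in image]


def similarity(a, b):
    """1.0 for identical images, falling towards 0.0 as they diverge."""
    total = sum(abs(v) for v in b)
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return max(0.0, 1.0 - diff / total)


class ToyPoseNetwork:
    """Toy model whose only parameter is the scale it predicts."""

    def __init__(self):
        self.scale = 1.0

    def estimate(self, distorted):
        return self.scale

    def adjust(self, transformed, front, rate=0.5):
        # Move the parameter part of the way towards the value that would
        # have matched the front view; this stands in for a training step.
        ratio = sum(front) / sum(transformed)
        self.scale += rate * (ratio - 1.0) * self.scale


def train(network, pairs, threshold=0.99, max_rounds=100):
    """Adjust the network until every pair meets the similarity threshold."""
    adjustments = 0
    for distorted, front in pairs:
        for _ in range(max_rounds):
            transformed = warp(distorted, network.estimate(distorted))
            if similarity(transformed, front) >= threshold:
                break
            network.adjust(transformed, front)
            adjustments += 1
        else:
            raise RuntimeError("similarity threshold not reached")
    return adjustments
```

A real camera pose estimation network would replace the scale factor with a learned matrix and the ratio update with gradient-based training; only the stopping rule on the similarity threshold is taken directly from the text.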

Prior to step S230, the method may further comprise the step of training the deep learning model. For example, the method may further comprise: acquiring a training image set, wherein the training image set comprises a plurality of training images with an article region and a price tag region, and each training image is provided with a label for representing the article category in the article region and a label for representing the price tag region; and training the deep learning model through the training image set to obtain the trained deep learning model.

In this embodiment, the deep learning model may be determined according to the actual situation. For example, the deep learning model may be a CNN model, a recurrent neural network (Recurrent Neural Network, RNN) model, or the like. A large number of training images may be provided when training the deep learning model; the number of training images in the training image set may be determined according to the actual situation and is not specifically limited here. In each training image, a corresponding label is typically provided for each item to represent the actual item category, item name, and so on of that item. In addition, each price tag position in the training image is given a label indicating that the position is a price tag region. After labeling is completed, the labeled training images are input into the deep learning model, and the model learns from them to acquire the ability to identify item categories and price tag regions in an image.
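One possible in-memory layout for such a labeled training image is sketched below. The class and field names, and the use of pixel bounding boxes for regions, are assumptions made for illustration; the text only requires that items carry category labels and that price tag regions are marked.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max) in pixels


@dataclass
class ItemLabel:
    box: Box        # item region
    category: str   # actual item category
    name: str = ""  # optional item name


@dataclass
class PriceTagLabel:
    box: Box        # price tag region


@dataclass
class TrainingImage:
    path: str
    items: List[ItemLabel] = field(default_factory=list)
    price_tags: List[PriceTagLabel] = field(default_factory=list)

    def is_usable(self) -> bool:
        """True when the image has at least one item, one price tag, and valid boxes."""
        boxes = [i.box for i in self.items] + [t.box for t in self.price_tags]
        well_formed = all(b[0] < b[2] and b[1] < b[3] for b in boxes)
        return bool(self.items) and bool(self.price_tags) and well_formed
```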

After training, the method may further include testing and calibrating the deep learning model to improve the accuracy and reliability of its recognition. The test and calibration procedure may be as follows: training images without labels are input into the trained deep learning model, and it is judged whether the item categories and price tag regions identified in the output are accurate. When an output item category or price tag region is inconsistent with the actual one, the output result is adjusted and the deep learning model is trained again, so that the item categories and price tag regions output by the adjusted model are consistent with the actual ones. The deep learning model that has completed this testing is the trained deep learning model.
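The text does not say how consistency between an output and the actual label is judged. One common choice, shown here purely as an assumption, is to require an exact category match and an intersection-over-union (IoU) overlap above a cut-off for the price tag region; the 0.5 cut-off and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def prediction_matches(pred_category, pred_tag_box, true_category, true_tag_box,
                       min_iou=0.5):
    """Treat a prediction as consistent when category and region both agree."""
    return pred_category == true_category and iou(pred_tag_box, true_tag_box) >= min_iou
```

Images for which `prediction_matches` returns `False` would be the ones fed back into another round of training in the procedure above.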

As an alternative embodiment, the method may further comprise: sending the price tag information of the article to a designated terminal. The price tag information includes, but is not limited to, the item category, price, and item name of the article.

In this embodiment, the designated terminal may be the user terminal 20 that transmits the first image or another designated terminal device. The designated terminal is typically a terminal device held by a person who needs to view the item categories and prices, and may be set according to the actual situation. After obtaining the price tag information of the article, the server 10 may transmit the price tag information to the designated terminal directly, or may first collect and organize the price tag information and then send the organized price tag information to the designated terminal.

For example, the first image acquired by the server 10 carries identification information indicating the name of the supermarket/mall where the first image was taken and the shooting period. The server 10 may analyze all the first images from the same supermarket/mall within the same period (the period may be set according to the actual situation, for example the same day) to obtain the item categories and the corresponding price tags in each first image. The server 10 may treat the detection and identification of all the first images from the same supermarket/mall within the same period as one detection task, record the item categories and price tags obtained in the detection task in a spreadsheet file or a text file, and then send that file to the designated terminal. The relevant personnel can then check, through the designated terminal, information such as the item category and corresponding price of each article in each detection task, and carry out analysis based on that information. For example, a supplier may adjust the supply quantity, wholesale price, and the like based on the item category, selling price, and the like of the corresponding article.
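Grouping results into detection tasks and recording one task in a table file might look like the sketch below. The record fields (`store`, `date`, `category`, `price`), the CSV format, and the function names are assumptions chosen for illustration; the text only says that a spreadsheet or text file is produced per task.

```python
import csv
import io
from collections import defaultdict

FIELDS = ["store", "date", "category", "price"]


def group_into_tasks(detections):
    """Group detection records so that one (store, date) pair is one detection task."""
    tasks = defaultdict(list)
    for record in detections:
        tasks[(record["store"], record["date"])].append(record)
    return dict(tasks)


def write_task_csv(rows, out):
    """Write the records of one detection task to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    demo = [
        {"store": "Store A", "date": "2020-08-18", "category": "cola", "price": 3.5},
        {"store": "Store A", "date": "2020-08-18", "category": "chips", "price": 6.0},
    ]
    buffer = io.StringIO()
    write_task_csv(group_into_tasks(demo)[("Store A", "2020-08-18")], buffer)
    print(buffer.getvalue())
```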

Referring to fig. 5, the embodiment of the present application further provides an article identification management device 100. The article identification management device 100 includes at least one software function module that may be stored in the storage module 12 in the form of software or firmware (Firmware), or embedded in the operating system (Operating System, OS) of the server 10. The processing module 11 is configured to execute the executable modules stored in the storage module 12, such as the software function modules and computer programs included in the article identification management device 100.

The article identification management apparatus 100 may include an acquisition unit 110, a conversion unit 120, an identification unit 130, a region determination unit 140, and a price tag processing unit 150.

The acquisition unit 110 is configured to acquire a first image obtained by a camera shooting an article on a shelf, and the attitude parameters of the camera when the first image is shot.

The transformation unit 120 is configured to spatially transform the first image based on the attitude parameters and a target transformation matrix corresponding to the attitude parameters to obtain a second image, so that the degree of perspective distortion of the second image is smaller than that of the first image.

The identification unit 130 is configured to identify the item category and the price tag region of the article in the second image through the trained deep learning model.

The region determining unit 140 is configured to determine, based on the position of the article in the second image and the position of the price tag region in the second image, the price tag region corresponding to the article as a target price tag region.

The price tag processing unit 150 is configured to determine price tag information according to the target price tag region and to establish an association relationship between the price tag information and the article.

The price tag processing unit 150 may be configured to recognize the target price tag region with a character recognition algorithm to obtain text information as the price tag information, and to establish an association relationship between the price tag information and the article, wherein the price tag information includes price information of the article.
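The position-based matching performed by the region determining unit 140 and the price extraction performed by the price tag processing unit 150 can be sketched as follows. The matching rule (the closest price tag lying below the article, in image coordinates where y grows downward) is an assumption, since this passage does not spell out the rule, and character recognition itself is not implemented: `parse_price` only post-processes text that a recognition step is assumed to have produced. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2.0


def match_price_tag(item: Box, tags: List[Box]) -> Optional[Box]:
    """Return the tag below the item with the smallest combined offset, if any."""
    best, best_cost = None, None
    for tag in tags:
        gap = tag.y_min - item.y_max  # vertical distance from item bottom to tag top
        if gap < 0:
            continue  # tag is not below the item under the assumed shelf layout
        cost = abs(tag.center_x - item.center_x) + gap
        if best_cost is None or cost < best_cost:
            best, best_cost = tag, cost
    return best


PRICE_PATTERN = re.compile(r"\d+(?:\.\d{1,2})?")


def parse_price(recognized_text: str) -> Optional[float]:
    """Take the first number in the recognized text as the price (a simplification)."""
    found = PRICE_PATTERN.search(recognized_text)
    return float(found.group()) if found else None
```

Taking the first number is only adequate for tags that show nothing but a price; text that also contains a volume or weight would need a stricter pattern.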

Optionally, the article identification management device 100 further includes a matrix determining unit. Before the transformation unit 120 performs step S220, the matrix determining unit is configured to determine, based on a correspondence between attitude parameters and transformation matrices, the target transformation matrix corresponding to the attitude parameters of the camera when the first image was shot.

Optionally, the article identification management device 100 further includes an input unit. Before the transformation unit 120 performs step S220, the input unit is configured to input the pose parameters and the first image into a trained camera pose estimation network, so as to obtain a target transformation matrix for transforming the first image into the second image.

Optionally, the article identification management device 100 further includes a matrix determining unit, an adjusting unit, and a training determining unit. Before the input unit inputs the pose parameters and the first image into the trained camera pose estimation network, the matrix determining unit is configured to determine, for each group of a distorted image and a front view shot with different pose parameters, a current transformation matrix of the distorted image through the camera pose estimation network; the transformation unit 120 is configured to transform each distorted image through the current transformation matrix to obtain a transformed image; the adjusting unit is configured to, when the similarity between the transformed image and the front view is smaller than a threshold, adjust the camera pose estimation network based on the parallax between the transformed image and the front view to obtain an adjusted current transformation matrix, and to transform the distorted image through the adjusted current transformation matrix until the similarity between the transformed image and the front view is greater than or equal to the threshold; and the training determining unit is configured to determine that the trained camera pose estimation network has been obtained when the similarity between the transformed image and the front view is greater than or equal to the threshold.

Optionally, the article identification management device 100 may further include a model training unit. Before the identification unit 130 performs step S230, the acquisition unit 110 may be further configured to acquire a training image set including a plurality of training images having an item region and a price tag region, each of the training images being provided with a tag characterizing an item category in the item region and a tag characterizing the price tag region; the model training unit is used for training the deep learning model through the training image set to obtain the trained deep learning model.

Optionally, the article identification management apparatus 100 may further include a transmitting unit configured to transmit price tag information of the article to a designated terminal, where the price tag information includes an article category and a price of the article.
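How the units 110 to 150 fit together can be sketched as a small pipeline in which each unit is supplied as a callable. The class name, the callable signatures, and the rule of skipping articles that have no matching price tag are assumptions made for illustration rather than details taken from the text.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class ArticleIdentificationDevice:
    acquire: Callable[[], Tuple[Any, Any]]                        # unit 110
    transform: Callable[[Any, Any], Any]                          # unit 120
    identify: Callable[[Any], Tuple[List[Any], List[Any]]]        # unit 130
    determine_region: Callable[[Any, List[Any]], Optional[Any]]   # unit 140
    process_price_tag: Callable[[Any, Any], Any]                  # unit 150

    def run(self) -> List[Tuple[Any, Any]]:
        """Run the units in order and return (article, price tag info) pairs."""
        first_image, attitude = self.acquire()
        second_image = self.transform(first_image, attitude)
        articles, tag_regions = self.identify(second_image)
        associations = []
        for article in articles:
            region = self.determine_region(article, tag_regions)
            if region is None:
                continue  # assumed behavior: skip articles with no matching tag
            info = self.process_price_tag(second_image, region)
            associations.append((article, info))
        return associations
```

Passing the units in as callables keeps the sketch independent of any particular model or recognition library, and a transmitting unit could consume the list returned by `run`.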

It should be noted that, for convenience and brevity of description, specific working processes of the server 10 and the article identification management device 100 described above may refer to corresponding processes of each step in the foregoing method, and will not be described in detail herein.

In this embodiment, the processing module 11 may be an integrated circuit chip with signal processing capability. The processing module 11 may be a general-purpose processor, for example a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), or a network processor (Network Processor, NP). It may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and it can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application.

The communication module 13 is used for establishing a communication connection between the server 10 and the user terminal 20 through a network, and transmitting and receiving data through the network.

The storage module 12 may be, but is not limited to, a random access memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, an electrically erasable programmable read-only memory, or the like. In this embodiment, the storage module 12 may be used to store the first image, the second image, the deep learning model, the camera pose estimation network, and the like. The storage module 12 may also be used to store a program, which the processing module 11 executes upon receiving an execution instruction.

Embodiments of the present application also provide a computer-readable storage medium. The readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to execute the article identification management method as described in the above embodiments.

From the foregoing description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented in hardware, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application may be embodied as a software product, which may be stored in a non-volatile storage medium (for example a CD-ROM, a USB flash drive, or a portable hard disk) and includes several instructions that cause a computer device (for example a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the present application.

In summary, the present application provides an article identification management method, an apparatus, a server, and a readable storage medium. The method comprises the following steps: acquiring a first image obtained by a camera shooting an article on a shelf, and acquiring attitude parameters of the camera when the first image is shot; based on the attitude parameters and a target transformation matrix corresponding to the attitude parameters, performing spatial transformation on the first image to obtain a second image, so that the degree of perspective distortion of the second image is smaller than that of the first image; identifying the item category and price tag region of the article in the second image through the trained deep learning model; determining the price tag region corresponding to the article as a target price tag region based on the position, in the second image, of the articles of the same item category and the position of the price tag region in the second image; and determining price tag information according to the target price tag region, and establishing an association relationship between the price tag information and the article. In this scheme, the captured image is first transformed and corrected, and recognition is then performed on the corrected image. This improves the accuracy of identifying articles and price tags, allows articles and their price tags to be identified and associated automatically, and addresses the low efficiency of counting articles and their corresponding price tags.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus, system, and method may also be implemented in other manners. The apparatus, system, and method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, each module may exist alone, or two or more modules may be integrated to form a single part.

The foregoing describes only the preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims

1. An article identification management method, the method comprising:

acquiring a first image obtained by a camera shooting an article on a shelf, and acquiring attitude parameters of the camera when the first image is shot; wherein the article and the price tag in the first image have a relatively large degree of distortion;

based on the attitude parameters and a target transformation matrix corresponding to the attitude parameters, performing spatial transformation on the first image to obtain a second image, so that the degree of perspective distortion of the second image is smaller than that of the first image; wherein the article and the price tag in the second image are in a front view state;

determining price tag information according to the target price tag region, and establishing an association relationship between the price tag information and the article;

wherein, before spatially transforming the first image to obtain a second image, the method further comprises: inputting the attitude parameters and the first image into a trained camera pose estimation network to obtain a target transformation matrix for transforming the first image into the second image.

2. The method of claim 1, wherein prior to inputting the pose parameters and the first image into a trained camera pose estimation network, the method further comprises:

3. The method of claim 1, wherein prior to identifying the item category and price tag region of the item in the second image by the trained deep learning model, the method further comprises:

4. The method of claim 1, wherein determining price tag information from the target price tag region and establishing an association of the price tag information with the item comprises:

5. The method according to claim 1, wherein the method further comprises:

6. An article identification management device, the device comprising:

the acquisition unit is used for acquiring a first image obtained by a camera shooting an article on a shelf, and attitude parameters of the camera when the first image is shot; wherein the article and the price tag in the first image have a relatively large degree of distortion;

the transformation unit is used for performing spatial transformation on the first image based on the attitude parameters and the target transformation matrix corresponding to the attitude parameters to obtain a second image, so that the degree of perspective distortion of the second image is smaller than that of the first image; wherein the article and the price tag in the second image are in a front view state; before spatially transforming the first image to obtain the second image, the transformation unit is further configured to: input the attitude parameters and the first image into a trained camera pose estimation network to obtain a target transformation matrix for transforming the first image into the second image;

7. A server comprising a memory and a processor coupled to each other, the memory storing a computer program which, when executed by the processor, causes the server to perform the method of any one of claims 1-5.

8. A computer readable storage medium, characterized in that a computer program is stored in the readable storage medium, and the computer program, when run on a computer, causes the computer to perform the method according to any one of claims 1-5.