CN111311672A - Method and device for detecting gravity center of object, electronic equipment and storage medium - Google Patents

Method and device for detecting gravity center of object, electronic equipment and storage medium

Info

Publication number
CN111311672A
Authority
CN
China
Prior art keywords
image
pixel
gravity
gravity center
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010088293.9A
Other languages
Chinese (zh)
Inventor
吴华栋
高鸣岐
周韬
成慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202010088293.9A priority Critical patent/CN111311672A/en
Publication of CN111311672A publication Critical patent/CN111311672A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G06T 7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and apparatus for detecting a center of gravity of an object, an electronic device, and a storage medium. The method comprises the following steps: acquiring a training image, a gravity center label graph corresponding to the training image and a mask corresponding to the training image, wherein the gravity center label graph is used for representing the real position of the gravity center of an object in the training image, and the pixel value of a pixel in the mask represents whether a corresponding pixel in the training image belongs to the object or not; inputting the training image into a neural network, and outputting a gravity center prediction graph corresponding to the training image through the neural network; and training the neural network according to the gravity center label graph, the gravity center prediction graph corresponding to the training image and the mask. The neural network obtained by training is adopted to detect the gravity center of the object, so that the probability of determining the position which does not belong to the object as the gravity center can be reduced, and the accuracy of the determined gravity center is improved.

Description

Method and device for detecting gravity center of object, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for training a neural network for detecting a center of gravity of an object, a method and an apparatus for detecting a center of gravity of an object, an electronic device, and a storage medium.
Background
With the development of computer software and hardware technology, artificial intelligence technology is becoming increasingly mature. As an important practical application of artificial intelligence, robots have received wide attention. Robots can be applied to fields such as national defense and the military, industrial production, and logistics. In processes such as logistics sorting and industrial production, when multiple objects of various types are placed tightly or loosely on a container or a table surface, how to accurately detect the center of gravity of each object is a problem to be solved urgently.
Disclosure of Invention
The present disclosure provides a technical solution for detecting the center of gravity of an object.
According to an aspect of the present disclosure, there is provided a training method of a neural network for detecting a center of gravity of an object, including:
acquiring a training image, a gravity center label graph corresponding to the training image and a mask corresponding to the training image, wherein the gravity center label graph is used for representing the real position of the gravity center of an object in the training image, and the pixel value of a pixel in the mask represents whether a corresponding pixel in the training image belongs to the object or not;
inputting the training image into a neural network, and outputting a center-of-gravity prediction graph corresponding to the training image via the neural network, wherein the center-of-gravity prediction graph corresponding to the training image is used for representing the position of the center of gravity of the object in the training image predicted by the neural network;
and training the neural network according to the gravity center label graph, the gravity center prediction graph corresponding to the training image and the mask.
Because whether the corresponding pixel in the training image belongs to the object can be determined according to the pixel value of the pixel in the mask corresponding to the training image, the training of the neural network is carried out by combining the mask corresponding to the training image, so that the neural network can learn and distinguish the object region and the non-object region in the input image during the training, the neural network can focus more attention on the region of the closely-arranged object, and the neural network can learn the capability of processing the closely-arranged object. The neural network obtained by training can better distinguish the objects even in the application scene that the objects are closely arranged, thereby reducing the probability of determining the positions which do not belong to the objects as the gravity centers and improving the accuracy of the determined gravity centers.
In one possible implementation, any pixel in the gravity center label map includes three channels of pixel values, where a pixel value of a first channel of the pixels represents whether the pixel belongs to a true result of an object, a pixel value of a second channel of the pixels represents a true distance of the pixel from a gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a true distance of the pixel from the gravity center of the object in the training image on a second coordinate axis;
any pixel in the gravity center prediction graph corresponding to the training image comprises pixel values of three channels, wherein a pixel value of a first channel of the pixels represents a prediction result of whether the pixel belongs to an object, a pixel value of a second channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a second coordinate axis.
According to this embodiment, the probability of determining a position not belonging to an object as the center of gravity can be reduced, and the accuracy of the determined center of gravity can be improved.
In one possible implementation manner, the training the neural network according to the center-of-gravity label map, the center-of-gravity prediction map corresponding to the training image, and the mask includes:
obtaining a difference image according to the difference value of the pixel values of the corresponding pixels in the gravity center prediction image corresponding to the gravity center label image and the training image;
obtaining the value of the loss function of the neural network according to the product of the mask and the pixel value of the corresponding pixel in the difference value image;
training the neural network according to the values of the loss function.
In this implementation, a mask is introduced to weight when calculating the loss function, thereby enabling the neural network to focus more on the discrimination of closely spaced objects.
In a possible implementation manner, the acquiring a training image, a barycentric label map corresponding to the training image, and a mask corresponding to the training image includes:
obtaining a training image according to an image of a simulation scene, wherein the simulation scene comprises an object model and a background model;
and determining a gravity center label graph corresponding to the training image and a mask corresponding to the training image according to the parameters of the object model.
According to the implementation mode, the simulation data can be utilized to train the neural network, and the problem of gravity center detection of the object in the real scene is solved. The gravity center label graph and the mask corresponding to the training image are acquired by the simulation system, so that the labeling cost can be greatly reduced, and the cost of the whole system is reduced.
In a possible implementation manner, the obtaining a training image according to an image of a simulation scene includes:
and randomly adjusting the object model and/or the background model in the simulation scene to obtain a plurality of training images.
In this implementation, a large number of training images can be obtained by randomly adjusting the object model and/or the background model in the simulation scene. The neural network obtained based on the training can have higher accuracy and robustness.
According to an aspect of the present disclosure, there is provided a method of detecting a center of gravity of an object, including:
acquiring an image to be detected;
inputting the image to be detected into a neural network obtained by training the neural network for detecting the gravity center of the object by using a training method, and outputting a gravity center prediction image corresponding to the image to be detected through the neural network;
and determining the position information of the gravity center of the object in the image to be detected according to the gravity center prediction image corresponding to the image to be detected.
By acquiring an image to be detected, inputting the image to be detected into a neural network obtained by the training method of the neural network for detecting the center of gravity of an object, outputting a gravity center prediction map corresponding to the image to be detected via the neural network, and determining the position information of the center of gravity of the object in the image to be detected according to the gravity center prediction map corresponding to the image to be detected, a device such as a robot or a robotic arm can grasp the object according to the position information of the center of gravity of the object in the image to be detected, thereby improving the success rate of grasping the object.
In a possible implementation manner, the determining, according to the gravity center prediction map corresponding to the image to be detected, position information of a gravity center of an object in the image to be detected includes:
determining a gravity center voting graph corresponding to the image to be detected according to the gravity center prediction graph corresponding to the image to be detected, wherein the pixel value of any pixel in the gravity center voting graph represents the number of pixels in the gravity center prediction graph corresponding to the image to be detected that vote for that pixel;
and determining the position information of the gravity center of the object in the image to be detected according to the gravity center voting chart corresponding to the image to be detected.
By adopting the implementation mode, the voting can be carried out on the pixels in the image to be detected based on the pixel level, so that the accuracy of the determined gravity center of the object can be improved.
In a possible implementation manner, the determining, according to the gravity center prediction map corresponding to the image to be detected, a gravity center voting map corresponding to the image to be detected includes:
for any pixel in the gravity center prediction image corresponding to the image to be detected, if the pixel is determined to belong to an object according to the pixel value of the first channel of the pixel, determining a voting pixel corresponding to the pixel according to the pixel value of the second channel of the pixel and the pixel value of the third channel of the pixel;
and determining the gravity center voting chart corresponding to the image to be detected according to the voting pixels corresponding to the pixels in the gravity center prediction chart corresponding to the image to be detected.
In the implementation mode, the gravity center position of the object is voted based on the pixel level, so that the obtained gravity center voting graph can accurately determine the gravity center position of the object.
According to an aspect of the present disclosure, there is provided a training apparatus for a neural network for detecting a center of gravity of an object, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a training image, a gravity center label graph corresponding to the training image and a mask corresponding to the training image, the gravity center label graph is used for representing the real position of the gravity center of an object in the training image, and the pixel value of a pixel in the mask represents whether a corresponding pixel in the training image belongs to the object or not;
a first prediction module, configured to input the training image into a neural network, and output a center-of-gravity prediction map corresponding to the training image via the neural network, where the center-of-gravity prediction map corresponding to the training image is used to represent a position of a center of gravity of an object in the training image predicted by the neural network;
and the training module is used for training the neural network according to the gravity center label graph, the gravity center prediction graph corresponding to the training image and the mask.
In one possible implementation, any pixel in the gravity center label map includes three channels of pixel values, where a pixel value of a first channel of the pixels represents whether the pixel belongs to a true result of an object, a pixel value of a second channel of the pixels represents a true distance of the pixel from a gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a true distance of the pixel from the gravity center of the object in the training image on a second coordinate axis;
any pixel in the gravity center prediction graph corresponding to the training image comprises pixel values of three channels, wherein a pixel value of a first channel of the pixels represents a prediction result of whether the pixel belongs to an object, a pixel value of a second channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a second coordinate axis.
In one possible implementation, the training module is configured to:
obtaining a difference image according to the difference value of the pixel values of the corresponding pixels in the gravity center prediction image corresponding to the gravity center label image and the training image;
obtaining the value of the loss function of the neural network according to the product of the mask and the pixel value of the corresponding pixel in the difference value image;
training the neural network according to the values of the loss function.
In one possible implementation manner, the first obtaining module is configured to:
obtaining a training image according to an image of a simulation scene, wherein the simulation scene comprises an object model and a background model;
and determining a gravity center label graph corresponding to the training image and a mask corresponding to the training image according to the parameters of the object model.
In one possible implementation manner, the first obtaining module is configured to:
and randomly adjusting the object model and/or the background model in the simulation scene to obtain a plurality of training images.
According to an aspect of the present disclosure, there is provided an apparatus for detecting a center of gravity of an object, including:
the second acquisition module is used for acquiring an image to be detected;
the second prediction module is used for inputting the image to be detected into a neural network obtained by training of the training device of the neural network for detecting the gravity center of the object, and outputting a gravity center prediction image corresponding to the image to be detected through the neural network;
and the determining module is used for determining the position information of the gravity center of the object in the image to be detected according to the gravity center prediction image corresponding to the image to be detected.
In one possible implementation, the determining module is configured to:
determining a gravity center voting graph corresponding to the image to be detected according to the gravity center prediction graph corresponding to the image to be detected, wherein the pixel value of any pixel in the gravity center voting graph represents the number of pixels in the gravity center prediction graph corresponding to the image to be detected that vote for that pixel;
and determining the position information of the gravity center of the object in the image to be detected according to the gravity center voting chart corresponding to the image to be detected.
In one possible implementation, the determining module is configured to:
for any pixel in the gravity center prediction image corresponding to the image to be detected, if the pixel is determined to belong to an object according to the pixel value of the first channel of the pixel, determining a voting pixel corresponding to the pixel according to the pixel value of the second channel of the pixel and the pixel value of the third channel of the pixel;
and determining the gravity center voting chart corresponding to the image to be detected according to the voting pixels corresponding to the pixels in the gravity center prediction chart corresponding to the image to be detected.
According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the present disclosure, a training image, a center-of-gravity label map corresponding to the training image, and a mask corresponding to the training image are acquired, the training image is input to a neural network, a center-of-gravity prediction map corresponding to the training image is output via the neural network, the neural network is trained according to the center-of-gravity label map, the center-of-gravity prediction map corresponding to the training image, and the mask, and the neural network obtained by training is used to detect the center of gravity of an object, so that the probability of determining a position not belonging to the object as the center of gravity can be reduced, and the accuracy of the determined center of gravity can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a training method of a neural network for detecting the center of gravity of an object provided by an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a neural network in a training method of the neural network for detecting the center of gravity of an object provided by an embodiment of the present disclosure.
Fig. 3 shows a flowchart of a method for detecting the center of gravity of an object according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a training apparatus of a neural network for detecting the center of gravity of an object according to an embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating an apparatus for detecting the center of gravity of an object according to an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure.
Fig. 7 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the embodiment of the present disclosure, a training image, a center-of-gravity label map corresponding to the training image, and a mask corresponding to the training image are acquired, the training image is input to a neural network, a center-of-gravity prediction map corresponding to the training image is output via the neural network, the neural network is trained according to the center-of-gravity label map, the center-of-gravity prediction map corresponding to the training image, and the mask, and the neural network obtained by training is used to detect the center of gravity of an object, so that the probability of determining a position not belonging to the object as the center of gravity can be reduced, and the accuracy of the determined center of gravity can be improved.
Fig. 1 shows a flowchart of a training method of a neural network for detecting the center of gravity of an object provided by an embodiment of the present disclosure. The execution subject of the training method of the neural network for detecting the center of gravity of the object may be a training device of the neural network for detecting the center of gravity of the object. For example, the training method of the neural network for detecting the center of gravity of the object may be performed by a terminal device or a server or other processing device. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the training method for a neural network for detecting the center of gravity of an object may be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 1, the training method of the neural network for detecting the center of gravity of an object includes steps S11 to S13.
In step S11, a training image, a gravity center label map corresponding to the training image, and a mask corresponding to the training image are obtained, where the gravity center label map is used to represent the true position of the gravity center of the object in the training image, and the pixel value of the pixel in the mask represents whether the corresponding pixel in the training image belongs to the object.
In the embodiment of the present disclosure, the center of gravity of the figure of the object may be taken as the center of gravity of the object, that is, only the shape of the object may be considered without considering the density of the object.
In the embodiment of the present disclosure, the sizes of the training image, the gravity center label map corresponding to the training image, and the mask corresponding to the training image may be the same.
In a possible implementation manner, the training image is a two-dimensional image, so that on the premise that a three-dimensional model of the object is not required to be obtained in advance, a more accurate center of gravity of the object can be obtained by using the two-dimensional training image, and thus, the hardware cost can be reduced, and the calculation overhead can be reduced.
In one possible implementation, the training image is an RGB (Red, Green, Blue) image.
In the embodiments of the present disclosure, the pixel values of the pixels belonging to an object in the mask are different from the pixel values of the pixels not belonging to an object. For example, the pixel value of a pixel belonging to an object is 1, and the pixel value of a pixel not belonging to an object is 0.1. In one possible implementation, the pixel values of all pixels belonging to objects in the mask are the same, and the pixel values of all pixels not belonging to any object are the same. Here, a pixel belonging to an object means that the pixel is a pixel of some object, and a pixel not belonging to an object means that the pixel is not a pixel of any object.
In a possible implementation manner, the acquiring a training image, a barycentric label map corresponding to the training image, and a mask corresponding to the training image includes: obtaining a training image according to an image of a simulation scene, wherein the simulation scene comprises an object model and a background model; and determining a gravity center label graph corresponding to the training image and a mask corresponding to the training image according to the parameters of the object model. The background model may include one or more models of the ground, a table, a box, a shelf, a table top, ambient lighting, and the like.
In this implementation, before the training image is obtained according to the image of the simulation scene, the simulation scene may be constructed first. For example, a simulation scenario may be built that is similar to a real scenario.
In this implementation, the parameters of the object model may include one or more of a type parameter, a shape parameter, a size parameter, a position parameter, and the like of the object model. Wherein the position parameter of the object model may represent a position of the object model in the simulated scene. According to the shape parameters and the size parameters of the object model, the position of the gravity center of the object model in the object model can be determined; according to the position of the gravity center of the object model in the object model and the position parameters of the object model (namely the position of the object model in the simulation scene), the position of the gravity center of the object model in the training image can be determined, and thus the gravity center label map corresponding to the training image can be obtained. According to the parameters of the object model, which pixels in the training image have the object can be determined, so that the mask corresponding to the training image can be obtained.
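By way of illustration only, the following is a minimal Python sketch, not part of the original disclosure, of how the mask and the object centroids might be derived from the simulated objects' silhouettes. The function name and the `object_masks` interface are assumptions; the 0.1 weighting value for non-object pixels follows the example values given elsewhere in this disclosure.

```python
import numpy as np

def make_labels_from_simulation(object_masks, image_shape):
    """Derive the weighting mask and object centroids from simulated object silhouettes.

    object_masks: list of H x W boolean arrays, one per object model, obtained from
                  the simulator (hypothetical interface, for illustration only).
    Returns the mask (1.0 on object pixels, 0.1 elsewhere) and a list of (x_R, y_R)
    centroid coordinates, one per object.
    """
    h, w = image_shape
    weight_mask = np.full((h, w), 0.1, dtype=np.float32)
    centroids = []
    ys, xs = np.mgrid[0:h, 0:w]
    for m in object_masks:
        weight_mask[m] = 1.0
        # Geometric centroid of the silhouette: the "center of gravity" of the
        # object's figure, ignoring density, as stated in this disclosure.
        x_r = xs[m].mean()
        y_r = ys[m].mean()
        centroids.append((x_r, y_r))
    return weight_mask, centroids
```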
According to the implementation mode, the simulation data can be utilized to train the neural network, and the problem of gravity center detection of the object in the real scene is solved. The gravity center label graph and the mask corresponding to the training image are acquired by the simulation system, so that the labeling cost can be greatly reduced, and the cost of the whole system is reduced.
As an example of this implementation, multiple kinds of object models may be included in the simulation scenario, enabling the neural network to learn the ability to handle different kinds of objects.
As an example of this implementation, the obtaining a training image according to an image of a simulation scene includes: and randomly adjusting the object model and/or the background model in the simulation scene to obtain a plurality of training images. In this implementation, a Domain Randomization (Domain Randomization) method may be employed to randomly adjust the object model and/or the background model in the simulation scene. For example, the color and texture of the ground in the simulation scene, the color and texture of the table model, the color and texture of the box model, the direction and intensity of the ambient lighting, the position and angle at which the object model is placed, the color and texture of the object model, the size and shape of the object model, the type, number, and placement of the object model, and the like can be randomly adjusted. After the random adjustment of the object model and/or the background model in the simulation scene, an image of the current simulation scene may be saved as a training image, for example, an RGB image of the current simulation scene may be saved as a training image. In this example, a large number of training images can be obtained by randomly adapting the object model and/or the background model in the simulation scene. The neural network obtained based on the training can have higher accuracy and robustness.
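As a purely illustrative sketch of such domain randomization, the `scene` object, its attributes and `render_rgb()` below are a hypothetical simulator interface, not an API referenced in this disclosure:

```python
import random

def randomize_scene(scene, rng=None):
    """Domain randomization sketch: randomly perturb the object models and the
    background model before rendering one training image (hypothetical interface)."""
    rng = rng or random.Random()
    scene.ground.color = [rng.random() for _ in range(3)]              # ground color
    scene.lighting.direction = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    scene.lighting.intensity = rng.uniform(0.5, 1.5)                   # ambient lighting
    for obj in scene.objects:                                          # object models
        obj.position = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))  # placement
        obj.rotation = rng.uniform(0.0, 360.0)                         # placement angle
        obj.scale = rng.uniform(0.8, 1.2)                              # size
        obj.color = [rng.random() for _ in range(3)]                   # color/texture
    return scene.render_rgb()   # saved as one training image
```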
In another possible implementation manner, a barycentric label map corresponding to a training image and a mask corresponding to the training image may be obtained in a manual labeling manner.
In step S12, the training image is input to a neural network, and a centroid prediction map corresponding to the training image is output via the neural network, where the centroid prediction map corresponding to the training image is used to indicate the position of the centroid of the object in the training image predicted by the neural network.
For example, if the training image is I and the neural network is f_θ, the gravity center prediction map corresponding to the training image can be represented as Y = f_θ(I).
In the embodiment of the present disclosure, the centroid prediction map corresponding to the training image may be the same as the training image in size. That is, the size of the centroid prediction map output by the neural network may be the same as the size of the input image of the neural network.
In a possible implementation manner, a pixel value of any pixel in the center-of-gravity prediction map corresponding to the training image may indicate whether the pixel belongs to the center of gravity of the object, so that the center of gravity of the object can be predicted at a pixel level, and the accuracy of the determined center of gravity of the object can be improved.
Fig. 2 shows a schematic diagram of a neural network in the training method of the neural network for detecting the center of gravity of an object provided by an embodiment of the present disclosure. During training of the neural network, the input image may be a training image; during actual use of the neural network, the input image may be an image to be detected. The neural network may be a fully convolutional neural network comprising two parts, an encoder and a decoder, with a plurality of skip connections between the convolutional layers of the encoder and the convolutional layers of the decoder. The encoder may encode the input image, compressing it into a smaller feature map that implicitly represents important information in the input image, such as the position information of the center of gravity of the object. The decoder may generate a pixel-level gravity center prediction map step by step through upsampling according to the feature map, so as to clearly and accurately represent the position information of the center of gravity of the object. The encoder may consist of a plurality of convolutional layers with a convolution kernel size of 3 × 3, and each convolutional layer may be followed by Batch Normalization. The decoder may consist of a plurality of deconvolution layers with a convolution kernel size of 3 × 3, each of which may also be followed by batch normalization. The feature map obtained by each convolutional layer of the encoder can be connected to a deconvolution layer of the decoder as part of the input of that deconvolution layer; through these larger, shallow feature maps the neural network can better perceive the position information of the center of gravity of the object. Each layer of the neural network before the last deconvolution layer may use the ReLU (Rectified Linear Unit) function as its activation function, and the last deconvolution layer may use the Tanh function as its activation function. When the neural network is trained, tested and used, the input image may first be normalized.
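The following is a minimal, illustrative PyTorch sketch of an encoder-decoder of the kind described above (3 × 3 convolutions with batch normalization, skip connections from encoder to decoder, ReLU activations, and Tanh after the last deconvolution layer). The number of layers, the channel widths and the use of stride-2 downsampling are assumptions, not values taken from this disclosure.

```python
import torch
import torch.nn as nn

class CenterOfGravityNet(nn.Module):
    """Fully convolutional encoder-decoder with skip connections (illustrative sketch)."""

    def __init__(self, in_ch=3, out_ch=3, widths=(32, 64, 128)):
        super().__init__()
        # Encoder: 3x3 convolutions (stride 2 assumed for downsampling), each followed
        # by batch normalization and ReLU.
        self.enc = nn.ModuleList()
        c = in_ch
        for w in widths:
            self.enc.append(nn.Sequential(
                nn.Conv2d(c, w, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(w), nn.ReLU(inplace=True)))
            c = w
        # Decoder: 3x3 deconvolutions; each takes the previous output concatenated with
        # the matching encoder feature map (skip connection).
        self.dec = nn.ModuleList()
        rev = list(reversed(widths))
        for i, w in enumerate(rev):
            next_c = rev[i + 1] if i + 1 < len(rev) else out_ch
            in_c = w if i == 0 else w * 2   # skip connection doubles the input channels
            layers = [nn.ConvTranspose2d(in_c, next_c, kernel_size=3, stride=2,
                                         padding=1, output_padding=1)]
            if i + 1 < len(rev):
                layers += [nn.BatchNorm2d(next_c), nn.ReLU(inplace=True)]
            else:
                layers += [nn.Tanh()]       # the last deconvolution layer uses Tanh
            self.dec.append(nn.Sequential(*layers))

    def forward(self, x):                   # x: normalized input image, N x 3 x H x W
        feats = []
        for enc in self.enc:
            x = enc(x)
            feats.append(x)
        y = None
        for i, dec in enumerate(self.dec):
            skip = feats[len(feats) - 1 - i]
            y = dec(skip if i == 0 else torch.cat([y, skip], dim=1))
        return y                            # N x 3 x H x W gravity center prediction map
```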
In one possible implementation, any pixel in the gravity center label map includes three channels of pixel values, where a pixel value of a first channel of the pixels represents whether the pixel belongs to a true result of an object, a pixel value of a second channel of the pixels represents a true distance of the pixel from a gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a true distance of the pixel from the gravity center of the object in the training image on a second coordinate axis; any pixel in the gravity center prediction graph corresponding to the training image comprises pixel values of three channels, wherein a pixel value of a first channel of the pixels represents a prediction result of whether the pixel belongs to an object, a pixel value of a second channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a second coordinate axis. For example, the first coordinate axis is an x-axis and the second coordinate axis is a y-axis. According to this embodiment, the probability of determining a position not belonging to an object as the center of gravity can be reduced, and the accuracy of the determined center of gravity can be improved.
For example, the gravity center label map may be written as Ŷ, and the gravity center prediction map corresponding to the training image may be written as Y. The pixel value of the i-th row, j-th column and k-th channel of the gravity center label map may be written as Ŷ_ijk, and the pixel value of the i-th row, j-th column and k-th channel of the gravity center prediction map corresponding to the training image may be written as Y_ijk, where 1 ≤ i ≤ H, 1 ≤ j ≤ W, 1 ≤ k ≤ C; H denotes the height of the training image (the height of the gravity center label map corresponding to the training image is equal to the height of the training image), W denotes the width of the training image (the width of the gravity center label map corresponding to the training image is equal to the width of the training image), and C denotes the number of channels of any pixel in the gravity center label map and in the gravity center prediction map corresponding to the training image, for example, C = 3.
In this implementation, for any pixel in the gravity center label map, if it is determined that the pixel belongs to an object according to the pixel value of the first channel of the pixel, the pixel value of the second channel of the pixel may represent the true distance on the first coordinate axis between the pixel and the center of gravity of the object in the training image, and the pixel value of the third channel of the pixel may represent the true distance on the second coordinate axis between the pixel and the center of gravity of the object in the training image; if it is determined that the pixel does not belong to an object according to the pixel value of the first channel of the pixel, the pixel value of the second channel and the pixel value of the third channel of the pixel may be a second preset value, for example, 0. For any pixel in the gravity center prediction map corresponding to the training image, if it is determined that the pixel belongs to an object according to the pixel value of the first channel of the pixel, the pixel value of the second channel of the pixel may represent the predicted distance on the first coordinate axis between the pixel and the center of gravity of the object in the training image, and the pixel value of the third channel of the pixel may represent the predicted distance on the second coordinate axis between the pixel and the center of gravity of the object in the training image; if it is determined that the pixel does not belong to an object according to the pixel value of the first channel of the pixel, the pixel value of the second channel and the pixel value of the third channel of the pixel may be a second preset value, for example, 0. For example, if the pixel in the i-th row and j-th column of the gravity center prediction map corresponding to the training image belongs to an object, the pixel value of the first channel (k = 1) of the pixel is Y_ij1 = 1; if the pixel does not belong to an object, Y_ij1 = 0. If Y_ij1 = 0, then Y_ij2 = Y_ij3 = 0. If Y_ij1 = 1, the pixel value of the second channel (k = 2) of the pixel, Y_ij2, can represent the predicted distance on the x-axis between the pixel and the center of gravity of the object in the training image, and the pixel value of the third channel (k = 3) of the pixel, Y_ij3, can represent the predicted distance on the y-axis between the pixel and the center of gravity of the object in the training image, that is, Y_ij2 = x_ij − x_R and Y_ij3 = y_ij − y_R, where x_ij denotes the coordinate value of the pixel on the x-axis, x_R denotes the coordinate value of the center of gravity of the object in the training image on the x-axis, y_ij denotes the coordinate value of the pixel on the y-axis, and y_R denotes the coordinate value of the center of gravity of the object in the training image on the y-axis.
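A minimal sketch, for illustration only, of constructing such a three-channel gravity center label map from per-object silhouette masks and centroids; the function and argument names are assumptions, not part of the original disclosure.

```python
import numpy as np

def build_label_map(object_masks, centroids, image_shape):
    """Build the 3-channel gravity center label map described above (sketch).

    object_masks: list of H x W boolean arrays, one per object.
    centroids: list of (x_R, y_R) per object, in pixel coordinates.
    Channel 0: 1 if the pixel belongs to an object, else 0.
    Channel 1: x_ij - x_R for object pixels (distance to that object's centroid on the x-axis).
    Channel 2: y_ij - y_R for object pixels (distance on the y-axis).
    Non-object pixels keep 0 in channels 1 and 2 (the "second preset value").
    """
    h, w = image_shape
    label = np.zeros((h, w, 3), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for m, (x_r, y_r) in zip(object_masks, centroids):
        label[m, 0] = 1.0
        label[m, 1] = xs[m] - x_r
        label[m, 2] = ys[m] - y_r
    return label
```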
It should be noted that, although the center-of-gravity prediction map and the center-of-gravity label map corresponding to the training image are described as above in the above implementation manner, those skilled in the art will understand that the present disclosure is not limited thereto, and those skilled in the art may flexibly set the center-of-gravity prediction map and the center-of-gravity label map corresponding to the training image according to the actual application scene requirements, as long as the center-of-gravity prediction map corresponding to the training image can represent the position of the center of gravity of the object in the training image predicted by the neural network, and the center-of-gravity label map can represent the real position of the center of gravity of the object in the training image. For example, in another possible implementation manner, any pixel in the center-of-gravity prediction map and the center-of-gravity label map corresponding to the training image may include only a pixel value of one channel, where the pixel value of the pixel represents a probability that the pixel belongs to the center of gravity of the object.
In step S13, the neural network is trained based on the centroid label map, the centroid prediction map corresponding to the training image, and the mask.
In the embodiment of the disclosure, since whether the corresponding pixel in the training image belongs to the object can be determined according to the pixel value of the pixel in the mask corresponding to the training image, the training of the neural network is performed by combining the mask corresponding to the training image, so that the neural network can learn to distinguish the object region and the non-object region in the input image during the training, thereby the neural network can pay more attention to the region of the closely-spaced objects, and the neural network can learn the capability of processing the closely-spaced objects. The neural network obtained by training can better distinguish the objects even in the application scene that the objects are closely arranged, thereby reducing the probability of determining the positions which do not belong to the objects as the gravity centers and improving the accuracy of the determined gravity centers.
In one possible implementation manner, the training the neural network according to the center-of-gravity label map, the center-of-gravity prediction map corresponding to the training image, and the mask includes: obtaining a difference image according to the difference value of the pixel values of the corresponding pixels in the gravity center prediction image corresponding to the gravity center label image and the training image; obtaining the value of the loss function of the neural network according to the product of the mask and the pixel value of the corresponding pixel in the difference value image; training the neural network according to the values of the loss function.
As an example of this implementation, the square of the difference value of the pixel values of the corresponding pixels in the gravity center prediction map corresponding to the training image and the gravity center label map may be determined as the pixel value of the corresponding pixel of the difference map. In the case where any one pixel includes pixel values of three channels, the difference values of the pixel values of the respective channels of the respective pixels may be calculated, respectively.
As another example of this implementation, an absolute value of a difference value of pixel values of corresponding pixels in the gravity center prediction map corresponding to the training image and the gravity center label map may be determined as a pixel value of a corresponding pixel of the difference map.
As an example of this implementation, the product of the mask and the pixel value of the corresponding pixel in the difference map may be determined and the sum of the products of all pixels may be used as the value of the loss function of the neural network.
In other examples, the product of the sum and the first preset value may also be used as a value of a loss function of the neural network. For example, the first preset value is 0.8, 1.1, or 1.2, etc.
For example, the loss function L can be calculated using equation (1):

L = Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{C} M_ijk (Y_ijk − Ŷ_ijk)²    (1)

where M denotes the mask and M_ij1 = M_ij2 = M_ij3. For example, if the pixel in the i-th row and j-th column belongs to an object, then M_ij1 = M_ij2 = M_ij3 = 1; if the pixel in the i-th row and j-th column does not belong to an object, then M_ij1 = M_ij2 = M_ij3 = 0.1. For the mask M, the notation M_ij may also be used.
In this implementation, a mask is introduced to weight when calculating the loss function, thereby enabling the neural network to focus more on the discrimination of closely spaced objects.
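A minimal sketch of the mask-weighted loss of equation (1) as reconstructed above, written against the prediction and label maps as N × 3 × H × W tensors; the tensor layout is an assumption for illustration.

```python
import torch

def masked_loss(pred, label, mask):
    """Sketch of the mask-weighted loss of equation (1).

    pred:  gravity center prediction map Y,        N x 3 x H x W
    label: gravity center label map Y-hat,         N x 3 x H x W
    mask:  M, 1.0 on object pixels, 0.1 elsewhere, N x 1 x H x W (broadcast over channels)
    """
    diff = (pred - label) ** 2      # difference map: squared per-pixel, per-channel difference
    weighted = mask * diff          # product of the mask and the difference map
    return weighted.sum()           # value of the loss function, equation (1)
```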
In one possible implementation, a stochastic gradient descent method may be used to train the neural network, the batch size may be 64, and all parameters of the neural network may be initialized randomly.
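A minimal illustrative training-loop sketch under these settings; the learning rate, epoch count and data-loading interface are assumptions not specified in this disclosure.

```python
import torch

def train(network, loader, epochs=10, lr=1e-3):
    """Training sketch: stochastic gradient descent with batch size 64 set in the
    DataLoader; learning rate and epoch count are illustrative assumptions."""
    opt = torch.optim.SGD(network.parameters(), lr=lr)
    for _ in range(epochs):
        for image, label, mask in loader:               # normalized image, label map, mask
            pred = network(image)                       # gravity center prediction map Y
            loss = (mask * (pred - label) ** 2).sum()   # equation (1)
            opt.zero_grad()
            loss.backward()
            opt.step()

# The loader might be built, for example, as:
# loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
```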
Fig. 3 shows a flowchart of a method for detecting the center of gravity of an object according to an embodiment of the present disclosure. The subject of the method of detecting the center of gravity of an object may be an apparatus for detecting the center of gravity of an object. For example, the method for detecting the center of gravity of an object may be performed by a terminal device or a server or other processing device. Among other things, the terminal device may be a robot (e.g., a sorting robot), a robot arm, a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method of detecting the center of gravity of an object may be implemented by a processor calling computer readable instructions stored in a memory. As shown in fig. 3, the method of detecting the center of gravity of an object includes steps S31 through S33.
In step S31, an image to be detected is acquired.
In a possible implementation manner, the image to be detected is a two-dimensional image. According to the implementation mode, on the premise that a three-dimensional model of the object does not need to be obtained in advance, the accurate gravity center of the object can be obtained by utilizing the two-dimensional image to be detected, so that the hardware cost can be reduced, and the calculation overhead is reduced.
In a possible implementation manner, the image to be detected is an RGB image.
In step S32, the image to be detected is input to the neural network obtained by training the neural network for detecting the center of gravity of the object, and the center of gravity prediction map corresponding to the image to be detected is output via the neural network.
In the embodiment of the disclosure, the gravity center prediction image corresponding to the image to be detected and the image to be detected may have the same size.
In a possible implementation manner, any pixel in the gravity center prediction graph corresponding to the image to be detected includes pixel values of three channels, where a pixel value of a first channel of the pixel represents a prediction result of whether the pixel belongs to an object, a pixel value of a second channel of the pixel represents a prediction distance between the pixel and a gravity center of the object in the image to be detected on a first coordinate axis, and a pixel value of a third channel of the pixel represents a prediction distance between the pixel and a gravity center of the object in the image to be detected on a second coordinate axis.
For example, the gravity center prediction map corresponding to the image to be detected may be written as Y′, and the pixel value of the i-th row, j-th column and k-th channel of Y′ may be written as Y′_ijk, where 1 ≤ i ≤ H, 1 ≤ j ≤ W, 1 ≤ k ≤ C; H denotes the height of the image to be detected (the height of the gravity center prediction map corresponding to the image to be detected is equal to the height of the image to be detected), W denotes the width of the image to be detected (the width of the gravity center prediction map corresponding to the image to be detected is equal to the width of the image to be detected), and C denotes the number of channels of any pixel in the gravity center prediction map corresponding to the image to be detected, for example, C = 3.
In this implementation, for any pixel in the gravity center prediction map corresponding to the image to be detected, if it is determined that the pixel belongs to an object according to the pixel value of the first channel of the pixel, the pixel value of the second channel of the pixel may represent the predicted distance, on the first coordinate axis, between the pixel and the center of gravity of the object in the image to be detected as predicted by the neural network, and the pixel value of the third channel of the pixel may represent the predicted distance, on the second coordinate axis, between the pixel and the center of gravity of the object in the image to be detected as predicted by the neural network; if it is determined that the pixel does not belong to an object according to the pixel value of the first channel of the pixel, the pixel value of the second channel and the pixel value of the third channel of the pixel may be a fifth preset value, for example, 0. For example, if the pixel in the i-th row and j-th column of the gravity center prediction map corresponding to the image to be detected belongs to an object, the pixel value of the first channel (k = 1) of the pixel is Y′_ij1 = 1; if the pixel does not belong to an object, Y′_ij1 = 0. If Y′_ij1 = 0, then Y′_ij2 = Y′_ij3 = 0. If Y′_ij1 = 1, the pixel value of the second channel (k = 2) of the pixel, Y′_ij2, can represent the predicted distance on the x-axis between the pixel and the center of gravity of the object in the image to be detected as predicted by the neural network, and the pixel value of the third channel (k = 3) of the pixel, Y′_ij3, can represent the predicted distance on the y-axis between the pixel and the center of gravity of the object in the image to be detected as predicted by the neural network.
In step S33, the position information of the center of gravity of the object in the image to be detected is determined according to the center of gravity prediction map corresponding to the image to be detected.
In the embodiment of the present disclosure, by acquiring an image to be detected, inputting the image to be detected into the neural network obtained by the training method of the neural network for detecting the center of gravity of an object, outputting the gravity center prediction map corresponding to the image to be detected via the neural network, and determining the position information of the center of gravity of the object in the image to be detected according to the gravity center prediction map corresponding to the image to be detected, a device such as a robot or a robotic arm can grasp the object according to the position information of the center of gravity of the object in the image to be detected, thereby improving the success rate of grasping the object. In the embodiment of the present disclosure, the neural network predicts the center of gravity of the object in the input image at the pixel level, and therefore has high robustness, high accuracy and strong interpretability.
In a possible implementation manner, the determining, according to the gravity center prediction map corresponding to the image to be detected, position information of the center of gravity of the object in the image to be detected includes: determining a gravity center voting map corresponding to the image to be detected according to the gravity center prediction map corresponding to the image to be detected, wherein the pixel value of any pixel in the gravity center voting map represents the number of pixels in the gravity center prediction map corresponding to the image to be detected that vote for that pixel; and determining the position information of the center of gravity of the object in the image to be detected according to the gravity center voting map corresponding to the image to be detected. Any pixel in the gravity center voting map may include the pixel value of only one channel, and the pixel value of this channel represents the number of pixels in the gravity center prediction map corresponding to the image to be detected that vote for that pixel. For example, if the pixel value of pixel A in the gravity center voting map is 5, this indicates that 5 pixels in the gravity center prediction map corresponding to the image to be detected vote for pixel A, that is, 5 pixels predict pixel A as the center of gravity of the object. With this implementation, voting can be performed at the pixel level for the pixels in the image to be detected, so that the accuracy of the determined center of gravity of the object can be improved.
As an example of this implementation, the determining, according to the gravity center prediction map corresponding to the image to be detected, a gravity center voting map corresponding to the image to be detected includes: for any pixel in the gravity center prediction map corresponding to the image to be detected, if it is determined that the pixel belongs to an object according to the pixel value of the first channel of the pixel, determining a voting pixel corresponding to the pixel according to the pixel value of the second channel of the pixel and the pixel value of the third channel of the pixel; and determining the gravity center voting map corresponding to the image to be detected according to the voting pixels corresponding to the pixels in the gravity center prediction map corresponding to the image to be detected. For example, for a pixel B in the gravity center prediction map corresponding to the image to be detected, let the coordinate value of pixel B on the first coordinate axis be x_B and the coordinate value of pixel B on the second coordinate axis be y_B. If the pixel value of the first channel of pixel B is 1, the pixel value of the second channel of pixel B is b, the pixel value of the third channel of pixel B is g, and the voting pixel of pixel B is denoted as pixel C (i.e., pixel B votes for pixel C), then it can be determined that the coordinate value of pixel C on the first coordinate axis is x_B − b and the coordinate value of pixel C on the second coordinate axis is y_B − g. In this example, the pixel value of any pixel in the gravity center voting map may be equal to the number of pixels that vote for that pixel; for example, if pixel B votes for pixel C, then the pixel value of pixel C is incremented by 1. This example votes for the position of the center of gravity of the object at the pixel level, so the resulting gravity center voting map can accurately determine the position of the center of gravity of the object.
In this implementation manner, the pixel with the largest pixel value in the gravity center voting map may be determined as the gravity center of the object in the image to be detected; alternatively, each pixel whose pixel value in the gravity center voting map is greater than or equal to a third preset value may be determined as a gravity center of an object in the image to be detected. In this implementation, the greater the pixel value of a pixel in the gravity center voting map, the greater the probability that the pixel is the gravity center of the object.
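For illustration only, the following sketch shows one way such pixel-level voting and selection could be carried out. It assumes the gravity center prediction map is an H x W x 3 NumPy array whose channel 0 is the object/background prediction and whose channels 1 and 2 are the predicted offsets to the gravity center along the first and second coordinate axes; the function names, the 0.5 binarization threshold, and the treatment of the first coordinate axis as the column index are assumptions of this sketch, not details taken from the disclosure.

```python
import numpy as np

def build_vote_map(pred, obj_threshold=0.5):
    """Accumulate gravity center votes from a per-pixel prediction map.

    pred: H x W x 3 array; channel 0 is the object/background prediction,
    channels 1 and 2 are the predicted offsets (b, g) from each pixel to the
    gravity center along the two coordinate axes.
    Returns an H x W vote map whose value at a pixel is the number of pixels
    that voted for it.
    """
    h, w, _ = pred.shape
    votes = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.nonzero(pred[..., 0] >= obj_threshold)   # pixels predicted to belong to an object
    cx = np.rint(xs - pred[ys, xs, 1]).astype(int)       # x_B - b
    cy = np.rint(ys - pred[ys, xs, 2]).astype(int)       # y_B - g
    keep = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)   # discard votes falling outside the image
    np.add.at(votes, (cy[keep], cx[keep]), 1)            # each voting pixel adds 1 to its voted pixel
    return votes

def pick_center(votes, min_votes=None):
    """Select the gravity center from the vote map: either the single pixel
    with the most votes, or every pixel whose vote count reaches min_votes."""
    if min_votes is None:
        return np.unravel_index(np.argmax(votes), votes.shape)  # (row, col) of the maximum
    return np.argwhere(votes >= min_votes)                      # all (row, col) at or above the threshold
```

Under these assumptions, pick_center(build_vote_map(pred)) reproduces the largest-pixel-value rule, while passing min_votes equal to the third preset value corresponds to the threshold-based rule.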
In another possible implementation manner, the determining, according to the gravity center prediction map corresponding to the image to be detected, position information of the gravity center of an object in the image to be detected includes: determining, as the gravity center of the object in the image to be detected, the pixel in the gravity center prediction map whose first-channel pixel value is 1 and for which the sum of the square of the second-channel pixel value and the square of the third-channel pixel value is smallest.
In yet another possible implementation manner, the determining, according to the gravity center prediction map corresponding to the image to be detected, position information of the gravity center of an object in the image to be detected includes: determining, as a gravity center of an object in the image to be detected, each pixel in the gravity center prediction map whose first-channel pixel value is 1 and for which the sum of the square of the second-channel pixel value and the square of the third-channel pixel value is less than or equal to a fourth preset value.
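The two alternative selection rules above could be sketched as follows, under the same assumed H x W x 3 NumPy layout; the binarization of the first channel at 0.5 and the function names are again illustrative only.

```python
import numpy as np

def center_by_min_offset(pred):
    """Pick, among pixels whose first channel indicates an object, the pixel
    with the smallest (second channel)^2 + (third channel)^2."""
    obj = pred[..., 0] >= 0.5                          # assumed binarization of the first channel
    sq = pred[..., 1] ** 2 + pred[..., 2] ** 2
    sq = np.where(obj, sq, np.inf)                     # ignore non-object pixels
    return np.unravel_index(np.argmin(sq), sq.shape)   # (row, col) of the minimum

def centers_by_offset_threshold(pred, fourth_preset_value):
    """Keep every object pixel whose squared-offset sum is at most the
    fourth preset value."""
    obj = pred[..., 0] >= 0.5
    sq = pred[..., 1] ** 2 + pred[..., 2] ** 2
    return np.argwhere(obj & (sq <= fourth_preset_value))  # (row, col) coordinates
```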
The method for detecting the gravity center of an object provided by the embodiments of the present disclosure can be applied to various scenarios. For example, in a logistics sorting scenario, the object may be an express parcel; according to the position information of the gravity center of the object detected by the embodiments of the present disclosure, a robot or a mechanical arm can accurately grab the parcel. In an industrial assembly scenario, the object may be a component; according to the detected position information of the gravity center of the object, a robot or a mechanical arm can accurately grab the component and place it onto another component. In a garbage sorting scenario, the object may be a piece of garbage; according to the detected position information of the gravity center of the object, a robot or a mechanical arm can accurately grab the garbage and place it into the corresponding sorting bin. In an unmanned vending scenario, the object may be goods; according to the detected position information of the gravity center of the object, a robot or a mechanical arm can accurately grab the specified goods and deliver them to the customer. In a goods identification scenario, the object may be goods; according to the detected position information of the gravity center of the object, a robot or a mechanical arm can accurately grab the goods and scan their two-dimensional code.
It can be understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principles and logic; for brevity, the details are not repeated in the present disclosure.
It will be understood by those skilled in the art that, in the above methods, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure also provides a training apparatus for a neural network used for detecting the gravity center of an object, an apparatus for detecting the gravity center of an object, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the methods provided by the present disclosure for training a neural network used for detecting the gravity center of an object or for detecting the gravity center of an object; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method portions, which are not repeated here.
Fig. 4 shows a block diagram of a training apparatus of a neural network for detecting the center of gravity of an object according to an embodiment of the present disclosure. As shown in fig. 4, the training apparatus for a neural network for detecting the center of gravity of an object includes: a first obtaining module 41, configured to obtain a training image, a gravity center label map corresponding to the training image, and a mask corresponding to the training image, where the gravity center label map is used to represent a true position of a gravity center of an object in the training image, and a pixel value of a pixel in the mask represents whether a corresponding pixel in the training image belongs to the object; a first prediction module 42, configured to input the training image into a neural network, and output a center-of-gravity prediction map corresponding to the training image via the neural network, where the center-of-gravity prediction map corresponding to the training image is used to represent a position of a center of gravity of an object in the training image predicted by the neural network; and a training module 43, configured to train the neural network according to the gravity center label graph, the gravity center prediction graph corresponding to the training image, and the mask.
In one possible implementation, any pixel in the gravity center label map includes three channels of pixel values, where a pixel value of a first channel of the pixels represents whether the pixel belongs to a true result of an object, a pixel value of a second channel of the pixels represents a true distance of the pixel from a gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a true distance of the pixel from the gravity center of the object in the training image on a second coordinate axis; any pixel in the gravity center prediction graph corresponding to the training image comprises pixel values of three channels, wherein a pixel value of a first channel of the pixels represents a prediction result of whether the pixel belongs to an object, a pixel value of a second channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a second coordinate axis.
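To make the three-channel layout concrete, here is a minimal sketch of how a gravity center label map could be assembled from a binary object mask and a known gravity center position. The NumPy layout, the function name, and the convention that the first coordinate axis is the column index are assumptions of this sketch rather than details from the disclosure.

```python
import numpy as np

def make_center_label_map(mask, center_x, center_y):
    """Build an H x W x 3 gravity center label map.

    mask: H x W binary array, 1 where the pixel belongs to the object.
    center_x, center_y: true gravity center coordinates on the first and
    second coordinate axes (assumed here to be column and row indices).
    Channel 0: whether the pixel belongs to the object (copied from the mask).
    Channel 1: true distance to the gravity center along the first axis.
    Channel 2: true distance to the gravity center along the second axis.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    label = np.zeros((h, w, 3), dtype=np.float32)
    label[..., 0] = mask
    label[..., 1] = (xs - center_x) * mask   # offsets kept only on object pixels
    label[..., 2] = (ys - center_y) * mask
    return label
```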
In one possible implementation, the training module 43 is configured to: obtain a difference image according to the differences between the pixel values of corresponding pixels in the gravity center label map and the gravity center prediction map corresponding to the training image; obtain the value of the loss function of the neural network according to the product of the mask and the pixel values of the corresponding pixels in the difference image; and train the neural network according to the value of the loss function.
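A minimal sketch of this masked loss follows, assuming the label map and prediction map are H x W x 3 NumPy arrays. The disclosure only specifies a difference image multiplied by the mask, so the use of a squared difference and the particular normalization here are assumptions.

```python
import numpy as np

def masked_center_loss(label_map, pred_map, mask):
    """Loss value for training the gravity center network.

    label_map, pred_map: H x W x 3 arrays (true and predicted gravity center maps).
    mask: H x W binary array, 1 where the pixel belongs to an object.
    """
    diff = (label_map - pred_map) ** 2               # difference image (squared difference, an assumption)
    masked = diff * mask[..., None]                  # product of the mask and the difference image
    return masked.sum() / (mask.sum() * 3 + 1e-8)    # average over object-pixel values (normalization is an assumption)
```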
In a possible implementation manner, the first obtaining module 41 is configured to: obtaining a training image according to an image of a simulation scene, wherein the simulation scene comprises an object model and a background model; and determining a gravity center label graph corresponding to the training image and a mask corresponding to the training image according to the parameters of the object model.
In a possible implementation manner, the first obtaining module 41 is configured to: and randomly adjusting the object model and/or the background model in the simulation scene to obtain a plurality of training images.
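As a rough sketch of this randomization step: the scene handle and its set_object_pose, set_background_texture, background_textures and render members below are entirely hypothetical stand-ins for whatever simulation engine is actually used, and the numeric ranges are illustrative only.

```python
import random

def randomize_scene(scene, num_images):
    """Generate several training images by randomly adjusting the object model
    and/or the background model in a simulated scene (hypothetical simulator API)."""
    images = []
    for _ in range(num_images):
        scene.set_object_pose(
            x=random.uniform(-0.2, 0.2),        # random object position (illustrative range)
            y=random.uniform(-0.2, 0.2),
            yaw=random.uniform(0.0, 360.0),     # random in-plane rotation in degrees
        )
        scene.set_background_texture(random.choice(scene.background_textures))
        images.append(scene.render())           # rendered image of the simulated scene
    return images
```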
In the embodiments of the present disclosure, since whether a corresponding pixel in the training image belongs to an object can be determined according to the pixel value of the pixel in the mask corresponding to the training image, training the neural network in combination with the mask enables the network to learn, during training, to distinguish object regions from non-object regions in the input image. The network can therefore pay more attention to regions where objects are closely spaced and can learn the ability to handle closely spaced objects. Even in application scenarios where objects are closely packed, the trained neural network can better distinguish the objects, thereby reducing the probability of determining a position that does not belong to an object as the gravity center and improving the accuracy of the determined gravity center.
Fig. 5 is a block diagram illustrating an apparatus for detecting the center of gravity of an object according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus for detecting the center of gravity of an object includes: a second obtaining module 51, configured to obtain an image to be detected; the second prediction module 52 is configured to input the image to be detected into the neural network obtained by training the training device of the neural network for detecting the center of gravity of the object, and output a center-of-gravity prediction map corresponding to the image to be detected through the neural network; and the determining module 53 is configured to determine position information of the center of gravity of the object in the image to be detected according to the center of gravity prediction map corresponding to the image to be detected.
In a possible implementation manner, the determining module 53 is configured to: determine a gravity center voting map corresponding to the image to be detected according to the gravity center prediction map corresponding to the image to be detected, where the pixel value of any pixel in the gravity center voting map represents the number of pixels in the gravity center prediction map that voted for that pixel; and determine the position information of the gravity center of the object in the image to be detected according to the gravity center voting map corresponding to the image to be detected.
In a possible implementation manner, the determining module 53 is configured to: for any pixel in the gravity center prediction image corresponding to the image to be detected, if the pixel is determined to belong to an object according to the pixel value of the first channel of the pixel, determining a voting pixel corresponding to the pixel according to the pixel value of the second channel of the pixel and the pixel value of the third channel of the pixel; and determining the gravity center voting chart corresponding to the image to be detected according to the voting pixels corresponding to the pixels in the gravity center prediction chart corresponding to the image to be detected.
In the embodiments of the present disclosure, an image to be detected is acquired and input into a neural network obtained by the above training method for a neural network used for detecting the gravity center of an object; a gravity center prediction map corresponding to the image to be detected is output via the neural network, and the position information of the gravity center of the object in the image to be detected is determined according to this prediction map. A device such as a robot or a mechanical arm can then grab the object according to the position information of the gravity center of the object in the image to be detected, which can improve the success rate of grabbing the object. In the embodiments of the present disclosure, the neural network predicts the gravity center of the object in the input image at the pixel level, giving high robustness, high accuracy, and strong interpretability.
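Putting the pieces together, here is a hedged end-to-end sketch of the detection flow just described. The name model stands for the trained network (the framework is left unspecified), and build_vote_map and pick_center are the illustrative helpers from the earlier sketch, assumed to be in scope.

```python
def detect_center_of_gravity(model, image):
    """Run the trained network on an image to be detected and return the
    position of the object's gravity center.

    model: any callable mapping an H x W x C image to an H x W x 3 gravity
    center prediction map (channel layout as assumed in the earlier sketches).
    """
    pred = model(image)            # gravity center prediction map
    votes = build_vote_map(pred)   # pixel-level voting (see earlier sketch)
    row, col = pick_center(votes)  # pixel with the most votes
    return col, row                # position a robot or mechanical arm could grab at
```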
In some embodiments, functions of, or modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for specific implementations, refer to the descriptions of the above method embodiments, which, for brevity, are not repeated here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described method. The computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
The disclosed embodiments also provide a computer program product comprising computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the method of training a neural network for detecting the center of gravity of an object or the method of detecting the center of gravity of an object provided in any of the above embodiments.
Embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the method for training a neural network for detecting the center of gravity of an object or the method for detecting the center of gravity of an object provided in any of the above embodiments.
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; a memory for storing executable instructions; wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 provided by an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another such terminal.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G/LTE, 5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 shows a block diagram of an electronic device 1900 provided by an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows, Mac OS, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA) may be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions in order to implement various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method of training a neural network for detecting the center of gravity of an object, comprising:
acquiring a training image, a gravity center label graph corresponding to the training image and a mask corresponding to the training image, wherein the gravity center label graph is used for representing the real position of the gravity center of an object in the training image, and the pixel value of a pixel in the mask represents whether a corresponding pixel in the training image belongs to the object or not;
inputting the training image into a neural network, and outputting a center-of-gravity prediction graph corresponding to the training image via the neural network, wherein the center-of-gravity prediction graph corresponding to the training image is used for representing the position of the center of gravity of the object in the training image predicted by the neural network;
and training the neural network according to the gravity center label graph, the gravity center prediction graph corresponding to the training image and the mask.
2. The method of claim 1, wherein any pixel in the center of gravity label map comprises three channels of pixel values, wherein a pixel value of a first channel of the pixels represents a true result of whether the pixel belongs to an object, a pixel value of a second channel of the pixels represents a true distance of the pixel from a center of gravity of an object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a true distance of the pixel from the center of gravity of an object in the training image on a second coordinate axis;
any pixel in the gravity center prediction graph corresponding to the training image comprises pixel values of three channels, wherein a pixel value of a first channel of the pixels represents a prediction result of whether the pixel belongs to an object, a pixel value of a second channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a first coordinate axis, and a pixel value of a third channel of the pixels represents a prediction distance between the pixel and the gravity center of the object in the training image on a second coordinate axis.
3. The method according to claim 1 or 2, wherein the training the neural network according to the gravity center label map, the gravity center prediction map corresponding to the training image, and the mask comprises:
obtaining a difference image according to the differences between the pixel values of corresponding pixels in the gravity center label map and the gravity center prediction map corresponding to the training image;
obtaining the value of the loss function of the neural network according to the product of the mask and the pixel values of the corresponding pixels in the difference image;
and training the neural network according to the value of the loss function.
4. The method of any one of claims 1 to 3, wherein the obtaining of the training image, the gravity center label graph corresponding to the training image, and the mask corresponding to the training image comprises:
obtaining a training image according to an image of a simulation scene, wherein the simulation scene comprises an object model and a background model;
and determining a gravity center label graph corresponding to the training image and a mask corresponding to the training image according to the parameters of the object model.
5. The method of claim 4, wherein deriving a training image from the image of the simulated scene comprises:
and randomly adjusting the object model and/or the background model in the simulation scene to obtain a plurality of training images.
6. A method of detecting the center of gravity of an object, comprising:
acquiring an image to be detected;
inputting the image to be detected into a neural network obtained by training according to the method of any one of claims 1 to 5, and outputting a gravity center prediction graph corresponding to the image to be detected through the neural network;
and determining the position information of the gravity center of the object in the image to be detected according to the gravity center prediction image corresponding to the image to be detected.
7. The method according to claim 6, wherein the determining the position information of the gravity center of the object in the image to be detected according to the gravity center prediction map corresponding to the image to be detected comprises:
determining a gravity center voting graph corresponding to the image to be detected according to the gravity center prediction graph corresponding to the image to be detected, wherein the pixel value of any pixel in the gravity center voting graph represents the number of pixels in the gravity center prediction graph corresponding to the image to be detected that voted for the pixel;
and determining the position information of the gravity center of the object in the image to be detected according to the gravity center voting graph corresponding to the image to be detected.
8. The method according to claim 7, wherein the determining the gravity center voting chart corresponding to the image to be detected according to the gravity center prediction chart corresponding to the image to be detected comprises:
for any pixel in the gravity center prediction image corresponding to the image to be detected, if the pixel is determined to belong to an object according to the pixel value of the first channel of the pixel, determining a voting pixel corresponding to the pixel according to the pixel value of the second channel of the pixel and the pixel value of the third channel of the pixel;
and determining the gravity center voting chart corresponding to the image to be detected according to the voting pixels corresponding to the pixels in the gravity center prediction chart corresponding to the image to be detected.
9. A training apparatus for a neural network used for detecting the center of gravity of an object, comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a training image, a gravity center label graph corresponding to the training image and a mask corresponding to the training image, the gravity center label graph is used for representing the real position of the gravity center of an object in the training image, and the pixel value of a pixel in the mask represents whether a corresponding pixel in the training image belongs to the object or not;
a first prediction module, configured to input the training image into a neural network, and output a center-of-gravity prediction map corresponding to the training image via the neural network, where the center-of-gravity prediction map corresponding to the training image is used to represent a position of a center of gravity of an object in the training image predicted by the neural network;
and the training module is used for training the neural network according to the gravity center label graph, the gravity center prediction graph corresponding to the training image and the mask.
10. An apparatus for detecting the center of gravity of an object, comprising:
the second acquisition module is used for acquiring an image to be detected;
a second prediction module, configured to input the image to be detected into a neural network trained by the apparatus according to claim 9, and output a centroid prediction map corresponding to the image to be detected via the neural network;
and the determining module is used for determining the position information of the gravity center of the object in the image to be detected according to the gravity center prediction image corresponding to the image to be detected.
11. An electronic device, comprising:
one or more processors;
a memory for storing executable instructions;
wherein the one or more processors are configured to invoke the memory-stored executable instructions to perform the method of any one of claims 1 to 8.
12. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 8.
CN202010088293.9A 2020-02-12 2020-02-12 Method and device for detecting gravity center of object, electronic equipment and storage medium Pending CN111311672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010088293.9A CN111311672A (en) 2020-02-12 2020-02-12 Method and device for detecting gravity center of object, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010088293.9A CN111311672A (en) 2020-02-12 2020-02-12 Method and device for detecting gravity center of object, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111311672A true CN111311672A (en) 2020-06-19

Family

ID=71154543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010088293.9A Pending CN111311672A (en) 2020-02-12 2020-02-12 Method and device for detecting gravity center of object, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111311672A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230390A (en) * 2017-06-23 2018-06-29 北京市商汤科技开发有限公司 Training method, critical point detection method, apparatus, storage medium and electronic equipment
WO2019089192A1 (en) * 2017-11-03 2019-05-09 Siemens Aktiengesellschaft Weakly-supervised semantic segmentation with self-guidance
US20200356854A1 (en) * 2017-11-03 2020-11-12 Siemens Aktiengesellschaft Weakly-supervised semantic segmentation with self-guidance
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN109697460A (en) * 2018-12-05 2019-04-30 华中科技大学 Object detection model training method, target object detection method
CN109800864A (en) * 2019-01-18 2019-05-24 中山大学 A kind of robot Active Learning Method based on image input
CN110766701A (en) * 2019-10-31 2020-02-07 北京推想科技有限公司 Network model training method and device, and region division method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052745A (en) * 2021-04-25 2021-06-29 景德镇陶瓷大学 Digital watermark model training method, ceramic watermark image manufacturing method and ceramic
CN113052745B (en) * 2021-04-25 2022-01-07 景德镇陶瓷大学 Digital watermark model training method, ceramic watermark image manufacturing method and ceramic

Similar Documents

Publication Publication Date Title
CN109697734B (en) Pose estimation method and device, electronic equipment and storage medium
CN110647834B (en) Human face and human hand correlation detection method and device, electronic equipment and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
CN111783986B (en) Network training method and device, and gesture prediction method and device
CN109522910B (en) Key point detection method and device, electronic equipment and storage medium
CN109145970B (en) Image-based question and answer processing method and device, electronic equipment and storage medium
CN111062237A (en) Method and apparatus for recognizing sequence in image, electronic device, and storage medium
CN109584362B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN106557759B (en) Signpost information acquisition method and device
CN111340886B (en) Method and device for detecting pick-up point of object, equipment, medium and robot
CN104182127A (en) Icon movement method and device
CN112873212B (en) Grab point detection method and device, electronic equipment and storage medium
CN114088062B (en) Target positioning method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN112529846A (en) Image processing method and device, electronic equipment and storage medium
US9665925B2 (en) Method and terminal device for retargeting images
CN109903252B (en) Image processing method and device, electronic equipment and storage medium
CN111339880A (en) Target detection method and device, electronic equipment and storage medium
CN106372663B (en) Construct the method and device of disaggregated model
CN111311588B (en) Repositioning method and device, electronic equipment and storage medium
CN111488964B (en) Image processing method and device, and neural network training method and device
CN109978759B (en) Image processing method and device and training method and device of image generation network
CN111311672A (en) Method and device for detecting gravity center of object, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination