CN112101342A - Box key point detection method and device, computing equipment and computer readable storage medium - Google Patents

Box key point detection method and device, computing equipment and computer readable storage medium Download PDF

Info

Publication number
CN112101342A
Authority
CN
China
Prior art keywords
data
detection
image data
key point
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910521575.0A
Other languages
Chinese (zh)
Inventor
杨小平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Shenzhen SF Taisen Holding Group Co Ltd
SF Tech Co Ltd
Original Assignee
SF Technology Co Ltd
Shenzhen SF Taisen Holding Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd, Shenzhen SF Taisen Holding Group Co Ltd filed Critical SF Technology Co Ltd
Priority to CN201910521575.0A priority Critical patent/CN112101342A/en
Publication of CN112101342A publication Critical patent/CN112101342A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a box body key point detection method and device, computing equipment and a computer readable storage medium. The method comprises the following steps: acquiring first image data, wherein the first image data comprises data of an image of a first box body; detecting the first image data by using a first detection network model to output first detection result data, wherein the first detection result data comprises first coordinate data of a first key point of the first box body; acquiring second image data with a preset size corresponding to the first key point from the first image data; and detecting the second image data by using a second detection network model to output second detection result data, wherein the second detection result data at least comprises second coordinate data of the first key point. The embodiment of the invention can accurately and intelligently detect the key points of the box body.

Description

Box key point detection method and device, computing equipment and computer readable storage medium
[ technical field ]
The invention relates to the technical field of computers, in particular to a box body key point detection method and device, a computing device and a computer readable storage medium.
[ background of the invention ]
The logistics industry has a need for accurate measurement of cargo volume.
Currently, the cargo volume is generally measured by manual measurement. This approach yields measurements of low accuracy and is inefficient.
In addition, there is currently no technology in the art for accurate, intelligent detection of the vertices of a box.
Therefore, a new technical solution is needed to solve the above technical problems.
[ summary of the invention ]
The invention aims to provide a box body key point detection method and device, computing equipment and a computer readable storage medium, which can accurately and intelligently detect key points of a box body.
In order to solve the above problems, the technical solution of the embodiment of the present invention is as follows:
in a first aspect, a box body key point detection method is provided, and the method includes the following steps: acquiring first image data, wherein the first image data comprises data of an image of a first box body; detecting the first image data by using a first detection network model to output first detection result data, wherein the first detection result data comprises first coordinate data of a first key point of the first box body; acquiring second image data with a preset size corresponding to the first key point from the first image data; and detecting the second image data by using a second detection network model to output second detection result data, wherein the second detection result data at least comprises second coordinate data of the first key point.
In a second aspect, a box key point detection device is provided, the device comprising: the first acquisition module is used for acquiring first image data, wherein the first image data comprises data of an image of a first box body; the detection module is used for detecting the first image data by using a first detection network model so as to output first detection result data, wherein the first detection result data comprises first coordinate data of a first key point of the first box body; the second acquisition module is used for acquiring second image data with a preset size corresponding to the first key point from the first image data; the detection module is further configured to detect the second image data by using a second detection network model to output second detection result data, where the second detection result data at least includes second coordinate data of the first keypoint.
In a third aspect, a computing device is provided, where the computing device includes a processor and a memory, where the memory is used to store program instructions, and when the computing device runs, the processor is used to execute the program instructions in the memory to execute the box keypoint detection method according to the first aspect.
In a fourth aspect, there is provided a computer-readable storage medium storing program instructions for causing a computer to execute the box keypoint detection method of the first aspect.
Compared with the prior art, the embodiment of the invention uses a cascade network model to finely screen the first key point of the first box: the first image data of the first box is first detected by the first detection network model, which is based on pose detection technology, to output the first coordinate data of the first key point of the first box; then the second image data containing the first key point is detected by the second detection network model, which is based on face detection technology, to output the second coordinate data of the first key point of the first box. A more accurate key point position of the box can thus be obtained; practice has verified that the technical solution of the embodiment of the invention can keep the position offset (coordinate error) of the detected first key point within 3 pixels.
In order to make the aforementioned and other objects of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below:
[ description of the drawings ]
Fig. 1 is a schematic operating environment diagram of a box body key point detection device and method according to an embodiment of the invention.
Fig. 2 is a flowchart of the box key point detection method according to an embodiment of the present invention.
Fig. 3 is a flowchart of a step of training a first detection network model in the box keypoint detection method according to the embodiment of the present invention.
Fig. 4 is a flowchart of a step of training a second detection network model in the box keypoint detection method according to the embodiment of the present invention.
Fig. 5 is a block diagram of a box key point detection device according to an embodiment of the present invention.
[ detailed description of the embodiments ]
In embodiments of the present invention, the term "module" generally refers to hardware, a combination of hardware and software, and so forth. For example, a module may be a process running on a processor, an object, an executable, a thread of execution, a program, and so on. Both an application running on a processor and the processor itself can be a module. One or more modules may be located in one computer and/or distributed between two or more computers.
Referring to fig. 1, fig. 1 is a schematic view of an operating environment of a box body key point detection apparatus and method according to an embodiment of the present invention.
Examples of applications of the box key point detection apparatus of the embodiments of the present invention include, but are not limited to, personal computers, servers, hand-held or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, etc.), multiprocessor systems, consumer electronics, and the like.
The box key point detection apparatus and method according to the embodiments of the present invention may be implemented in devices and systems corresponding to the above application examples. As shown in fig. 1, such a device or system may include any combination of the processor 103, the memory 101, the interface circuit 102, the camera 104, the communication circuit 105, and the like, and this combination is used to implement the functions and steps of the box key point detection apparatus and method according to the embodiments of the present invention.
Referring to fig. 2, fig. 2 is a flowchart of a box key point detection method according to an embodiment of the present invention.
The box body key point detection method provided by the embodiment of the invention comprises the following steps:
step 201, acquiring first image data, wherein the first image data includes data of an image (first image) of a first box. Specifically, the step 201 includes: and shooting an image of the first box body to acquire the first image data. The first box body is a target box body (a box body to be detected).
Step 202, detecting the first image data by using a first detection network model to output first detection result data, where the first detection result data includes first coordinate data of a first key point (e.g., a vertex) of the first box and a first serial number of the first key point relative to the first box. The detected first key points are numbered (marked with first serial numbers) in the process of detecting them. The combination of the first serial numbers and the first coordinate data is used to determine the length, width and height data of the first box (the lengths of three different edges sharing the same first key point). One way of determining the length, width and height data of the first box from the first serial numbers and the first coordinate data is, for example, the following: taking any one first key point (serial number A) as a starting point, the other three first key points of one face (serial number B, serial number C and serial number D) are found in a clockwise or anticlockwise direction; the connecting line between the first of these key points (serial number B) and the starting key point (serial number A) forms a first edge, and the connecting line between the third of these key points (serial number D) and the starting key point (serial number A) forms a second edge; then, again taking the key point with serial number A as a starting point, the other three first key points of an adjacent face (serial number E, serial number F and serial number G) are found in a clockwise or anticlockwise direction, and the connecting line between the key point with serial number G and the key point with serial number A forms a third edge. From these three edges (connecting lines) and the four first key points (serial numbers A, B, D and G), the length, width and height data of the first box can be calculated. Specifically, step 202 includes: calling the first detection network model, controlling the first detection network model to receive the first image data, and controlling the first detection network model to detect the first image data so as to output the first detection result data.
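For illustration only, the following sketch shows how the edge lengths could be computed from the four serial-numbered key points described above; the dictionary keys 'A', 'B', 'D' and 'G', the function name and the example coordinates are assumptions, not part of the claimed method.

```python
import numpy as np

def edge_lengths_from_keypoints(kp):
    """Compute the three edge lengths that share key point A.

    `kp` maps an illustrative serial number ('A', 'B', 'D', 'G') to an
    (x, y) coordinate from the first coordinate data; the A-B, A-D and
    A-G edges follow the description above.
    """
    a = np.asarray(kp['A'], dtype=float)
    first_edge = np.linalg.norm(np.asarray(kp['B'], dtype=float) - a)   # edge A-B
    second_edge = np.linalg.norm(np.asarray(kp['D'], dtype=float) - a)  # edge A-D
    third_edge = np.linalg.norm(np.asarray(kp['G'], dtype=float) - a)   # edge A-G
    # The three pixel-space lengths correspond to the length, width and
    # height edges of the first box that meet at key point A.
    return first_edge, second_edge, third_edge

# Usage with made-up coordinates:
print(edge_lengths_from_keypoints(
    {'A': (120, 340), 'B': (320, 335), 'D': (118, 180), 'G': (60, 300)}))
```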
The first detection network model is a model created based on a pose detection technique. The input (received data) of the first detection network model is the first image data, and the output of the first detection network model is the first coordinate data of the first key points of the first box; of course, the output of the first detection network model may also include the connecting line data between the first key points of the first box. The first detection network model is built around a pose detection based technique, which may be, for example, an OpenPose based technique, a DeepCut based technique, or an RMPE (Regional Multi-Person Pose Estimation) based technique. Among them, the OpenPose based technique is preferably adopted: compared with models created by other techniques, the first detection network model created by the OpenPose based technique achieves higher efficiency and stability when detecting the first image data, because: firstly, the first detection network model created based on the OpenPose technique can directly detect the key points (vertices) of the box, without detecting (deriving) the key points indirectly through other features of the box; secondly, the first detection network model created based on the OpenPose technique does not need to draw the outline (contour) of the box before detecting the key points, which on the one hand saves the step of drawing the outline of the box and improves the efficiency of key point detection, and on the other hand avoids the situation that key points cannot be detected because drawing the outline of the box fails, thereby improving the stability of detecting the key points of the box.
That is, by a pose detection based technique, the first detection network model detects key points (analogous to the joints of a human arm) in the first image data and the connecting lines between the key points (analogous to the limb segments of the arm), with the connecting line data expressed in vector form, and outputs the first coordinate data of the first key points of the first box.
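As a rough sketch (not the patent's implementation), the snippet below assumes an OpenPose-style first detection network that outputs per-key-point confidence maps and per-edge vector fields, and decodes the peak of each confidence map into first coordinate data; all shapes and names are assumed.

```python
import numpy as np

def decode_first_detection(heatmaps, pafs):
    """Decode OpenPose-style outputs into first coordinate data.

    heatmaps: (K, H, W) array with one confidence map per box key point.
    pafs:     (E, 2, H, W) array with one 2-D vector field per candidate
              connecting line between key points ("vector form").
    Returns a list of (x, y) key-point coordinates plus the raw vector
    fields for downstream edge reasoning.
    """
    coords = []
    for k in range(heatmaps.shape[0]):
        # The peak of each confidence map is taken as that key point's
        # first coordinate data.
        y, x = np.unravel_index(np.argmax(heatmaps[k]), heatmaps[k].shape)
        coords.append((int(x), int(y)))
    return coords, pafs

# Usage with random placeholder outputs (8 key points, 4 candidate edges):
coords, _ = decode_first_detection(np.random.rand(8, 60, 80),
                                   np.random.rand(4, 2, 60, 80))
```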
Step 203, obtaining second image data with a predetermined size corresponding to the first key point from the first image data. Specifically, step 203 includes: based on the first coordinate data of the first key point, cropping a second image of the predetermined size that includes the first key point from the first image corresponding to the first image data, and outputting the second image data corresponding to the second image. For example, a rectangular second image is cropped with the point corresponding to the first coordinate data as its center.
The predetermined size is a size of X pixels by Y pixels, where X and Y may be the same or different, for example, the predetermined size is 20 pixels by 20 pixels.
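A minimal sketch of the cropping in step 203, assuming the first image is an (H, W, C) NumPy array; the border clamping and the helper name are assumptions.

```python
import numpy as np

def crop_second_image(first_image, keypoint_xy, size=(20, 20)):
    """Cut a size[0] x size[1] pixel second image centred on the key point.

    `first_image` is assumed to be an (H, W, C) NumPy array and
    `keypoint_xy` the (x, y) first coordinate data. Clamping the crop to
    the image border is an added assumption not spelled out above.
    """
    h, w = first_image.shape[:2]
    cx, cy = keypoint_xy
    x0 = int(np.clip(cx - size[0] // 2, 0, max(w - size[0], 0)))
    y0 = int(np.clip(cy - size[1] // 2, 0, max(h - size[1], 0)))
    return first_image[y0:y0 + size[1], x0:x0 + size[0]]

# Usage: a 20 x 20 pixel patch around a detected vertex of the first box.
patch = crop_second_image(np.zeros((480, 640, 3), dtype=np.uint8), (120, 340))
```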
Step 204, detecting the second image data by using a second detection network model to output second detection result data, wherein the second detection result data at least comprises second coordinate data of the first key point.
The second detection network model is a model created based on a face detection technique.
The input (received data) of the second detection network model is the second image data, and the output of the second detection network model is the second coordinate data of the first key points of the first box. The second detection network model is built around a face detection based technique, which may be, for example, a Multi-Task Cascaded Convolutional Neural Network (MTCNN) based technique or a FaceNet based technique.
Specifically, step 204 includes: calling the second detection network model, controlling the second detection network model to receive the first coordinate data of the first key point, and controlling the second detection network model to detect the second image data based on the first coordinate data so as to output the second detection result data.
The second detection network model performs feature point (key point) regression based on the first coordinate data so as to output second detection result data that is more accurate than the first detection result data.
Further, the second detection network model includes an iterative algorithm: it performs an iterative operation on the previous detection result to generate a more accurate detection result.
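The cascaded refinement of step 204 could look roughly like the sketch below, under the assumption that the second detection network can be wrapped as an offset regressor over the cropped second image; the interface, offset convention and iteration count are hypothetical, and the crop helper is the one sketched after step 203.

```python
import numpy as np

def refine_keypoint(first_image, coarse_xy, offset_regressor, size=(20, 20), iters=3):
    """Iteratively refine a coarse key-point coordinate (second stage).

    `offset_regressor` stands in for the second detection network: given a
    crop centred on the current estimate, it returns a (dx, dy) pixel
    offset. This interface, the offset convention and the iteration count
    are illustrative assumptions.
    """
    x, y = map(float, coarse_xy)
    for _ in range(iters):
        # crop_second_image is the cropping helper sketched after step 203.
        patch = crop_second_image(first_image, (int(round(x)), int(round(y))), size)
        dx, dy = offset_regressor(patch)
        x, y = x + dx, y + dy  # the previous result feeds the next iteration
    return x, y

# Usage with a dummy regressor that always predicts a small constant shift:
refined = refine_keypoint(np.zeros((480, 640, 3)), (120, 340), lambda p: (0.5, -0.3))
```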
The first coordinate data and the second coordinate data are both two-dimensional coordinate data or three-dimensional coordinate data. The accuracy of the second coordinate data is higher than the accuracy of the first coordinate data.
Scene one: an image of the first box is captured to obtain the first image data; the first image data is received, the first detection network model is called, the first image data is input to the first detection network model, and the first detection network model (based on a pose detection technique) is controlled to detect the first image data, so that the first detection network model outputs two-dimensional coordinate data (with a first accuracy) of a vertex of the first box; second image data of a second image of 50 pixels by 50 pixels including the first key point is then cropped from the first image data; the second detection network model is called and controlled (based on a face detection technique) to detect the second image data, so that the second detection network model outputs two-dimensional coordinate data (with a second accuracy) of the vertex of the first box, where the second accuracy is higher than the first accuracy.
Scene two: the captured first image data is received and input to the first detection network model, and the first detection network model (based on a pose detection technique) is controlled to detect the first image data, so that the first detection network model outputs first coordinate data (with a first accuracy) of the first key point; a second image of 40 pixels by 30 pixels including the first key point is cropped from the first image corresponding to the first image data, and the second image data corresponding to the second image is output; the second image data is input to the second detection network model, and the second detection network model (based on a face detection technique) is controlled to detect the first key point based on the second image data, so that the second detection network model outputs second coordinate data (with a second accuracy) of the first key point, where the second accuracy is higher than the first accuracy.
Scene three: the first image data is received, the first detection network model is controlled to receive the first image data, and the first detection network model (based on a pose detection technique) is controlled to detect the first image data, so that the first detection network model outputs the first coordinate data of the first key point and the connecting line data of the first key point; a second image of 20 pixels by 20 pixels including the first key point is acquired from the first image corresponding to the first image data, with the point corresponding to the first coordinate data as the center, and the second image data corresponding to the second image is output; the second detection network model (based on a face detection technique) is then controlled to detect the second image data according to at least one of the first coordinate data of the first key point and the connecting line data of the first key point, so that the second detection network model outputs the second coordinate data of the first key point.
Referring to fig. 3, fig. 3 is a flowchart of a step of training a first detection network model in the box keypoint detection method according to the embodiment of the present invention.
Before the step of acquiring the first image data, the box body key point detection method further comprises the following steps:
step 301, importing the first training data prepared in advance.
Step 302, training the first detection network model by using first training data, where the first training data includes third image data of a second box and labeled data for a second keypoint of the second box.
The second box may be a different box than the first box, and data of the image of the second box and labeled data for a second keypoint of the second box are used for training purposes.
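As a hedged illustration of how such training might look, here is a minimal PyTorch-style sketch that regresses the labeled key-point coordinates from the third image data; the tiny architecture, the loss and every name are placeholders rather than the patent's configuration (the description later mentions a CAFFE-based platform).

```python
from torch import nn, optim

class TinyKeypointNet(nn.Module):
    """Placeholder stand-in for the first detection network."""
    def __init__(self, num_keypoints=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, num_keypoints * 2)  # (x, y) per key point

    def forward(self, x):
        return self.head(self.backbone(x))

def train_first_model(model, loader, epochs=10, lr=1e-3):
    """`loader` yields (third_image_batch, labeled_keypoint_batch) pairs,
    i.e. the first training data described above."""
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for images, labeled_keypoints in loader:
            opt.zero_grad()
            # Regress the labeled second key-point coordinates of the second box.
            loss = loss_fn(model(images), labeled_keypoints.flatten(1))
            loss.backward()
            opt.step()
    return model
```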
Referring to fig. 4, fig. 4 is a flowchart of a step of training a second detection network model in the box keypoint detection method according to the embodiment of the present invention.
Before the step of detecting the second image data by using the second detection network model to output the second detection result data, the box body key point detection method further includes the following steps:
step 401, importing the second training data prepared in advance.
Step 402, training the second detection network model by using second training data, wherein the second training data includes image data acquired based on third coordinate data of a third key point of a third box obtained through training in advance.
The third coordinate data may be data output by training the first detection network model, or coordinate data of a separately created key point.
The third box and the second box may be the same box or different boxes.
In a case where data provided for a second detection network model to be trained is image data acquired based on coordinate data of a second keypoint output by the first detection network model (i.e., the third box is the same box as the second box), the box keypoint detection method further includes the steps of:
and in the process of training the first detection network model, controlling the first detection network model to mark a second sequence number for the second key point. For example, in the training process, the first detection network model first detects the second keypoint located at the upper left corner of the first image and marks the second keypoint located at the upper left corner of the first image as sequence number 1, and then detects the second keypoint located at the lower left corner of the first image and marks the second keypoint located at the lower left corner of the first image as sequence number 2, and so on.
In a case where the data provided for the second detection network model to be trained is not image data acquired based on the coordinate data of the second key point output by the first detection network model (that is, the process of training the first detection network model is independent of the process of training the second detection network model, and the third box is a different box from the second box), the box key point detection method further includes the following step:
independently training the second detection network model with second training data.
In the embodiment of the present invention, the step of training the first detection network model and the step of training the second detection network model may be independent from each other, or may be performed in sequence.
Scene four: the step of training the first detection network model is performed independently, the step of training the second detection network model is performed independently, and then steps 201, 202, 203 and 204 are performed.
Scene five: the step of training the first detection network model is performed first, the step of training the second detection network model is performed next, and then steps 201, 202, 203 and 204 are performed.
In the above technical solution, the first key point of the first box is finely screened by using a cascade network model: the first image data of the first box is first detected by the first detection network model, which is based on a pose detection technique, to output the first coordinate data of the first key point of the first box, and then the second image data including the first key point is detected by the second detection network model, which is based on a face detection technique, to output the second coordinate data of the first key point of the first box. A more accurate box key point position can thus be obtained; practice has verified that the technical solution of the embodiment of the present invention can keep the position offset (coordinate error) of the detected first key point within 3 pixels.
Referring to fig. 5, fig. 5 is a block diagram of a box body key point detection device according to an embodiment of the present invention.
The box key point detection device according to the embodiment of the present invention comprises a first acquisition module 501, a detection module 502 and a second acquisition module 503.
The first obtaining module 501 is configured to obtain first image data, where the first image data includes data of an image of a first box. In one case, the first obtaining module 501 is configured to capture an image of the first box to obtain the first image data; at this time, the first obtaining module 501 may be, for example, a camera integrated in the box key point detection device. Alternatively, the first obtaining module 501 is configured to receive the first image data from a camera integrated in the box key point detection device; at this time, the first obtaining module 501 may be, for example, an interface circuit on a main board. Alternatively, the first obtaining module 501 is configured to receive the first image data from another image capturing device in communication with the box key point detection device; in this case, the first obtaining module 501 may be, for example, a communication circuit/signal transceiver circuit. The first box is the target box (the box to be detected).
The detecting module 502 is configured to detect the first image data by using a first detection network model to output first detection result data, where the first detection result data includes first coordinate data of a first key point (e.g., a vertex) of the first box and a first serial number of the first key point relative to the first box. The detecting module 502 is configured to number the detected first key points (mark them with first serial numbers) in the process of detecting them. The combination of the first serial numbers and the first coordinate data is used to determine the length, width and height data of the first box (the lengths of three different edges sharing the same first key point). One way of determining the length, width and height data of the first box from the first serial numbers and the first coordinate data is, for example, the following: taking any one first key point (serial number A) as a starting point, the other three first key points of one face (serial number B, serial number C and serial number D) are found in a clockwise or anticlockwise direction; the connecting line between the first of these key points (serial number B) and the starting key point (serial number A) forms a first edge, and the connecting line between the third of these key points (serial number D) and the starting key point (serial number A) forms a second edge; then, again taking the key point with serial number A as a starting point, the other three first key points of an adjacent face (serial number E, serial number F and serial number G) are found in a clockwise or anticlockwise direction, and the connecting line between the key point with serial number G and the key point with serial number A forms a third edge. From these three edges (connecting lines) and the four first key points (serial numbers A, B, D and G), the length, width and height data of the first box can be calculated.
The detection module 502 is configured to invoke the first detection network model, control the first detection network model to receive the first image data, and control the first detection network model to detect the first image data, so as to output the first detection result data.
The first detection network model is a model created based on a pose detection technique. The input (received data) of the first detection network model is the first image data, and the output of the first detection network model is the first coordinate data of the first key points of the first box; of course, the output of the first detection network model may also include the connecting line data between the first key points of the first box. The first detection network model is built around a pose detection based technique, which may be, for example, an OpenPose based technique, a DeepCut based technique, or an RMPE (Regional Multi-Person Pose Estimation) based technique. Among them, the OpenPose based technique is preferably employed.
That is, by a pose detection based technique, the first detection network model is configured to detect key points (analogous to the joints of a human arm) in the first image data and the connecting lines between the key points (analogous to the limb segments of the arm), with the connecting line data expressed in vector form, and to output the first coordinate data of the first key points of the first box.
The second obtaining module 503 is configured to obtain second image data with a predetermined size corresponding to the first key point from the first image data.
The second obtaining module 503 is configured to crop, based on the first coordinate data of the first key point, a second image of the predetermined size that includes the first key point from the first image corresponding to the first image data, and to output the second image data corresponding to the second image. For example, the second obtaining module 503 is configured to crop a rectangular second image with the point corresponding to the first coordinate data as its center.
The predetermined size is a size of X pixels by Y pixels, where X and Y may be the same or different, for example, the predetermined size is 20 pixels by 20 pixels.
The second obtaining module 503 may be, for example, a program in computer software (e.g., input method software, browser software, instant messaging software) for implementing a screenshot function or an image processing program, and the second obtaining module 503 may also be, for example, an image processing chip.
The detecting module 502 is further configured to detect the second image data by using a second detecting network model to output second detection result data, where the second detection result data at least includes second coordinate data of the first keypoint.
The second detection network model is a model created based on a face detection technique.
The input (received data) of the second detection network model is the second image data, and the output of the second detection network model is the second coordinate data of the first key points of the first box. The second detection network model is built around a face detection based technique, which may be, for example, a Multi-Task Cascaded Convolutional Neural Network (MTCNN) based technique or a FaceNet based technique.
The detection module 502 is configured to invoke the second detection network model, control the second detection network model to receive the first coordinate data of the first key point, and control the second detection network model to detect the second image data based on the first coordinate data, so as to output the second detection result data.
The detection module 502 is configured to control the second detection network model to perform feature point (keypoint) regression based on the first coordinate data, so as to output the second detection result data that is more accurate than the first detection result data.
Further, the second detection network model includes an iterative algorithm: the second detection network model is used for performing an iterative operation on the previous detection result so as to generate a more accurate detection result.
The first coordinate data and the second coordinate data are both two-dimensional coordinate data or three-dimensional coordinate data. The accuracy of the second coordinate data is higher than the accuracy of the first coordinate data.
Scene one: a camera (the first obtaining module 501) integrated in a computer (the box key point detection device) captures an image of the first box to obtain the first image data. An operating system program (the detection module 502) run by the processor of the computer receives the first image data, calls the first detection network model, inputs the first image data to the first detection network model, and controls the first detection network model (based on a pose detection technique) to detect the first image data, so that the first detection network model outputs two-dimensional coordinate data (with a first accuracy) of the vertex of the first box. An image processing program (the second acquisition module 503) run by the processor of the computer crops, from the first image data, second image data of a second image of 50 pixels by 50 pixels including the first key point. The operating system program (the detection module 502) run by the processor of the computer then calls the second detection network model and controls it (based on a face detection technique) to detect the second image data, so that the second detection network model outputs two-dimensional coordinate data (with a second accuracy) of the vertex of the first box, where the second accuracy is higher than the first accuracy.
Scene two: an interface circuit (the first obtaining module 501) on the main board of a mobile terminal (the box key point detection device) receives the first image data captured by a camera integrated in the mobile terminal. The operating system program of the mobile terminal (the detection module 502) inputs the first image data into the first detection network model and controls the first detection network model (based on a pose detection technique) to detect the first image data, so that the first detection network model outputs first coordinate data (with a first accuracy) of the first key point. A screenshot program (the second obtaining module 503) run by the processor of the mobile terminal crops a second image of 40 pixels by 30 pixels including the first key point from the first image corresponding to the first image data, and outputs the second image data corresponding to the second image. The operating system program of the mobile terminal (the detection module 502) inputs the second image data into the second detection network model and controls the second detection network model (based on a face detection technique) to detect the first key point on the basis of the second image data, so that the second detection network model outputs second coordinate data (with a second accuracy) of the first key point, where the second accuracy is higher than the first accuracy.
Scene three: a communication circuit (the first acquisition module 501) of a server (the box key point detection apparatus) receives the first image data from another image capturing device in communication with the server. The processor of the server runs a detection control program (the detection module 502), controls the first detection network model to receive the first image data, and controls the first detection network model (based on a pose detection technique) to detect the first image data, so that the first detection network model outputs the first coordinate data of the first key point and the connecting line data of the first key point. A graphics processor (GPU) of the server runs an image processing program (the second obtaining module 503) to acquire, from the first image corresponding to the first image data, a second image of 20 pixels by 20 pixels including the first key point, with the point corresponding to the first coordinate data as the center, and outputs the second image data corresponding to the second image. The processor of the server runs the detection control program (the detection module 502) to control the second detection network model (based on a face detection technique) to detect the second image data according to at least one of the first coordinate data of the first key point and the connecting line data of the first key point, so that the second detection network model outputs the second coordinate data of the first key point.
The box body key point detection device further comprises a first training module, wherein the first training module is used for importing first training data prepared in advance and training the first detection network model by using the first training data, and the first training data comprises third image data of a second box body and labeled data of second key points of the second box body. The second box may be a different box than the first box, and data of the image of the second box and labeled data for a second keypoint of the second box are used for training purposes.
The first training module may be, for example, a deep learning platform based on CAFFE (Convolutional Architecture for Fast Feature Embedding).
The box body key point detection device further comprises a second training module, wherein the second training module is used for importing second training data which are prepared in advance and training the second detection network model by using the second training data, and the second training data comprise image data which are obtained based on third coordinate data of third key points of a third box body which are obtained through training in advance. The third coordinate data may be data output by training the first detection network model, or coordinate data of a separately created key point.
The third box and the second box may be the same box or different boxes.
Likewise, the second training module may be, for example, a CAFFE-based deep learning platform. The second training module and the first training module may be the same module or may be different modules independent from each other.
In a case where the data provided for the second detection network model to be trained is image data obtained based on the coordinate data of the second key point output by the first detection network model (that is, the third box and the second box are the same box), the first training module controls the first detection network model to mark a second serial number for the second key point during the training of the first detection network model.
For example, in the training process, the first detection network model first detects the second key point located at the upper left corner of the first image and marks it as serial number 1, then detects the second key point located at the lower left corner of the first image and marks it as serial number 2, and so on.
In a case where the data provided for the second detection network model to be trained is not image data acquired based on the coordinate data of the second key point output by the first detection network model (that is, the process in which the first training module trains the first detection network model is independent of the process in which the second training module trains the second detection network model, and the third box and the second box are different boxes), the second training module trains the second detection network model independently by using the second training data.
The first training module and the second training module may represent one or a combination of more than one of an operating system program, a training control program, a deep learning platform program, etc., run by a processor.
In the above technical solution, the first key point of the first box is finely screened by using a cascade network model: the first image data of the first box is first detected by the first detection network model, which is based on a pose detection technique, to output the first coordinate data of the first key point of the first box, and then the second image data including the first key point is detected by the second detection network model, which is based on a face detection technique, to output the second coordinate data of the first key point of the first box. A more accurate box key point position can thus be obtained; practice has verified that the technical solution of the embodiment of the present invention can keep the position offset (coordinate error) of the detected first key point within 3 pixels.
The box key point detection device according to the embodiment of the present invention can be implemented by a CPU (Central Processing Unit), a GPU, an NPU (Neural-network Processing Unit), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like, where the general-purpose processor may be a microprocessor or any conventional processor.
The box key point detection device according to the embodiment of the present invention may also be implemented by software, in which case the device and each of its modules may be software modules.
The box key point detection device according to the embodiment of the present invention may correspond to the method described in the embodiment of the present invention, and the above and other operations and/or functions of each module in the box key point detection device are used to implement the corresponding flow of the box key point detection method according to the embodiment of the present invention.
The computing device of the embodiment of the invention comprises a processor and a memory. The processor and the memory communicate through a bus. The memory is used for storing program instructions, and when the computing device runs, the processor is used for calling the program instructions stored in the memory and executing them so as to perform the box key point detection method according to the embodiment of the present invention. The computing device may be, for example: a personal computer, a server, a hand-held or laptop device, a mobile device (including a mobile phone, a personal digital assistant, a media player, and the like), a multiprocessor system, consumer electronics, and the like. For example, the computing device may be a handheld scanner used by couriers/logistics operators, or may be a server located within a logistics warehouse for automatically identifying circulating boxes (totes).
It should be understood that the processor may be a CPU, GPU, NPU, other general purpose processor, Digital Signal Processor (DSP), Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory may include both read-only memory and random access memory, and provides program instructions and data to the processor. The memory may also include non-volatile random access memory. The memory may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache memory.
The computer-readable storage medium of an embodiment of the present invention stores program instructions for causing a computer to execute the box body key point detection method implemented by the present invention.
Embodiments of the invention may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product (the carrier of which may, for example, be the computer-readable storage medium of an embodiment of the invention). The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a Solid State Drive (SSD).
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application.
In summary, although the present invention has been described with reference to the preferred embodiments, the above-described preferred embodiments are not intended to limit the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of the present invention shall be determined by the appended claims.

Claims (10)

1. A box body key point detection method is characterized by comprising the following steps:
acquiring first image data, wherein the first image data comprises data of an image of a first box body;
detecting the first image data by using a first detection network model to output first detection result data, wherein the first detection result data comprises first coordinate data of a first key point of the first box body;
acquiring second image data with a preset size corresponding to the first key point from the first image data;
and detecting the second image data by using a second detection network model to output second detection result data, wherein the second detection result data at least comprises second coordinate data of the first key point.
2. The box key point detection method according to claim 1, wherein the first detection network model is a model created based on a pose detection technique;
the second detection network model is a model created based on a face detection technique.
3. The box key point detection method according to claim 1, wherein the step of acquiring second image data with a preset size corresponding to the first key point from the first image data comprises:
based on the first coordinate data of the first key point, cutting out a second image with the preset size that includes the first key point from the first image corresponding to the first image data, and outputting the second image data corresponding to the second image.
4. A box keypoint detection method according to claim 1, wherein before the step of acquiring first image data, said method further comprises the steps of:
training the first detection network model by using first training data, wherein the first training data comprises third image data of a second box body and labeled data aiming at a second key point of the second box body.
5. The box keypoint detection method of claim 4, further comprising the steps of:
and marking a second serial number for the second key point in the process of training the first detection network model.
6. The box key point detecting method according to claim 1, wherein before the step of detecting the second image data by using the second detection network model to output second detection result data, the method further comprises the steps of:
and training the second detection network model by using second training data, wherein the second training data comprises image data acquired based on third coordinate data of a third key point of a third box body obtained through training in advance.
7. A box key point detection device, characterized in that, the device includes:
the first acquisition module is used for acquiring first image data, wherein the first image data comprises data of an image of a first box body;
the detection module is used for detecting the first image data by using a first detection network model so as to output first detection result data, wherein the first detection result data comprises first coordinate data of a first key point of the first box body;
the second acquisition module is used for acquiring second image data with a preset size corresponding to the first key point from the first image data;
the detection module is further configured to detect the second image data by using a second detection network model to output second detection result data, where the second detection result data at least includes second coordinate data of the first keypoint.
8. The box key point detection device of claim 7, wherein the second obtaining module is configured to intercept a second image with the predetermined size including the first key point from the first image corresponding to the first image data based on the first coordinate data of the first key point, and output the second image data corresponding to the second image.
9. A computing device comprising a processor and a memory, the memory storing program instructions, the processor being configured to execute the program instructions in the memory when the computing device is run to perform the method of box keypoint detection of any of claims 1 to 6.
10. A computer-readable storage medium storing program instructions for causing a computer to execute the box keypoint detection method of any one of claims 1 to 6.
CN201910521575.0A 2019-06-17 2019-06-17 Box key point detection method and device, computing equipment and computer readable storage medium Pending CN112101342A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910521575.0A CN112101342A (en) 2019-06-17 2019-06-17 Box key point detection method and device, computing equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112101342A (en) 2020-12-18

Family

ID=73748427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910521575.0A Pending CN112101342A (en) 2019-06-17 2019-06-17 Box key point detection method and device, computing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112101342A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021901A (en) * 2017-12-18 2018-05-11 北京小米移动软件有限公司 The method, apparatus and computer-readable recording medium of image procossing
CN108062536A (en) * 2017-12-29 2018-05-22 纳恩博(北京)科技有限公司 A kind of detection method and device, computer storage media
CN108399373A (en) * 2018-02-06 2018-08-14 北京达佳互联信息技术有限公司 The model training and its detection method and device of face key point
CN109522945A (en) * 2018-10-31 2019-03-26 中国科学院深圳先进技术研究院 One kind of groups emotion identification method, device, smart machine and storage medium
CN109558864A (en) * 2019-01-16 2019-04-02 苏州科达科技股份有限公司 Face critical point detection method, apparatus and storage medium
CN109711545A (en) * 2018-12-13 2019-05-03 北京旷视科技有限公司 Creation method, device, system and the computer-readable medium of network model
CN109784149A (en) * 2018-12-06 2019-05-21 北京飞搜科技有限公司 A kind of detection method and system of skeleton key point


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination