CN114511613B - Key point detection method, model training method, device, equipment and storage medium


Info

Publication number
CN114511613B
Authority
CN
China
Prior art keywords
image
training sample
key point
prediction model
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011281144.0A
Other languages
Chinese (zh)
Other versions
CN114511613A (en)
Inventor
王建国 (Wang Jianguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202011281144.0A
Publication of CN114511613A
Application granted
Publication of CN114511613B
Legal status: Active

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 7/00 Image analysis > G06T 7/60 Analysis of geometric attributes > G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology > G06N 3/045 Combinations of networks
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/08 Learning methods
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/20 Special algorithmic details > G06T 2207/20081 Training; Learning
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/20 Special algorithmic details > G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/30 Subject of image; Context of image processing > G06T 2207/30196 Human being; Person
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/30 Subject of image; Context of image processing > G06T 2207/30196 Human being; Person > G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a key point detection method, a model training method, a device, equipment and a storage medium. In the method, the positions of the key points of a target object are determined in a small-size first image, the position offsets corresponding to those key points are determined from a large-size second image and the key point positions in the first image, and the positions of the key points in the large-size second image are then determined from the position offsets. The key points are thus first preliminarily detected in the small-size first image and then, based on that preliminary result, finely detected using position offsets obtained from the large-size second image; that is, the key points in the large-size image are detected accurately. At the same time, because the method does not detect the large-size second image directly, the computation of the key point detection process is reduced and detection efficiency is improved.

Description

Key point detection method, model training method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a key point detection method, a model training method, a device, equipment and a storage medium.
Background
Since the birth of artificial intelligence, as its theory and technology have matured, its fields of application have kept expanding, for example to intelligent security, intelligent logistics and image processing. In the field of intelligent logistics, key points in a parcel image can be detected with artificial intelligence technology, and the parcel volume can then be measured intelligently from the detected key points. In the field of image processing, a face can, for example, be beautified or given special effects to improve its display effect in an image; adding such beautification or special effects first requires detecting the key points of the face in the image with artificial intelligence technology.
Meanwhile, with the rapid development of shooting equipment, captured images are increasingly large-size, i.e., high-resolution, images; how to accurately detect key points in large-size images has therefore become an urgent problem.
Disclosure of Invention
In view of this, embodiments of the present invention provide a key point detection method, a model training method, a device, equipment and a storage medium, so as to ensure the accuracy of key point detection for large-size images.
In a first aspect, an embodiment of the present invention provides a method for detecting a key point, including:
determining the position of a key point of a target object in a first image containing the target object;
determining a position offset corresponding to the key point according to a second image containing the target object and the position of the key point in the first image, wherein the size of the second image is larger than that of the first image;
and determining the position of the key point in the second image according to the position offset.
In a second aspect, an embodiment of the present invention provides a key point detecting device, including:
the first determining module is used for determining the positions of key points of a target object in a first image containing the target object;
a second determining module, configured to determine, according to a second image that includes the target object and a position of the key point in the first image, a position offset corresponding to the key point, where a size of the second image is larger than a size of the first image;
and a third determining module, configured to determine, according to the position offset, a position of the keypoint in the second image.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is used to store one or more computer instructions, and when executed by the processor, the one or more computer instructions implement the keypoint detection method in the first aspect. The electronic device may also include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the keypoint detection method according to the first aspect.
In a fifth aspect, an embodiment of the present invention provides a method for detecting a keypoint, including:
receiving a request for calling a detection service, and executing the following steps according to a processing resource corresponding to the detection service:
in response to a shooting operation, determining the position of a key point of a target object in a first image containing the target object;
determining a position offset corresponding to the key point according to a second image containing the target object and the position of the key point in the first image, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image;
determining the position of the key point in the second image according to the position offset;
and displaying the second image marked with the key points.
In a sixth aspect, an embodiment of the present invention provides a key point detecting device, including:
a receiving module, configured to receive a request for invoking a detection service;
an execution module, configured to execute the following steps according to the processing resource corresponding to the detection service:
in response to a shooting operation, determining the position of a key point of a target object in a first image containing the target object;
determining a position offset corresponding to the key point according to a second image containing the target object and the position of the key point in the first image, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image;
determining the position of the key point in the second image according to the position offset;
and displaying the second image marked with the key points.
In a seventh aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory is used to store one or more computer instructions, and when the one or more computer instructions are executed by the processor, the method for detecting a keypoint in the fifth aspect is implemented. The electronic device may also include a communication interface for communicating with other devices or a communication network.
In an eighth aspect, the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the keypoint detection method according to the fifth aspect.
In a ninth aspect, an embodiment of the present invention provides a model training method, including:
acquiring a first training sample and a second training sample containing a target object, wherein the size of the first training sample is larger than that of the second training sample;
inputting the first training sample into a first prediction model to output, by the first prediction model, positions of key points of the target object in the first training sample;
determining the reference position offset corresponding to the key point according to the output result of the first prediction model;
and taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model.
In a tenth aspect, an embodiment of the present invention provides a model training apparatus, including:
the device comprises an acquisition module, a comparison module and a processing module, wherein the acquisition module is used for acquiring a first training sample and a second training sample containing a target object, and the size of the first training sample is larger than that of the second training sample;
an input module, configured to input the first training sample into a first prediction model, so that the first prediction model outputs a position of a keypoint of the target object in the first training sample;
a determining module, configured to determine a reference position offset corresponding to the key point according to an output result of the first prediction model;
and the training module is used for taking the second training sample as input, taking the reference position offset as supervision information and training a second prediction model.
In an eleventh aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is used to store one or more computer instructions, and the one or more computer instructions, when executed by the processor, implement the model training method in the ninth aspect. The electronic device may also include a communication interface for communicating with other devices or a communication network.
In a twelfth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the model training method according to the ninth aspect.
In a thirteenth aspect, an embodiment of the present invention provides a model training method, including:
receiving a request for calling a training service, and executing the following steps according to a processing resource corresponding to the training service:
responding to input operation of a user, and acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is larger than that of the second training sample;
inputting the first training sample into a first prediction model to output, by the first prediction model, positions of key points of the target object in the first training sample;
determining the reference position offset corresponding to the key point according to the output result of the first prediction model;
taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model;
and outputting the model parameters of the second prediction model.
In a fourteenth aspect, an embodiment of the present invention provides a model training apparatus, including:
the receiving module is used for receiving a request for calling the training service;
an execution module, configured to execute the following steps according to the processing resource corresponding to the training service:
responding to an input operation of a user, acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is larger than that of the second training sample;
inputting the first training sample into a first prediction model to output, by the first prediction model, positions of key points of the target object in the first training sample;
determining the reference position offset corresponding to the key point according to the output result of the first prediction model;
taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model;
and outputting the model parameters of the second prediction model.
In a fifteenth aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is used to store one or more computer instructions, and the one or more computer instructions, when executed by the processor, implement the model training method in the thirteenth aspect. The electronic device may also include a communication interface for communicating with other devices or a communication network.
In a sixteenth aspect, the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the model training method according to the thirteenth aspect.
In the key point detection method provided by the embodiment of the invention, a small-size first image containing a target object is first acquired, and the positions of the key points of the target object in the first image are determined. The position offsets corresponding to the key points are then determined from the large-size second image containing the target object and the key point positions in the first image. Finally, the positions of the key points in the second image are determined from the position offsets.
Thus, in this method the key points are first preliminarily detected in the small-size first image and then, based on that preliminary result, finely detected using the position offsets obtained from the large-size second image; that is, fine key point detection is achieved on the large-size image. At the same time, because the method does not detect the large-size second image directly, the computation of the key point detection process is reduced and detection efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for detecting a key point according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the key point detection method provided by the embodiment shown in fig. 1;
FIG. 3 is a flow chart of a second prediction model training process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram corresponding to the training flowchart provided by the embodiment shown in FIG. 3;
FIG. 5 is a flowchart of another method for detecting a keypoint according to an embodiment of the present invention;
fig. 6 is a schematic diagram of the key point detection method provided by the embodiment of the invention applied in a live broadcast or selfie scene;
fig. 7 is a schematic diagram of a key point detection method applied to an identity recognition scene according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a case where the key point detection method provided by the embodiment of the present invention is applied to an express delivery scene;
FIG. 9 is a flowchart of a model training method according to an embodiment of the present invention;
FIG. 10 is a flow chart of another model training method provided by embodiments of the present invention;
fig. 11 is a schematic structural diagram of a key point detection apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device corresponding to the keypoint detection apparatus provided in the embodiment shown in fig. 11;
fig. 13 is a schematic structural diagram of another key point detecting device according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an electronic device corresponding to the keypoint detection apparatus provided in the embodiment shown in fig. 13;
FIG. 15 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an electronic device corresponding to the model training apparatus provided in the embodiment shown in FIG. 15;
FIG. 17 is a schematic structural diagram of another model training apparatus according to an embodiment of the present invention;
fig. 18 is a schematic structural diagram of an electronic device corresponding to the model training apparatus provided in the embodiment shown in fig. 17.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality" typically means at least two, but does not exclude at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted, depending on the context, as "when it is determined", "in response to determining", "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)".
It is also noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such an article or system. Without further limitation, an element preceded by "comprises a ..." does not exclude the presence of additional identical elements in the article or system that comprises it.
Before explaining the key point detection method provided by the embodiments of the invention, the practical significance of key point detection is illustrated as follows:
the description in the background art is accepted, and for the intelligent logistics field, key point detection demands can exist at different stages of express logistics. For example, at the end of the logistics process, courier personnel need to accurately measure the size of the package when picking at the home to allow the user to pay the correct shipping cost. For another example, in the parcel loading stage, accurate measurement of parcel size is required to ensure the loading rate of a truck. In practice, measurement of package size may be achieved by taking images and identifying the locations of package keypoints in the images.
In the image processing field, when a user takes selfies or live-streams video, beautification such as face slimming and eye enlargement can be applied to the face in the image to improve its display effect. Special effects can also be added to the face or to specific body parts, such as adding a glasses effect at the eyes or a headdress on the head. These effects can only be achieved once the key points of the face or body have been detected in the image.
With the development of photographing devices, the images captured in the above scenes are generally large-size, i.e., high-resolution, images. On the one hand, if key points are detected directly on the large-size image, the computation of the detection process increases greatly, resulting in low detection efficiency.
On the other hand, in practice the large-size image may be reduced to a small-size image on which key point detection is performed, yielding the positions of the key points of the target object in the small-size image; the target object may be a package, a human body, a human face, etc., in the scenes above. The key point positions in the small-size image are then mapped into the large-size image using the size relationship between the images before and after scaling, indirectly achieving key point detection on the large-size image. However, any error in detecting the key points in the small-size image is magnified several-fold when reflected onto the large-size image after mapping, making the key point detection on the large-size image inaccurate.
To overcome the above problems, the method provided by the invention can be used: it guarantees detection accuracy while also reducing computation, thereby preserving detection efficiency.
Expanding the range of applications further, the key point detection method provided by the embodiments of the invention can be applied in scenes such as intelligent buildings, intelligent retail, intelligent tourism and intelligent finance, where identity recognition and payment are realized by detecting the key points of human faces. It can also be applied in any other scene requiring key point detection; the invention does not limit the application scenes of key point detection.
Based on the above description, some embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments. In addition, the sequence of steps in the embodiments of the methods described below is merely an example, and is not strictly limited.
Fig. 1 is a flowchart of a keypoint detection method according to an embodiment of the present invention, where the keypoint detection method according to the embodiment of the present invention may be executed by a detection device. It will be appreciated that the detection device may be implemented as software, or a combination of software and hardware. As shown in fig. 1, the method comprises the steps of:
s101, in a first image containing a target object, determining the position of a key point of the target object in the first image.
The photographing apparatus may photograph the target object so that the detecting apparatus acquires a first image of a small size containing the target object. In different application scenarios, the target object is also different, for example, the target object may be a human face, a human body, an express package, or the like.
Optionally, the positions of the key points of the target object in the first image may be determined by a detection algorithm. Alternatively, the first image may be input into a first prediction model, which outputs the positions of the key points in the first image; the first prediction model is typically based on a convolutional neural network. The position of a key point is specifically the pixel coordinate, in the first image, of the pixel corresponding to that key point. This step completes the preliminary detection of the key points.
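As a rough illustration of this step, the sketch below assumes a hypothetical first_model callable standing in for the first prediction model; the function name and interface are illustrative and not taken from the patent.

import numpy as np

def detect_keypoints_coarse(first_image: np.ndarray, first_model) -> np.ndarray:
    """Preliminary detection (S101): return an (N, 2) array of (x, y) pixel
    coordinates of the target object's key points in the small first image."""
    # first_model is assumed to take an H x W x 3 image array and return
    # N key point coordinates in that image's pixel grid.
    keypoints = first_model(first_image)
    return np.asarray(keypoints, dtype=np.float32)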
S102, according to a second image containing the target object and the positions of the key points in the first image, determining the position offset corresponding to the key points, wherein the size of the second image is larger than that of the first image.
The detection device may also acquire a large-size second image containing the target object, shot by the shooting device. The size of the second image is larger than that of the first image, and the two sizes may be related by a multiple. The size of an image may be understood as its resolution; for example, the size of the first image may be 200 × 200 and the size of the second image 2000 × 2000.
It should be noted that, in different scenes, the first image and the second image may be acquired in different manners, and the detection device and the shooting device may have different relationships, and specific contents may be referred to in the following description.
Based on the second captured image, optionally, a corresponding position offset of the key point of the target object in the second image may be obtained by means of a second prediction model, and the position offset may be expressed as a pixel coordinate offset.
Specifically, in one optional manner, the positions of the key points in the first image may be mapped into the second image according to the size relationship between the first image and the second image, thereby determining the initial positions of the key points in the second image. Then, centering on each key point in the second image, an image area of preset size is cropped from the second image and input into the second prediction model, which outputs the position offset of that key point in the second image.
It is easy to see that when there are many key points in the second image, many image areas must be cropped, and these areas often overlap, so the computational load on the second prediction model in determining the position offsets is heavy and includes a degree of redundant computation.
Therefore, from the viewpoint of computation, another alternative is the following: after the initial positions of the key points in the second image are obtained in the above manner, the complete second image may be input directly into the second prediction model, which determines the position offsets corresponding to the key points from the second image and the initial positions. Because the second prediction model predicts the position offsets directly from one complete second image, the cropping of image areas described above is not needed, so the model's computation can be reduced.
Regarding the initial positions of the key points in the second image, continuing the image-size example above, the size difference between the two images is 10 times, i.e., one pixel in the first image has a mapping relationship with 10 pixels in the second image. Assuming the key points of the target object correspond to N pixels in the first image, those N pixels can be mapped into the second image according to the size relationship between the two, so that the key points correspond to 10 × N pixels in the second image. The pixel coordinates of these 10 × N pixels in the second image are the initial positions of the key points in the second image. The number of position offsets determined by the second prediction model is likewise 10 × N, corresponding one-to-one with the 10 × N pixels describing the key points in the second image.
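A simplified sketch of the mapping and offset prediction follows; for brevity each key point is mapped to a single scaled coordinate rather than to the 10 × N pixels of the example above, and second_model is a hypothetical callable standing in for the second prediction model.

import numpy as np

def initial_positions_in_second_image(keypoints_small: np.ndarray,
                                      scale: float = 10.0) -> np.ndarray:
    """Map key point pixel coordinates from the first image into the second
    image using the size relationship between the two (10x in the example);
    the results are the initial positions in the second image."""
    return keypoints_small * scale

def predict_offsets(second_image: np.ndarray, initial_positions: np.ndarray,
                    second_model) -> np.ndarray:
    """Feed the complete second image and the initial positions into the
    second prediction model, which is assumed to return one (dx, dy) offset
    in one-to-one correspondence with initial_positions."""
    return second_model(second_image, initial_positions)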
And S103, determining the position of the key point in the second image according to the position offset.
Finally, on the basis of the preliminary detection result obtained in step S101, the position offsets obtained in step S102 are used to finely detect the key points of the target object, yielding their positions in the large-size second image.
Continuing the example from step S102, the key point detection on the second image may optionally be implemented using the 10 × N pixels in the second image and the 10 × N position offsets output by the second prediction model:
optionally, the 10 × n position offset and the pixel coordinates of the 10 × n pixel points obtained after mapping may be correspondingly added to obtain the pixel coordinates of the pixel points corresponding to the key point of the target object in the second image, that is, the target position of the key point in the second image is obtained.
For example, assuming that the initial position of a key point of the target object in the second image is (x, y), that is, the pixel coordinate of a pixel point a among 10 × n pixel points in the second image is (x, y), and the position offset amount output by the second prediction model and corresponding to the pixel point a is (Δ x, Δ y), the target position of the key point of the target object in the second image is: (x + Δ x, y + Δ y). Wherein, pixel A is any one of 10 × N pixels.
Optionally, an adjustment parameter may also be set for key points located at specific parts of the target object, and the sum of the initial position and the product of the adjustment parameter and the position offset is taken as the target position; that is, the adjustment parameter helps ensure the accuracy of key point detection. Taking a human face as an example, the key points corresponding to the eyes and eyebrows are often hard to detect and demand high accuracy, so adjustment parameters can be set for them, while the key points corresponding to the face contour can be left without adjustment parameters.
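The refinement then reduces to the addition sketched below, where alpha is the optional adjustment parameter; the function name is illustrative.

import numpy as np

def finalize_positions(initial: np.ndarray, offsets: np.ndarray,
                       alpha=1.0) -> np.ndarray:
    """Target position = initial position + alpha * position offset.

    With alpha = 1.0 this is the plain addition (x, y) + (dx, dy) described
    above; alpha may also be an array holding per-key-point adjustment
    parameters, e.g. tuned values for eye/eyebrow key points and 1.0 for
    face-contour key points."""
    return initial + np.asarray(alpha) * offsets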
In this embodiment, a first image with a small size is obtained first, and the position of a key point of a target object in the first image is determined. And simultaneously determining the position offset corresponding to the key point of the target object according to the large-size second image and the position of the key point in the first image. Finally, the positions of the key points in the second image are determined according to the position offset.
As can be seen, in this embodiment the key points are first preliminarily detected in the small-size first image and then, based on that preliminary result, finely detected using the position offsets obtained from the large-size second image. On the one hand, key point detection is not performed directly on the large-size second image, which greatly reduces the computation of the detection process and improves detection efficiency. On the other hand, because position offsets are introduced into the detection, errors in the preliminary result are prevented from affecting the detection on the large-size second image, ensuring the accuracy of key point detection.
For the acquisition of the first image and the second image, in an alternative manner, shooting devices with different shooting capabilities can be used to respectively shoot the target object so as to obtain the first image with the small size and the second image with the large size. The shooting device may be a camera in general.
In a live video or selfie scene, the shooting device may be several cameras with different shooting capabilities configured on the terminal device used by the user, capturing images of different sizes. The terminal device can be a mobile phone, a tablet computer, etc. Since the terminal device itself has a certain data processing capability, it can act as the detection device and execute the method provided by the embodiments of the invention. After the face key points are detected, special effects or beautification applied to the face can be displayed on the terminal device.
In the logistics scene, a plurality of cameras with different shooting capabilities can be arranged on the terminal equipment used by express delivery personnel. Similar to the above scenario, the terminal device may also be used as a detection device to detect the package key point, and display the size of the package and the freight amount to be paid by the user on the terminal device.
In each of the above scenarios, the detection device may also be a remote server independent of the terminal device.
In the scenes such as intelligent buildings and the like needing identity recognition, the shooting equipment can be cameras with different shooting capabilities arranged on a gate machine of the building, and the detection equipment can be a remote server. When the user is in front of the gate, the camera can shoot the first image and the second image and send the first image and the second image to the server, and the server performs key point detection on the large-size image according to the method provided by the embodiment of the invention to realize identity recognition.
In the intelligent payment scene, the shooting device may be a camera configured on the settlement device, and the detection device may be a remote server. And after the detection equipment detects key points of the face, the payment can be automatically finished.
Of course, the gate or the accounting device may also be used directly as a detection device if it has sufficient computing power.
When images of different sizes are acquired in the above manner, it is necessary to use a plurality of cameras having different shooting capabilities, which obviously increases the cost of the apparatus. In order to overcome this drawback, in another alternative image capturing manner, the shooting device may be a high-definition camera provided on a terminal device, a gate, or a settlement device used by a user, and is used for capturing a large-size image. Then, the taken image may be subjected to a reduction process by the detection apparatus to obtain a small-sized image.
In addition, it is easy to understand that in the process of detecting the key points, the image region where the target object is located needs to be focused, and other image regions are likely to affect the key point detection. Therefore, for an image captured by a high-definition camera (which may be referred to as an original image), it is also possible to optionally perform detection of a target object thereon and extract an image area (which may be referred to as a target image area) where the target object is located from the original image. Then, the target image area is reduced to obtain a small-size image (i.e., the first image) and a large-size image (i.e., the second image).
The detection of the target object may be achieved by an independent object detection model. The size of the first image must comply with the input requirements of the first prediction model, and the size of the second image must comply with the input requirements of the second prediction model. In practical applications, the size of the second image may be equal to or slightly smaller than the size of the target image area.
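A minimal sketch of this preparation step, assuming OpenCV is available and that a separate object detection model has already produced the bounding box of the target image area; all names and the example sizes are illustrative.

import numpy as np
import cv2  # assumed available; any resize routine would do

def prepare_inputs(original: np.ndarray, bbox,
                   small_size=(200, 200), large_size=(2000, 2000)):
    """Crop the target image area and resize it into the small first image
    (input to the first prediction model) and the large second image
    (input to the second prediction model)."""
    x, y, w, h = bbox  # (x, y, width, height) of the target image area
    target_area = original[y:y + h, x:x + w]
    first_image = cv2.resize(target_area, small_size)
    second_image = cv2.resize(target_area, large_size)
    return first_image, second_image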
When the size of the second image is equal to the target image area and the target object is a human face, the above and the implementation process of the embodiment shown in fig. 1 can be understood in conjunction with fig. 2.
When the size of the second image is slightly smaller than the target image area, then after step S103 is executed, the positions of the key points in the second image must additionally be mapped into the target image area according to the size relationship between the second image and the target image area, completing the key point detection on the large-size image.
As mentioned in the embodiment shown in fig. 1, the position offsets can be determined by the second prediction model. The training process of this model, shown in fig. 3, may optionally include the following steps:
s201, a first training sample containing the target object is obtained, and the size of the first training sample is the same as that of the second image.
A first training sample containing the target object is obtained. Optionally, to ensure the training effect of the model, an original training sample may be acquired first and the image region where the target object is located extracted from it; the size of that region is then adjusted to obtain the first training sample, and a second training sample can be obtained at the same time.
The sizes of the original training sample, the first training sample and the second training sample decrease in that order. The first training sample may have the same size as the second image to meet the input requirements of the second prediction model, and the second training sample has the same size as the first image to meet the input requirements of the first prediction model.
S202, determining reference position offset corresponding to the key point of the target object in the first training sample by means of the first prediction model.
Then, the small-size second training sample is input into the first prediction model, which may already have been trained to convergence. The prediction result output by the first prediction model is the positions of the key points of the target object in the second training sample, and from this output the reference position offsets corresponding to the key points of the target object in the first training sample are calculated.
Alternatively, the specific calculation process of the reference position offset may be:
after obtaining the first training sample with a large size, the user may label the key points of the target object first therein, and the labeling result is the reference positions of the key points in the first training sample. And then, mapping the prediction result output by the first prediction model into the first training sample according to the size relationship between the first training sample and the second training sample to obtain the predicted position of the key point in the first training sample. And comparing the predicted position with the reference position marked by the user, wherein the difference value of the predicted position and the reference position marked by the user is the corresponding reference position offset of the key point in the second image.
For example, it is assumed that the size of the first training sample is 10 times that of the second training sample, and the user marks the pixel points corresponding to the key points of the target object in the first training sample, where the number of the marked pixel points may be 10 × n. Meanwhile, a second training sample of a small size may be input to the first prediction model. After mapping the predicted position output by the first prediction model, 10 × n pixel points may also be obtained in the first training sample. And finally, calculating the pixel coordinates of the manually marked 10 × n pixel points, and calculating the difference between the pixel coordinates of the manually marked 10 × n pixel points and the pixel coordinates of the mapped 10 × n pixel points, wherein the obtained 10 × n difference is also the reference position offset.
The magnitude of the reference position offset actually reflects the error in the detection of the keypoint of the first predictive model. The smaller the reference offset, the smaller the error of the first prediction model, and the higher the detection accuracy.
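In code, this supervision signal can be sketched as follows; the function name and array interface are assumptions.

import numpy as np

def reference_offsets(annotated_large: np.ndarray,
                      predicted_small: np.ndarray,
                      scale: float = 10.0) -> np.ndarray:
    """Reference position offsets used to supervise the second prediction
    model: map the first model's predictions from the small second training
    sample up into the large first training sample's pixel grid, then take
    the difference from the user's annotations there."""
    mapped = predicted_small * scale
    return annotated_large - mapped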
S203, the first training sample is used as input, the reference position offset is used as supervision information, and a second prediction model is trained.
Finally, the large-size first training sample is used as input, the reference position offsets serve as supervision information, and the second prediction model is trained.
In this embodiment, the supervision information used when training the second prediction model is the position offsets corresponding to the key points of the target object. Compared with annotating the key points directly in a training sample, it would clearly be much harder for a user to annotate position offsets directly. The supervision information, i.e., the reference position offsets, is therefore derived from the output of the first prediction model together with the user's key point annotations on the training sample, ensuring that the second prediction model is easier to train.
As for the first prediction model used while training the second prediction model: a small-size second training sample can first be obtained, and the user annotates the key points of the target object in it. The second training sample is then input into the first prediction model, the user's annotations serve as supervision information, and the first prediction model is trained until it converges.
Based on the trained first predictive model, the second predictive model may then be trained to converge in the manner of the embodiment shown in fig. 3. It is easy to understand that training of the model usually requires multiple rounds, and optionally, the calculated reference position offset and the predicted position offset output by the second prediction model may also be input into a preset loss function, and the model parameters of the second prediction model may be adjusted according to the calculated loss value.
The preset loss function may, for example, take a mean-squared-error form over the offsets:

L = (1/M) * Σ_{i=1..M} ||Y_i - G_i||^2

where M is the number of first training samples and i indexes the ith of the M first training samples; Y_i is the predicted position offset output by the second prediction model for the ith sample, and G_i is the reference position offset calculated for the ith sample from the user's annotations.
In practical applications, the predicted position offset Y_i, the reference position offset G_i, and the position offsets corresponding to the key points output by the second prediction model in the embodiment shown in fig. 1 can each be represented as a matrix of W × H × 2N, where W × H is the size of the first training sample and of the second image, and N is the number of key points of the target object in the first training sample or the second image.
The model training process shown in fig. 3 can also be understood in conjunction with fig. 4. Taking the predicted position offset Y_i output by the second prediction model, expressed as a W × H × 2N matrix, as an example: the matrix can be understood as 2N matrices of size W × H, each representing the offsets of one key point of the first training sample along the X-axis or along the Y-axis. The relationship between the W × H × 2N matrix and the 2N W × H matrices can be understood from fig. 4. In practical applications, the first prediction model works on small-size images and, balancing computation against prediction accuracy, may have a larger number of network layers, while the second prediction model works on large-size images and has a smaller number of network layers.
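Under the mean-squared-error assumption above, the loss can be sketched as:

import numpy as np

def offset_loss(pred: np.ndarray, ref: np.ndarray) -> float:
    """Loss over a batch of M samples. pred and ref are arrays of shape
    (M, W, H, 2N) holding the predicted offsets Y_i and reference offsets
    G_i: per sample, one W x H map of X-axis offsets and one of Y-axis
    offsets for each of the N key points."""
    m = pred.shape[0]
    return float(np.sum((pred - ref) ** 2) / m)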
The key point detection method provided by each embodiment can be deployed on a service platform to provide key point detection service for users. The detection device may be considered as a carrier of the service platform, and as mentioned in the above embodiments, the detection device may specifically be a terminal device used by a user, a remote server or a gate, a settlement device, and so on. Fig. 5 is a flowchart of another method for detecting a keypoint according to an embodiment of the present invention. The key point detection method provided by the embodiment of the invention can also be executed by detection equipment. As shown in fig. 5, the method may include the steps of:
s301, a request for invoking a detection service is received.
S302, in response to the shooting operation, in the first image containing the target object, the position of the key point of the target object in the first image is determined.
S303, according to a second image containing the target object and the position of the key point in the first image, determining the position offset corresponding to the key point, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image.
And S304, determining the position of the key point in the second image according to the position offset.
S305, displaying the second image marked with the key points.
The user can generate the detection service request by means of different devices, such as the terminal device the user uses, a gate, a settlement device, etc., and send the request to the service platform. The request includes a first image and a second image containing the same target object, the size of the first image being smaller than the size of the second image. For the specific manner of acquiring the first and second images, refer to the description in the above embodiments, which is not repeated here. The user can trigger the shooting operation himself, and a gate, settlement device, etc. can also trigger the shooting device autonomously to acquire the images of different sizes contained in the request.
After receiving the service request, the service platform may perform steps S302 to S305 to achieve key point detection on the large-size second image. Finally, the detected key points of the target object can be marked in the second image and displayed to the user, allowing the user to visually assess the accuracy of the key point detection. The displayed result can be as shown in fig. 2.
Optionally, after the position of the key point is obtained, in different scenes, functions of payment, identity recognition, beautifying and special effect addition can be further realized.
The specific implementation process of each step in this embodiment can refer to the related description of the embodiments shown in fig. 1 to fig. 4. The technical effects that can be achieved by the present embodiment may also refer to the descriptions in the foregoing embodiments, and are not described herein again.
For the convenience of understanding, the specific implementation process of the above-provided key point detection method may be further exemplarily described in conjunction with the following application scenarios.
In a live video or selfie scene, a user can shoot a large-size original image with a terminal device such as a mobile phone or tablet computer. The terminal device can perform face detection on the original image and crop out the image area where the face is located, then obtain a small-size first image and a large-size second image by reducing the image size. For example, the size of the original image is 3000 × 3000, the size of the first image is 200 × 200, and the size of the second image is 2000 × 2000.
Then, the terminal device may input the small-size first image into the first prediction model configured on the device, so that the model outputs the positions of the face key points in the small-size first image, i.e., determines the pixel coordinates of the N pixels corresponding to the face key points in the first image. The face key points may include the five sense organs and the face contour. Next, according to the 10-fold size relationship between the first image and the second image, the N pixels output by the first prediction model are mapped into the second image to obtain the initial positions of the key points in the second image, i.e., the 10 × N pixels corresponding to the face key points in the second image and their respective pixel coordinates.
Then, the terminal device inputs the large-size second image into the second prediction model configured on the device, so that the model determines the position offsets corresponding to the key points in the second image from the second image and the 10 × N pixels. The position offsets output by the second prediction model correspond one-to-one with the pixel coordinates of the mapped pixels.
Finally, the terminal device adds the pixel coordinates and position offsets of the 10 × N pixels element-wise, obtaining the accurate positions of the face key points in the second image. Furthermore, a glasses special effect can be added at the user's eyes according to the identified key points.
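Chained together, the scenario above corresponds to the following hypothetical pass, reusing the illustrative sketches from the earlier sections (original, face_bbox, first_model and second_model are assumed to be given):

first_image, second_image = prepare_inputs(original, face_bbox)  # 200x200, 2000x2000
coarse = detect_keypoints_coarse(first_image, first_model)       # coords in the first image
initial = initial_positions_in_second_image(coarse, scale=10.0)  # coords in the second image
offsets = predict_offsets(second_image, initial, second_model)
final = finalize_positions(initial, offsets)                     # refined key point positions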
The specific training process of the first prediction model and the second prediction model may be referred to in the above description of the embodiments shown in fig. 3 to 4.
The contents of the above-described scenario can be understood in conjunction with fig. 6.
Alternatively, in a scene requiring face-based identity recognition, for example, a gate at a building entrance can obtain first and second images of different sizes and detect the face key points by the above method, completing identity recognition and letting the user through.
Of course, the gate may obtain the first image and the second image and send the first image and the second image to a remote server in communication connection with the gate, so that the remote server may perform the key point detection and the identity recognition.
The contents of the above-described scenario can be understood in conjunction with fig. 7.
In the logistics scene, when picking up a package at a customer's door, courier personnel can use a terminal device to photograph the parcel and obtain an original image containing it. The terminal device performs target recognition on the original image and crops out the image area where the parcel is located; a small-size first image and a large-size second image are then obtained by adjusting the image size.
The terminal device may likewise use the first prediction model to preliminarily detect the key points of the parcel in the first image; a parcel is usually a regularly shaped geometric body, so its key points can be regarded as the vertices of that body. Based on the preliminary detection result output by the first prediction model, the positions of the parcel key points in the second image are finely detected according to the position offsets output by the second prediction model. The terminal device used by the courier can also display the parcel volume and the freight amount to be paid.
The contents of the above-described scenario can be understood in conjunction with fig. 8.
Fig. 9 is a flowchart of a model training method according to an embodiment of the present invention, where the model training method according to the embodiment of the present invention may be executed by a detection device. It will be appreciated that the detection device may be implemented as software, or a combination of software and hardware. As shown in fig. 9, the method includes the steps of:
S401, acquiring a first training sample and a second training sample containing a target object, wherein the size of the first training sample is smaller than that of the second training sample.
S402, inputting the first training sample into the first prediction model, and outputting the position of the key point of the target object in the first training sample by the first prediction model.
S403, determining the reference position offset corresponding to the key point according to the output result of the first prediction model.
S404, taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model.
It should be noted that the first training sample in this embodiment is a small-sized image, i.e., the second training sample in the embodiment shown in fig. 3; the second training sample in this embodiment is a large-size image, i.e., the first training sample in the embodiment shown in fig. 3.
Based on the above correspondence, the manner of obtaining the first training sample and the second training sample, the specific calculation process of the reference position offset, and the training process of the second prediction model may all refer to the related description of the embodiment shown in fig. 3, and are not repeated here. Likewise, the first prediction model may be trained to convergence in advance; for its specific training process, see also the above description.
In this embodiment, the reference position offset serving as the supervision information is not manually labeled, but is obtained based on the output result of the first prediction model and the user's annotation of the key points in the training sample, so that the supervision information is obtained more easily and more accurately, which in turn makes the training of the second prediction model easier.
Optionally, in order to ensure the effect of the model training, the model parameters of the second prediction model may be adjusted by means of a loss function. The concrete form of the loss function can also be found in the above description.
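A sketch of one training step for steps S401 to S404 may clarify how the reference position offset serves as supervision information. A PyTorch-style setup, the model callables, the mean-squared-error loss, and the scale argument are all assumptions of this sketch; the disclosure does not prescribe a framework or a specific loss form:

import torch
import torch.nn.functional as F

def train_step(first_model, second_model, optimizer, small_sample, large_sample, annotated_xy, scale):
    with torch.no_grad():
        coarse_xy = first_model(small_sample)   # key point positions in the small first training sample
        mapped_xy = coarse_xy * scale           # positions mapped into the large second training sample
        ref_offset = annotated_xy - mapped_xy   # reference position offset used as supervision information
    pred_offset = second_model(large_sample, mapped_xy)
    loss = F.mse_loss(pred_offset, ref_offset)  # compare predicted offset with the reference offset
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Only the second prediction model is updated here; the first prediction model has already been trained to convergence and is therefore used under torch.no_grad().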
The model training method provided by this embodiment can be deployed on a service platform to provide a model training service for users, and a server may serve as the carrier of the service platform. Fig. 10 is a flowchart of another model training method according to an embodiment of the present invention. As shown in fig. 10, the method may include the following steps:
S501, receiving a request for calling a training service.
S502, in response to an input operation of a user, acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is smaller than that of the second training sample.
S503, inputting the first training sample into the first prediction model, so that the first prediction model outputs the position of the key point of the target object in the first training sample.
S504, determining the reference position offset corresponding to the key point according to the output result of the first prediction model.
S505, taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model.
S506, outputting the model parameters of the second prediction model.
Users with model training needs, such as maintenance personnel of live-streaming platforms, express delivery platforms, or the property-management organizations of buildings, can send a training request to the server. If the request includes the first training sample and the second training sample of different sizes, these training samples can be regarded as the user's input to the server.
After receiving the training samples, the server can complete the training of the second prediction model with the aid of the first prediction model that has been trained to convergence, and finally output the model parameters of the second prediction model, so that the user with the model training need can obtain these parameters.
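Under the same assumptions as the training sketch above, the service flow of steps S501 to S506 might be wrapped as follows; the request and response keys are hypothetical, and train_step is the sketch defined earlier:

def handle_training_request(request, first_model, second_model, optimizer, scale):
    small = request["first_training_sample"]    # the smaller training sample
    large = request["second_training_sample"]   # the larger training sample
    labels = request["annotated_keypoints"]     # user-annotated key point positions in the larger sample
    loss = train_step(first_model, second_model, optimizer, small, large, labels, scale)
    # S506: output the model parameters of the trained second prediction model.
    return {"loss": loss, "model_parameters": second_model.state_dict()}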
The specific implementation process of each step in this embodiment can refer to the related description of the embodiment shown in fig. 9. The technical effects that can be achieved by the present embodiment may also refer to the descriptions in the foregoing embodiments, and are not described herein again.
The keypoint detection device of one or more embodiments of the invention will be described in detail below. Those skilled in the art will appreciate that these keypoint detection means can be constructed using commercially available hardware components configured by the steps taught in the present scheme.
Fig. 11 is a schematic structural diagram of a keypoint detection apparatus according to an embodiment of the present invention. As shown in fig. 11, the apparatus includes:
a first determining module 11, configured to determine, in a first image containing a target object, a position of a keypoint of the target object in the first image.
A second determining module 12, configured to determine, according to a second image containing the target object and a position of the key point in the first image, a position offset corresponding to the key point, where a size of the second image is larger than a size of the first image.
A third determining module 13, configured to determine, according to the position offset, a position of the key point in the second image.
Optionally, the second determining module 12 is specifically configured to: determine the initial position of the key point in the second image according to the size relationship between the first image and the second image and the position of the key point in the first image; and determine the position offset corresponding to the key point according to the initial position and the second image.
The third determining module 13 is specifically configured to: and determining the target position of the key point in the second image according to the position offset and the initial position.
Optionally, the apparatus further comprises:
a first obtaining module 21, configured to obtain an original image.
An extracting module 22, configured to extract an image area where the target object is located in the original image.
A size adjusting module 23, configured to adjust a size of the image area to obtain the first image and the second image with different sizes.
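One possible way to produce these two differently sized inputs, assuming OpenCV; the concrete sizes here are arbitrary examples, not values specified by this disclosure:

import cv2

def make_inputs(region, small_size=(40, 40), large_size=(400, 400)):
    # Downscale the extracted region to form the small first image.
    first_image = cv2.resize(region, small_size, interpolation=cv2.INTER_AREA)
    # Upscale the same region to form the large second image.
    second_image = cv2.resize(region, large_size, interpolation=cv2.INTER_LINEAR)
    return first_image, second_image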
Optionally, the first determining module 11 is specifically configured to: inputting the first image into a first prediction model to output the location of the keypoint in the first image by the first prediction model.
Optionally, the second determining module 12 is specifically configured to: inputting the second image into a second prediction model to output the position offset by the second prediction model according to the second image and the initial position.
Optionally, the apparatus further comprises:
A second obtaining module 24, configured to obtain a first training sample containing the target object, where the first training sample and the second image are the same in size.
A fourth determining module 25, configured to determine, by means of the first prediction model, a reference position offset corresponding to a key point of the target object in the first training sample.
An input module 26, configured to train the second prediction model by using the first training sample as an input and using the reference position offset as supervision information.
Optionally, the apparatus further comprises: and a parameter adjusting module 27, configured to adjust a model parameter of the second prediction model according to the reference position offset and the predicted position offset output by the second prediction model.
Optionally, the fourth determining module 25 is specifically configured to:
acquiring a second training sample containing the target object, wherein the second training sample is the same as the first image in size;
inputting the second training sample into the first predictive model to output, by the first predictive model, a location of the keypoint in the second training sample;
determining the position of the key point in the first training sample according to the size relationship between the first training sample and the second training sample and the position of the key point in the second training sample;
and determining, as the reference position offset, the difference between the determined position of the key point in the first training sample and the pre-annotated position of the key point in the first training sample.
The apparatus shown in fig. 11 can perform the method of the embodiment shown in fig. 1 to 4, and reference may be made to the related description of the embodiment shown in fig. 1 to 4 for a part not described in detail in this embodiment. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to fig. 4, which are not described herein again.
The internal functions and structures of the key point detecting apparatus are described above, and in one possible design, the structure of the key point detecting apparatus may be implemented as an electronic device, as shown in fig. 12, which may include: a processor 31 and a memory 32. Wherein the memory 32 is used for storing a program for supporting the electronic device to execute the key point detection method provided in the embodiments shown in fig. 1 to 4, and the processor 31 is configured to execute the program stored in the memory 32.
The program comprises one or more computer instructions which, when executed by the processor 31, are capable of performing the steps of:
determining the position of a key point of a target object in a first image containing the target object;
determining a position offset corresponding to the key point according to a second image containing the target object and the position of the key point in the first image, wherein the size of the second image is larger than that of the first image;
and determining the position of the key point in the second image according to the position offset.
Optionally, the processor 31 is further configured to perform all or part of the steps in the foregoing embodiments shown in fig. 1 to 4.
The electronic device may further include a communication interface 33, which is used for the electronic device to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the electronic device, which includes a program for executing the method for detecting a keypoint in the method embodiments shown in fig. 1 to 4.
Fig. 13 is a schematic structural diagram of another keypoint detection apparatus provided in an embodiment of the present invention, and as shown in fig. 13, the apparatus includes:
a receiving module 41, configured to receive a request for invoking a detection service.
An executing module 42, configured to execute the following steps according to the processing resource corresponding to the detection service:
in response to a shooting operation, determining the position of a key point of a target object in a first image containing the target object;
determining a position offset corresponding to the key point according to a second image containing the target object and the position of the key point in the first image, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image;
determining the position of the key point in the second image according to the position offset;
and displaying the second image marked with the key points.
The apparatus shown in fig. 13 can perform the method of the embodiment shown in fig. 5, and reference may be made to the related description of the embodiment shown in fig. 5 for a part of this embodiment that is not described in detail. The implementation process and technical effect of the technical solution are described in the embodiment shown in fig. 5, and are not described herein again.
While the internal functions and structure of the keypoint detection device have been described above, in one possible design, the structure of the keypoint detection device may be implemented as an electronic device, which may include, as shown in fig. 14: a processor 43 and a memory 44. Wherein the memory 44 is used for storing a program for supporting the electronic device to execute the key point detection method provided in the embodiment shown in fig. 5, and the processor 43 is configured to execute the program stored in the memory 44.
The program comprises one or more computer instructions which, when executed by the processor 43, are capable of carrying out the steps of:
receiving a request for calling a detection service, and executing the following steps according to a processing resource corresponding to the detection service:
in response to a shooting operation, in a first image containing a target object, determining the position of a key point of the target object in the first image;
determining a position offset corresponding to the key point according to a second image containing the target object and the position of the key point in the first image, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image;
determining the position of the key point in the second image according to the position offset;
and displaying the second image marked with the key points.
Optionally, the processor 43 is further configured to perform all or part of the steps in the foregoing embodiment shown in fig. 5.
The electronic device may further include a communication interface 45, which is used for the electronic device to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the electronic device, which includes a program for executing the method for detecting a keypoint in the method embodiment shown in fig. 5.
The model training apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that each of these model training devices may be constructed using commercially available hardware components configured through the steps taught in this scheme.
Fig. 15 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention, and as shown in fig. 15, the apparatus includes:
The obtaining module 51 is configured to obtain a first training sample and a second training sample that contain a target object, where the size of the first training sample is smaller than the size of the second training sample.
An input module 52, configured to input the first training sample into a first prediction model, so that the first prediction model outputs a position of a key point of the target object in the first training sample.
A determining module 53, configured to determine, according to an output result of the first prediction model, a reference position offset corresponding to the key point.
And a training module 54, configured to train a second prediction model by using the second training sample as an input and using the reference position offset as supervision information.
Optionally, the apparatus further comprises: and a parameter adjusting module 55, configured to adjust a model parameter of the second prediction model according to the reference position offset and the predicted position offset output by the second prediction model.
Optionally, the determining module 53 is configured to determine, according to the size relationship between the first training sample and the second training sample and the output result of the first prediction model, the position of the key point in the second training sample; and to determine, as the reference position offset, the difference between the determined position of the key point in the second training sample and the pre-annotated position of the key point in the second training sample.
Optionally, the apparatus further comprises: an extraction module 56 and a sizing module 57.
The obtaining module 51 is configured to obtain an original training sample.
The extracting module 56 is configured to extract an image area where the target object is located in the original training sample.
The size adjusting module 57 is configured to adjust the size of the image area to obtain the first training sample and the second training sample with different sizes.
The apparatus shown in fig. 15 can perform the method of the embodiment shown in fig. 9, and reference may be made to the related description of the embodiment shown in fig. 9 for a part of this embodiment that is not described in detail. The implementation process and technical effect of the technical solution are described in the embodiment shown in fig. 9, and are not described herein again.
While the internal functions and structure of the model training apparatus are described above, in one possible design, the structure of the model training apparatus may be implemented as an electronic device, as shown in fig. 16, which may include: a processor 61 and a memory 62. The memory 62 is used for storing a program that supports the electronic device in executing the model training method provided in the embodiment shown in fig. 9, and the processor 61 is configured to execute the program stored in the memory 62.
The program comprises one or more computer instructions which, when executed by the processor 61, are capable of performing the steps of:
acquiring a first training sample and a second training sample containing a target object, wherein the size of the first training sample is smaller than that of the second training sample;
inputting the first training sample into a first prediction model to output, by the first prediction model, positions of key points of the target object in the first training sample;
determining the reference position offset corresponding to the key point according to the output result of the first prediction model;
and taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model.
Optionally, the processor 61 is further configured to perform all or part of the steps in the foregoing embodiment shown in fig. 9.
The electronic device may further include a communication interface 63 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the electronic device, which includes a program for executing the model training method in the method embodiment shown in fig. 9.
Fig. 17 is a schematic structural diagram of another model training apparatus according to an embodiment of the present invention, and as shown in fig. 17, the apparatus includes:
a receiving module 71, configured to receive a request for invoking a training service.
An executing module 72, configured to execute the following steps according to the processing resource corresponding to the training service:
in response to an input operation of a user, acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is smaller than that of the second training sample;
inputting the first training sample into a first prediction model, so that the position of the key point of the target object in the first training sample is output by the first prediction model;
determining the reference position offset corresponding to the key point according to the output result of the first prediction model;
taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model;
and outputting the model parameters of the second prediction model.
The apparatus shown in fig. 17 can perform the method of the embodiment shown in fig. 10, and reference may be made to the related description of the embodiment shown in fig. 10 for a part of this embodiment that is not described in detail. The implementation process and technical effect of the technical solution are described in the embodiment shown in fig. 10, and are not described herein again.
Having described the internal functions and structure of the model training apparatus, in one possible design, the structure of the model training apparatus may be implemented as an electronic device, as shown in fig. 18, which may include: a processor 73 and a memory 74. The memory 74 is used for storing a program that supports the electronic device in executing the model training method provided in the embodiment shown in fig. 10, and the processor 73 is configured to execute the program stored in the memory 74.
The program comprises one or more computer instructions which, when executed by the processor 73, are capable of performing the steps of:
receiving a request for calling a training service, and executing the following steps according to a processing resource corresponding to the training service:
in response to an input operation of a user, acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is smaller than that of the second training sample;
inputting the first training sample into a first prediction model, so that the position of the key point of the target object in the first training sample is output by the first prediction model;
determining the reference position offset corresponding to the key point according to the output result of the first prediction model;
taking the second training sample as input, taking the reference position offset as supervision information, and training a second prediction model;
and outputting the model parameters of the second prediction model.
Optionally, the processor 73 is further configured to perform all or part of the steps in the embodiment shown in fig. 10.
The electronic device may further include a communication interface 75 for the electronic device to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the electronic device, which includes a program for executing the model training method in the method embodiment shown in fig. 10.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (25)

1. A method for detecting a keypoint, comprising:
in a first image containing a target object, determining the position of a key point of the target object in the first image;
mapping the positions of the key points in the first image to a second image according to the size relation between the first image and the second image containing the target object to obtain the initial positions of the key points in the second image;
taking each key point in the second image as a center, and intercepting an image area with a preset size from the second image;
inputting the cut-out image area into a second prediction model so as to determine the position offset corresponding to the key point by the second prediction model, wherein the size of the second image is larger than that of the first image;
determining the calculation result of the position offset and the initial position as the position of the key point in the second image.
2. The method of claim 1, wherein determining the position offset corresponding to the key point according to the initial position comprises:
and determining the position offset corresponding to the key point according to the initial position and the second image.
3. The method of claim 1, further comprising:
acquiring an original image;
extracting an image area where the target object is located from the original image;
and adjusting the size of the image area to obtain the first image and the second image with different sizes.
4. The method of claim 2, wherein determining, in a first image containing a target object, the location of a keypoint of the target object in the first image comprises:
inputting the first image into a first prediction model to output the location of the keypoint in the first image by the first prediction model.
5. The method of claim 4, wherein determining the position offset corresponding to the keypoint from the initial position and the second image comprises:
inputting the second image into a second prediction model to output the position offset by the second prediction model according to the initial position and the second image.
6. The method of claim 5, further comprising:
acquiring a first training sample containing the target object, wherein the first training sample and the second image have the same size;
determining, with the aid of the first prediction model, reference position offsets corresponding to the keypoints of the target object in the first training sample;
and taking the first training sample as input, taking the reference position offset as supervision information, and training the second prediction model.
7. The method of claim 6, further comprising:
and adjusting the model parameters of the second prediction model according to the reference position offset and the predicted position offset output by the second prediction model.
8. The method of claim 6, wherein determining, with the aid of the first predictive model, reference position offsets corresponding to the keypoints of the target object in the first training sample comprises:
acquiring a second training sample containing the target object, wherein the second training sample has the same size as the first image;
inputting the second training sample into the first prediction model to output the position of the key point in the second training sample by the first prediction model;
determining the position of the key point in the first training sample according to the size relationship between the first training sample and the second training sample and the position of the key point in the second training sample;
and determining the difference value of the positions of the key points in the first training sample and the positions of the key points in the first training sample which are marked in advance as the reference position offset.
9. A method for detecting a keypoint, comprising:
receiving a request for calling a detection service, and executing the following steps according to a processing resource corresponding to the detection service:
in response to a shooting operation, determining the position of a key point of a target object in a first image containing the target object;
mapping the positions of the key points in the first image to a second image according to the size relation between the first image and the second image containing the target object to obtain the initial positions of the key points in the second image;
taking each key point in the second image as a center, and intercepting an image area with a preset size from the second image;
inputting the cut-out image area into a second prediction model so as to determine the position offset corresponding to the key point by the second prediction model, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image;
determining the calculated results of the position offset and the initial position as the positions of the key points in the second image;
and displaying the second image marked with the key points.
10. A method of model training, comprising:
acquiring a first training sample and a second training sample containing a target object, wherein the size of the first training sample is larger than that of the second training sample;
inputting the second training sample into a first prediction model to output the position of the key point of the target object in the second training sample by the first prediction model;
determining the position of the key point in the first training sample according to the size relationship between the first training sample and the second training sample and the output result of the first prediction model; determining the difference value between the position of the key point in the first training sample and the position of the key point in the first training sample which is marked in advance as a reference position offset;
taking each key point in the first training sample as a center, and intercepting an image area with a preset size from the first training sample;
and taking the image area with the intercepted preset size as input, taking the reference position offset as supervision information, and training a second prediction model.
11. The method of claim 10, further comprising:
and adjusting the model parameters of the second prediction model according to the reference position offset and the predicted position offset output by the second prediction model.
12. The method of claim 10, further comprising:
obtaining an original training sample;
extracting an image area where the target object is located from the original training sample;
adjusting the size of the image area to obtain the first training sample and the second training sample with different sizes.
13. A method of model training, comprising:
receiving a request for calling a training service, and executing the following steps according to a processing resource corresponding to the training service:
responding to an input operation of a user, acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is smaller than that of the second training sample;
inputting the first training sample into a first prediction model to output, by the first prediction model, positions of key points of the target object in the first training sample;
determining the position of the key point in the second training sample according to the size relationship between the first training sample and the second training sample and the output result of the first prediction model;
determining the difference value between the position of the key point in the second training sample and the position of the key point in the second training sample which is marked in advance as a reference position offset;
taking each key point in the second training sample as a center, and intercepting an image area with a preset size from the second training sample;
taking the image area with the intercepted preset size as input, taking the reference position offset as supervision information, and training a second prediction model;
and outputting the model parameters of the second prediction model.
14. A keypoint detection device, comprising:
the first determining module is used for determining the position of a key point of a target object in a first image containing the target object;
a second determining module, configured to map, according to a size relationship between the first image and a second image containing the target object, a position of the keypoint in the first image into the second image, so as to obtain an initial position of the keypoint in the second image;
taking each key point in the second image as a center, and intercepting an image area with a preset size from the second image;
inputting the cut image area into a second prediction model so as to determine the position offset corresponding to the key point by the second prediction model, wherein the size of the second image is larger than that of the first image;
and a third determining module, configured to determine the calculation result of the position offset and the initial position as the position of the keypoint in the second image.
15. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the keypoint detection method of any of claims 1 to 8.
16. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the keypoint detection method of any of claims 1 to 8.
17. A keypoint detection device, comprising:
a receiving module, configured to receive a request for invoking a detection service;
an execution module, configured to execute the following steps according to the processing resource corresponding to the detection service:
in response to a shooting operation, determining the position of a key point of a target object in a first image containing the target object;
mapping the positions of the key points in the first image to a second image containing the target object according to the size relation between the first image and the second image to obtain the initial positions of the key points in the second image;
taking each key point in the second image as a center, and intercepting an image area with a preset size from the second image;
inputting the cut-out image area into a second prediction model so as to determine the position offset corresponding to the key point by the second prediction model, wherein the request comprises the first image and the second image, and the size of the second image is larger than that of the first image;
determining the calculated results of the position offset and the initial position as the positions of the key points in the second image;
and displaying the second image marked with the key points.
18. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the keypoint detection method of claim 9.
19. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the keypoint detection method of claim 9.
20. A model training apparatus, comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a first training sample and a second training sample containing a target object, and the size of the first training sample is larger than that of the second training sample;
an input module, configured to input the second training sample into a first prediction model, so that the first prediction model outputs a position of a key point of the target object in the second training sample;
the determining module is used for determining the position of the key point in the first training sample according to the size relation between the first training sample and the second training sample and the output result of the first prediction model; determining the difference between the position of the key point in the first training sample and the position of the key point in the first training sample which is marked in advance as a reference position offset; taking each key point in the first training sample as a center, and intercepting an image area with a preset size from the first training sample;
and the training module is used for taking the image area with the intercepted preset size as input, taking the reference position offset as supervision information and training a second prediction model.
21. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the model training method of any one of claims 10 to 12.
22. A non-transitory machine-readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to perform the model training method of any one of claims 10 to 12.
23. A model training apparatus, comprising:
the receiving module is used for receiving a request for calling the training service;
an execution module, configured to execute the following steps according to the processing resource corresponding to the training service:
responding to an input operation of a user, acquiring a first training sample and a second training sample containing a target object, wherein the first training sample and the second training sample are contained in the request, and the size of the first training sample is smaller than that of the second training sample;
inputting the first training sample into a first prediction model to output, by the first prediction model, positions of key points of the target object in the first training sample;
determining the position of the key point in the second training sample according to the size relationship between the first training sample and the second training sample and the output result of the first prediction model;
determining the difference between the position of the key point in the second training sample and the position of the key point in the second training sample which is marked in advance as a reference position offset;
taking each key point in the second training sample as a center, and intercepting an image area with a preset size from the second training sample;
taking the image area with the intercepted preset size as input, taking the reference position offset as supervision information, and training a second prediction model;
and outputting the model parameters of the second prediction model.
24. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code that, when executed by the processor, causes the processor to perform the model training method of claim 13.
25. A non-transitory machine-readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to perform the model training method of claim 13.
CN202011281144.0A 2020-11-16 2020-11-16 Key point detection method, model training method, device, equipment and storage medium Active CN114511613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011281144.0A CN114511613B (en) 2020-11-16 2020-11-16 Key point detection method, model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011281144.0A CN114511613B (en) 2020-11-16 2020-11-16 Key point detection method, model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114511613A CN114511613A (en) 2022-05-17
CN114511613B (en) 2023-04-18

Family

ID=81546744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011281144.0A Active CN114511613B (en) 2020-11-16 2020-11-16 Key point detection method, model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114511613B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176409B2 (en) * 2016-12-20 2021-11-16 Sony Depthsensing Solutions Sa/Nv Distance-independent keypoint detection
CN111914598A (en) * 2019-05-09 2020-11-10 北京四维图新科技股份有限公司 Method, device and equipment for detecting key points of continuous frame human face and storage medium
CN110503083A (en) * 2019-08-30 2019-11-26 北京妙医佳健康科技集团有限公司 A kind of critical point detection method, apparatus and electronic equipment
CN111160108B (en) * 2019-12-06 2023-03-31 华侨大学 Anchor-free face detection method and system
CN111709428B (en) * 2020-05-29 2023-09-15 北京百度网讯科技有限公司 Method and device for identifying positions of key points in image, electronic equipment and medium
CN111695519B (en) * 2020-06-12 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for positioning key point
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN114511613A (en) 2022-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant