WO2022021029A1 - Detection model training method and apparatus, detection model use method, and storage medium - Google Patents

Detection model training method and apparatus, detection model use method, and storage medium (Download PDF)

Info

Publication number
WO2022021029A1
WO2022021029A1 (application PCT/CN2020/104973, CN2020104973W)
Authority
WO
WIPO (PCT)
Prior art keywords
detection model
feature
target
salient
image
Prior art date
Application number
PCT/CN2020/104973
Other languages
English (en)
French (fr)
Inventor
张雪
席迎来
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/104973 (WO2022021029A1)
Priority to CN202080015995.2A (CN113490947A)
Publication of WO2022021029A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present application relates to the technical field of computer vision, and in particular, to a detection model training method, an apparatus, a detection model use method, and a storage medium.
  • with the development of technology and the rise of deep learning, identifying targets in images has become one of the most important technologies in computer vision, and the application of deep learning to image target detection has achieved major breakthroughs. For example, the region where a face is located can be identified from a given image.
  • at present, the technical focus of the target detection algorithms in existing detection models is on the accuracy of the detection results, so existing detection models are large in scale. This makes them slow to run and impossible to implement on mobile terminals with limited resources; if the model scale is reduced so that the model can be applied to a mobile terminal, the performance of the detection model cannot be guaranteed, which limits the scope in which the model can be used.
  • Embodiments of the present application provide a detection model training method, device, detection model use method, and storage medium, which can reduce the scale of the first detection model and improve the reliability and accuracy of the first detection model training.
  • an embodiment of the present application provides a detection model training method, including:
  • performing feature extraction on a sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information; determining, based on position information of a target in the sample image, a salient region corresponding to the target; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • an embodiment of the present application further provides a detection model training apparatus, including a processor and a memory, where a computer program is stored in the memory, and the processor, when calling the computer program in the memory, executes any of the detection model training methods provided by the embodiments of the present application.
  • an embodiment of the present application also provides a method for using a detection model, applied to computer equipment, where the detection model is a trained first detection model, the trained first detection model is a model obtained by training with any of the detection model training methods provided by the embodiments of the present application, and the model is deployed in the computer equipment; the detection model use method includes: acquiring an image to be detected;
  • the target object in the image is detected by the trained first detection model, and the target position information of the target object in the image is obtained.
  • an embodiment of the present application further provides a storage medium, where the storage medium is used to store a computer program, and the computer program is loaded by a processor to execute:
  • performing feature extraction on a sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information; determining, based on position information of a target in the sample image, a salient region corresponding to the target; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • This embodiment of the present application may perform feature extraction on the sample image by using the first detection model to obtain the first feature information, and perform feature extraction on the sample image by using the second detection model to obtain the second feature information. Then, the salient region corresponding to the target can be determined based on the position information of the target in the sample image, the first salient region feature can be acquired according to the first feature information and the salient region, and the second salient region feature can be acquired according to the second feature information and the salient region. At this point, the parameters of the first detection model may be adjusted according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • This solution can use the trained second detection model to accurately train the first detection model, so that the trained first detection model can be applied on a mobile terminal to detect targets, which reduces the scale of the first detection model; moreover, training the first detection model based on the determined salient region corresponding to the target and its salient region features can improve the reliability and accuracy of training the first detection model, giving the first detection model a wide scope of application.
  • FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an image of an area where an extraction target is located provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a process for preprocessing an initial image and key points of a human face provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of generating multiple candidate regions provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for using a detection model provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of training a first detection model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a detection model training apparatus provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of the application.
  • the detection model training method can be applied to a detection model training device, and is used to accurately train the smaller-scale first detection model by using the second detection model.
  • the detection model training device may include a mobile phone, a computer, a server, or an unmanned aerial vehicle.
  • the UAV can be a rotary-wing UAV, such as a quad-rotor UAV, a hexa-rotor UAV, or an octa-rotor UAV, a fixed-wing UAV, or a combination of a rotary-wing and a fixed-wing UAV, which is not limited here.
  • the detection model training method may include steps S101 to S104 and so on.
  • the first detection model and the second detection model can be flexibly set according to actual needs, and the specific types are not limited here.
  • the first detection model and the second detection model can be neural networks.
  • the detection model training method can be applied to a distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
  • the distillation algorithm may use one or more trained teacher models (also called Teacher models, which may be larger-scale models) to guide the training of a student model (also called a Student model, which may be a smaller-scale model).
  • the process of the distillation algorithm can be: Teacher model training, Student model training, and joint training with the Teacher model and the Student model to improve the performance of the Student model.
  • the teacher model and the student model can each be trained on sample images. After the teacher model and the student model have been trained separately, the parameters of the teacher model are fixed, that is, the teacher model only performs feature extraction and no longer updates its parameters, while the student model continues with distillation training.
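  • A minimal PyTorch sketch of this parameter-fixing step follows (PyTorch itself is an assumption; the application does not name a framework): after its own training, the teacher model is frozen so that it only performs feature extraction, while the student model keeps updating during distillation.

```python
import torch

def freeze_teacher(teacher: torch.nn.Module) -> None:
    """Fix the teacher model's parameters so it only does feature extraction."""
    teacher.eval()                   # inference mode (e.g. no batch-norm updates)
    for p in teacher.parameters():
        p.requires_grad_(False)      # exclude the teacher from parameter updates
```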
  • in the prior art, there are a small number of distillation techniques that can be applied to detection models, but they are based on two-stage target detection technology and are not applicable to one-stage target detection. The embodiments of this application can obtain the salient region corresponding to the target and thereby obtain the salient region features of the first detection model (student model) and the second detection model (teacher model), so that the first detection model is trained based on the salient region features of both. This applies not only to distillation algorithms for two-stage target detection but also to distillation algorithms for one-stage target detection, which makes the approach more broadly practical and improves training efficiency.
  • the scale of the first detection model is smaller than the scale of the second detection model, which is a trained model.
  • the pre-trained second detection model may be used to guide the training of the first detection model.
  • the sample image may be acquired by a collection device such as a camera or video camera, or the sample image may be acquired from a preset local database or server, or the sample image may be generated by preprocessing, such as rotation or scaling, an acquired initial image.
  • the sample image may contain a target, and the type of the target may be flexibly set according to actual needs.
  • the target may include objects such as a human face, a vehicle, a ball, or a dog.
  • there may be multiple sample images, and the size of each sample image can be the same or different; a sample image can contain one or more targets of the same type, or a sample image can contain multiple targets of different types, which is not specifically limited here.
  • the detection model training method may further include: acquiring an initial image; extracting an image of the region where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target in the sample image according to the preprocessed key points.
  • in order to enrich the sample images and broaden the model's learning range, the acquired initial images can be preprocessed to obtain abundant sample images, so that the abundant sample images can be used to train the first detection model, solving the problem of limited existing data resources.
  • the initial image may be collected by a collection device such as a camera or video camera, or the initial image may be obtained from a preset local database or server, or the like.
  • the initial image may contain objects.
  • the types of objects may include objects such as faces, vehicles, balls, or dogs.
  • the image of the area where the target is located can be extracted from the initial image.
  • the image of the region where a user's face is located can be extracted from an initial image containing the user; for another example, the image of the region where a vehicle is located can be extracted from an initial image containing the vehicle.
  • the key points of the target can be extracted from the area image, and the number, shape, position or size of the key points can be flexibly set according to actual needs, and the specific content is not limited here.
  • key points such as eyes, nose, mouth, and contours of the face can be extracted from the image of the area where the face is located.
  • for another example, key points such as the wheels, lights, windows, and body of the vehicle can be extracted from the image of the region where the vehicle is located.
  • the initial image can be preprocessed to obtain a sample image
  • the keypoints can be preprocessed to obtain preprocessed keypoints.
  • preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points may include: rotating, translating, scaling, and/or adjusting the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the preprocessed key points.
  • the preprocessing may be flexibly set according to actual needs, for example, the preprocessing may include processing such as rotation, cropping, flipping, translation, scaling, brightness reduction and/or brightness enhancement.
  • the preset angle can be flexibly set according to actual needs.
  • the way of preprocessing the initial image and the way of preprocessing keypoints can be consistent or inconsistent. For example, both the initial image and the key points can be rotated 90 degrees clockwise to obtain the sample image and the preprocessed key points; for another example, the initial image can be rotated 90 degrees clockwise to obtain the sample image, and the The key points are rotated 45 degrees clockwise to obtain the pre-processed key points.
  • the position information of the object in the sample image can be determined according to the preprocessed key points.
  • the position of the preprocessed key points in the sample image can be determined, and the region of the target in the sample image can be generated according to the positions of the preprocessed key points in the sample image; the region can be a rectangle or a square, and the position information of the target in the sample image is determined based on the region of the target in the sample image.
  • the position information may be the pixel coordinates of the target object, or the pixel coordinates of the vertex of the region of the target object in the sample image, or the like.
  • the process of preprocessing the initial image and the key points of the face may include: acquiring the initial image image; extracting the face region image face_image from the initial image according to the known face frame; extracting the face key points face_landmarks from face_image; rotating the initial image and the face key points by an arbitrary random angle to obtain the rotated image rotate_image and the rotated key points rotate_landmarks; calculating the face frame rotate_box, i.e., the position information of the face, from the rotated key points; and saving rotate_image and rotate_box. This realizes automatic preprocessing of the initial image and key points (also known as data augmentation), saving time and effort. It should be noted that the initial image and key points can also be preprocessed manually.
  • feature extraction can be performed on the sample image through the first detection model to obtain first feature information
  • feature extraction can be performed on the sample image through the second detection model to obtain second feature information
  • the salient region pos-anchors corresponding to the target can be determined through the first detection model based on the position information of the target in the sample image. The salient region can be a region that facilitates model learning; it may include only positive sample regions, or it may include both positive sample regions and negative sample regions, and so on.
  • determining the salient region corresponding to the target based on the position information of the target in the sample image may include: acquiring multiple candidate regions; determining the target region of the target based on the position information of the target; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is greater than a first preset threshold to obtain positive sample regions; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold to obtain negative sample regions, where the preset range can be an interval smaller than the first preset threshold and greater than a second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.
  • salient regions containing positive sample regions and negative sample regions can be obtained to train the model.
  • multiple candidate regions may be acquired.
  • acquiring multiple candidate regions may include: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.
  • the shape or size of the candidate region can be flexibly set according to actual needs.
  • for example, the sample image can be detected by the second detection model to generate multiple candidate regions; alternatively, multiple pre-labeled candidate regions can be acquired directly, and the pre-labeled candidate regions can be labeled manually or automatically.
  • the target region of the target is determined based on the position information; for example, the target region can be determined based on the pixel coordinate positions of the four corner vertices of the quadrilateral in which the target is located. Then, the degree of coincidence between each candidate region and the target region can be calculated separately; for example, the Intersection over Union (IOU) algorithm can be used: obtain the intersection area between the candidate region and the target region, obtain the union area between the candidate region and the target region, and calculate the degree of coincidence between the candidate region and the target region from the intersection area and the union area.
  • the degree of coincidence between a candidate region and the target region can be calculated as in formula (1):

    IOU(A, B) = |A ∩ B| / |A ∪ B|    (1)

  • in formula (1), IOU(A, B) represents the degree of coincidence between candidate region A and target region B, |A ∩ B| represents the intersection area between candidate region A and target region B, and |A ∪ B| represents the union area between candidate region A and target region B.
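  • As an illustration of formula (1), the following is a minimal Python sketch of the IOU computation for axis-aligned regions given as (x0, y0, x1, y1) pixel coordinates; it is a standard implementation written for this explanation, not code taken from the application.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)     # |A ∩ B|
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                        # |A ∪ B|
    return inter / union if union > 0 else 0.0
```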
  • for each of the multiple candidate regions, the degree of coincidence with the target region can be calculated by formula (1); when the sample image contains multiple targets, the degree of coincidence can be calculated for each target separately.
  • a region whose coincidence degree with the target region is greater than the first preset threshold can be selected from the multiple candidate regions to obtain a positive sample region.
  • the specific value of the first preset threshold can be flexibly set according to actual needs. If the coincidence degree between the candidate area and the target area is greater than the first preset threshold, it indicates that the similarity between the candidate area and the target area is high.
  • the classification probability value of each candidate region is calculated, and the value range of the classification probability value may be 0 to 1, for example, the classification probability value of the candidate region being a face region is 0.6 or 0.9.
  • a region whose degree of coincidence with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold can be selected from the multiple candidate regions to obtain a negative sample region, where the preset range is an interval smaller than the first preset threshold and greater than a second preset threshold; the specific value of the second preset threshold can be set flexibly according to actual needs.
  • the positive sample region and the negative sample region can be set as the salient regions corresponding to the target.
  • not only the positive sample regions but also the information of the negative sample regions is obtained to train the first detection model, so that the training is more thorough and the resulting first detection model is more accurate and reliable, solving the problem of insufficient existing training resources. The selection rule described above is sketched in code below.
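  • The sketch reuses the iou() helper from the previous example; the threshold values t1 = 0.7, t2 = 0.3, and p_thresh = 0.8 are illustrative assumptions, since the application leaves the preset thresholds to be set according to actual needs.

```python
def select_salient_regions(candidates, probs, target_box,
                           t1=0.7, t2=0.3, p_thresh=0.8):
    """Split candidate regions into positive and negative sample regions.

    candidates: list of (x0, y0, x1, y1) candidate boxes
    probs:      classification probability value of each candidate
    t1:         first preset threshold (coincidence above it -> positive)
    t2:         second preset threshold (t2 < coincidence < t1, together
                with a high classification probability -> negative)
    """
    pos, neg = [], []
    for box, p in zip(candidates, probs):
        overlap = iou(box, target_box)       # degree of coincidence, formula (1)
        if overlap > t1:
            pos.append(box)                  # positive sample region
        elif t2 < overlap < t1 and p > p_thresh:
            neg.append(box)                  # negative sample region
    return pos, neg                          # together: the salient regions
```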
  • the first salient region feature and the second salient region feature may be flexibly set according to actual needs, and the specific content is not limited here.
  • the first salient region feature may be a feature related to the first feature information in the salient region
  • the second salient region feature may be a feature related to the second feature information in the salient region.
  • acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, may include: acquiring the first feature information in the positive sample regions and the negative sample regions respectively to obtain the first salient region feature; and acquiring the second feature information of the positive sample regions and the negative sample regions respectively to obtain the second salient region feature.
  • the first detection model is used to detect the type and position of the target.
  • adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model may include: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  • the similarity between the first salient region feature and the second salient region feature can be obtained, and the similarity can be characterized by the Euclidean distance.
  • the similarity includes the Euclidean distance, and obtaining the similarity between the first salient region feature and the second salient region feature may include: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
  • for example, the Euclidean distance L2-loss (distill-loss) between the first salient region feature and the second salient region feature can be calculated; this Euclidean distance L2-loss is the similarity between the first salient region feature and the second salient region feature. In addition, the loss value loss obtained by the first detection model detecting the sample image is acquired; the parameters of the first detection model can then be adjusted according to the similarity L2-loss and the loss value loss to obtain the trained first detection model.
  • adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model may include: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  • the parameters of the first detection model may be adjusted according to the target loss value so that they are tuned to suitable values, yielding the trained first detection model. Thus, a high-precision trained first detection model that meets the requirements can be obtained under the constraint of limited computing resources, and, while achieving the same effect, a large amount of data collection can be saved, saving time and resources.
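  • A minimal sketch of this parameter adjustment in PyTorch is shown below; pos_feat_s and pos_feat_t stand for the first and second salient region features, det_loss for the loss value obtained by the first detection model on the sample image, and the weights a and b (with a + b = 1) are illustrative assumptions. The Euclidean (L2) distance term is computed here via mean squared error.

```python
import torch.nn.functional as F

def distill_step(pos_feat_s, pos_feat_t, det_loss, optimizer, a=0.5, b=0.5):
    """One parameter update of the first (student) detection model."""
    # Similarity term: L2 distance between the first and second salient
    # region features; the teacher features are detached because the
    # second detection model's parameters stay fixed.
    l2_loss = F.mse_loss(pos_feat_s, pos_feat_t.detach())
    # Weighted average of the similarity and the detection loss value.
    target_loss = a * l2_loss + b * det_loss
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return target_loss
```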
  • after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the detection model training method may further include: acquiring an image to be detected; and detecting the target in the image through the trained first detection model to obtain the target position information of the target in the image.
  • the trained first detection model can be used to accurately detect the target in the image.
  • the image to be detected may be collected by a collection device such as a camera or video camera, or the image to be detected may be obtained from a preset local database or server, or the like.
  • the target object in the image can be detected by the trained first detection model, and the target position information of the target object in the image can be obtained.
  • the trained first detection model can be used to detect the face in the image to obtain the target position information of the face in the image, and the target position information can be the vertex position of the polygon (for example, quadrilateral) face frame.
  • This embodiment of the present application may perform feature extraction on the sample image by using the first detection model to obtain the first feature information, and perform feature extraction on the sample image by using the second detection model to obtain the second feature information. Then, the salient region corresponding to the target can be determined based on the position information of the target in the sample image, the first salient region feature can be acquired according to the first feature information and the salient region, and the second salient region feature can be acquired according to the second feature information and the salient region. At this point, the parameters of the first detection model may be adjusted according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • the second detection model can be used to accurately train the smaller-scale first detection model, so that the trained first detection model can be applied on a mobile terminal to detect targets, which reduces the scale of the first detection model; moreover, training the first detection model based on the determined salient region corresponding to the target and its salient region features can improve the reliability and accuracy of training the first detection model, giving the first detection model a wide scope of application.
  • FIG. 5 is a schematic flowchart of a method for using a detection model provided by an embodiment of the application.
  • the method for using the detection model can be applied to computer equipment for accurately detecting the target in the image based on the trained first detection model.
  • the computer equipment may include mobile terminals, drones, servers, cameras, etc., and the mobile terminals may include mobile phones and tablet computers.
  • the detection model is a trained first detection model, and the trained first detection model is a model obtained by using the above-mentioned detection model training method, and is deployed in a computer device.
  • the process of training the first detection model may include steps S21 to S29 shown in FIG. 6, which are described in the detailed training flow later in this document.
  • the method for using the detection model may include steps S201 to S202 and so on.
  • S201: Acquire an image to be detected. S202: Detect the target in the image by using the trained first detection model to obtain the target position information of the target in the image.
  • the image to be detected may be collected by a collection device such as a camera or video camera, or the image to be detected may be obtained from a preset local database or server, or the like.
  • the target object in the image can be detected by the trained first detection model, and the target position information of the target object in the image can be obtained.
  • the trained first detection model can be used to detect the face in the image to obtain the target position information of the face in the image, and the target position information can be the vertex position of the polygon (for example, quadrilateral) face frame.
  • the first detection model after training is used to accurately detect the target in the image.
  • FIG. 7 is a schematic block diagram of a detection model training apparatus provided by an embodiment of the present application.
  • the detection model training apparatus 11 may include a processor 111 and a memory 112, where the processor 111 and the memory 112 are connected through a bus, such as an I2C (Inter-Integrated Circuit) bus.
  • the processor 111 may be a micro-controller unit (MCU), a central processing unit (CPU), a digital signal processor (DSP), or the like.
  • the memory 112 may be a Flash chip, a read-only memory (ROM) disk, an optical disk, a USB flash drive, a removable hard disk, or the like, and may be used to store computer programs.
  • the processor 111 is configured to call the computer program stored in the memory 112 and, when executing the computer program, implement the detection model training method provided by the embodiments of the present application; for example, the following steps may be performed:
  • performing feature extraction on a sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through the second detection model to obtain second feature information; determining, based on the position information of a target in the sample image, a salient region corresponding to the target; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor 111 is configured to execute: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the similarity and the loss value to obtain the trained first detection model, the processor 111 is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  • the similarity includes the Euclidean distance; when acquiring the similarity between the first salient region feature and the second salient region feature, the processor 111 is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
  • when determining the salient region corresponding to the target based on the position information of the target in the sample image, the processor 111 is configured to execute: acquiring multiple candidate regions; determining the target region of the target based on the position information; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is greater than the first preset threshold to obtain positive sample regions; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is within the preset range and whose classification probability value is greater than the preset probability threshold to obtain negative sample regions, where the preset range is an interval smaller than the first preset threshold and greater than the second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.
  • the processor 111 when acquiring multiple candidate regions, is configured to perform: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.
  • when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor 111 is configured to execute: acquiring the first feature information in the positive sample regions and the negative sample regions respectively to obtain the first salient region feature; and acquiring the second feature information of the positive sample regions and the negative sample regions respectively to obtain the second salient region feature.
  • the processor 111 is further configured to execute: acquiring an image to be detected; and detecting the target in the image through the trained first detection model to obtain the target position information of the target in the image.
  • before the feature extraction is performed on the sample image through the first detection model, the processor 111 is configured to execute: acquiring an initial image; extracting an image of the region where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target in the sample image according to the preprocessed key points.
  • when the initial image and the key points are preprocessed to obtain the sample image and the preprocessed key points, the processor 111 is configured to execute: rotating, translating, scaling, and/or adjusting the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the preprocessed key points.
  • the target includes a human face.
  • the scale of the first detection model is smaller than the scale of the second detection model, which is a trained model.
  • the detection model training method is applied to a distillation algorithm, where the first detection model is a student model and the second detection model is a teacher model.
  • Embodiments of the present application further provide a storage medium, where the storage medium is a computer-readable storage medium, a computer program is stored in the storage medium, the computer program includes program instructions, and a processor executes the program instructions to realize the detection model training method provided by the embodiments of the present application. For example, the processor can execute:
  • performing feature extraction on a sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through the second detection model to obtain second feature information; determining, based on the position information of a target in the sample image, a salient region corresponding to the target; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is configured to execute: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the similarity and the loss value to obtain the trained first detection model, the processor is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  • the similarity includes the Euclidean distance; when acquiring the similarity between the first salient region feature and the second salient region feature, the processor is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
  • when determining the salient region corresponding to the target based on the position information of the target in the sample image, the processor is configured to execute: acquiring multiple candidate regions; determining the target region of the target based on the position information; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is greater than the first preset threshold to obtain positive sample regions; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is within the preset range and whose classification probability value is greater than the preset probability threshold to obtain negative sample regions, where the preset range is an interval smaller than the first preset threshold and greater than the second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.
  • when acquiring multiple candidate regions, the processor is configured to execute: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.
  • when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor is configured to execute: acquiring the first feature information in the positive sample regions and the negative sample regions respectively to obtain the first salient region feature; and acquiring the second feature information of the positive sample regions and the negative sample regions respectively to obtain the second salient region feature.
  • after the trained first detection model is obtained, the processor is further configured to execute: acquiring an image to be detected; and detecting the target in the image through the trained first detection model to obtain the target position information of the target in the image.
  • before the feature extraction is performed on the sample image through the first detection model, the processor is configured to execute: acquiring an initial image; extracting an image of the region where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target in the sample image according to the preprocessed key points.
  • when the initial image and the key points are preprocessed to obtain the sample image and the preprocessed key points, the processor is configured to execute: rotating, translating, scaling, and/or adjusting the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the preprocessed key points.
  • the target includes a human face.
  • the scale of the first detection model is smaller than the scale of the second detection model, which is a trained model.
  • the storage medium is applied to the distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
  • the storage medium may be an internal storage unit of the detection model training apparatus described in any of the foregoing embodiments, such as a hard disk or a memory of the detection model training apparatus.
  • the storage medium can also be an external storage device of the detection model training apparatus, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the detection model training apparatus.
  • since the computer program stored in the storage medium can execute any detection model training method provided by the embodiments of the present application, the beneficial effects achievable by any detection model training method provided by the embodiments of the present application can be realized; for details, refer to the foregoing embodiments, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A detection model training method and apparatus, a detection model use method, and a storage medium. The method includes: performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information (S101); determining, based on position information of a target in the sample image, a salient region corresponding to the target (S102); acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region (S103); and adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model (S104). The reliability and accuracy of training the first detection model are improved.

Description

Detection model training method and apparatus, detection model use method, and storage medium

Technical Field

This application relates to the technical field of computer vision, and in particular to a detection model training method, an apparatus, a detection model use method, and a storage medium.

Background

With the development of technology and the rise of deep learning, identifying targets in images has become one of the most important technologies in computer vision, and the application of deep learning to image target detection has achieved major breakthroughs. For example, the region where a face is located can be identified from a given image.

At present, the technical focus of the target detection algorithms in existing detection models is on the accuracy of the detection results, so existing detection models are large in scale, which makes them slow to run and impossible to implement on mobile terminals with limited resources; if the model scale is reduced so that the model can be applied to a mobile terminal, the performance of the detection model cannot be guaranteed, which limits the scope in which the model can be used.

Summary

Embodiments of this application provide a detection model training method and apparatus, a detection model use method, and a storage medium, which can reduce the scale of a first detection model and improve the reliability and accuracy of training the first detection model.
In a first aspect, an embodiment of this application provides a detection model training method, including:

performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information;

determining, based on position information of a target in the sample image, a salient region corresponding to the target;

acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region;

adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.

In a second aspect, an embodiment of this application further provides a detection model training apparatus, including a processor and a memory, where a computer program is stored in the memory, and the processor executes any detection model training method provided by the embodiments of this application when calling the computer program in the memory.

In a third aspect, an embodiment of this application further provides a detection model use method, applied to computer equipment, where the detection model is a trained first detection model, the trained first detection model is a model obtained by training with any detection model training method provided by the embodiments of this application, and the model is deployed in the computer equipment; the detection model use method includes:

acquiring an image to be detected;

detecting a target in the image through the trained first detection model to obtain target position information of the target in the image.

In a fourth aspect, an embodiment of this application further provides a storage medium, where the storage medium is used to store a computer program, and the computer program is loaded by a processor to execute:

performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information;

determining, based on position information of a target in the sample image, a salient region corresponding to the target;

acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region;

adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.

Embodiments of this application may perform feature extraction on a sample image through the first detection model to obtain first feature information, and perform feature extraction on the sample image through the second detection model to obtain second feature information. Then, the salient region corresponding to the target may be determined based on the position information of the target in the sample image, the first salient region feature may be acquired according to the first feature information and the salient region, and the second salient region feature may be acquired according to the second feature information and the salient region. At this point, the parameters of the first detection model may be adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model. This solution can use the trained second detection model to accurately train the first detection model, so that the trained first detection model can subsequently be applied on a mobile terminal to detect targets, which reduces the scale of the first detection model; moreover, training the first detection model based on the determined salient region corresponding to the target and its salient region features can improve the reliability and accuracy of training the first detection model, giving the first detection model a wide scope of application.
Brief Description of the Drawings

To explain the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of this application;

FIG. 2 is a schematic diagram of extracting the region image where a target is located, provided by an embodiment of this application;

FIG. 3 is a schematic flowchart of a process for preprocessing an initial image and the key points of a face, provided by an embodiment of this application;

FIG. 4 is a schematic diagram of generating multiple candidate regions, provided by an embodiment of this application;

FIG. 5 is a schematic flowchart of a detection model use method provided by an embodiment of this application;

FIG. 6 is a schematic flowchart of training the first detection model, provided by an embodiment of this application;

FIG. 7 is a schematic structural diagram of a detection model training apparatus provided by an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are some rather than all of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.

The flowcharts shown in the drawings are merely illustrative; they do not necessarily include all the content and operations/steps, nor must these be executed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual execution order may change according to the actual situation.

Some embodiments of this application are described in detail below with reference to the drawings. The embodiments below, and the features within them, may be combined with each other where there is no conflict.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of this application. The detection model training method can be applied in a detection model training apparatus and is used to accurately train a smaller-scale first detection model by means of a second detection model. The detection model training apparatus may include a mobile phone, a computer, a server, or an unmanned aerial vehicle (UAV).

The UAV may be a rotary-wing UAV, such as a quad-rotor, hexa-rotor, or octa-rotor UAV, a fixed-wing UAV, or a combination of a rotary-wing and a fixed-wing UAV, which is not limited here.

Specifically, as shown in FIG. 1, the detection model training method may include steps S101 to S104 and so on.

S101. Perform feature extraction on a sample image through the first detection model to obtain first feature information, and perform feature extraction on the sample image through the second detection model to obtain second feature information.

The first detection model and the second detection model can be set flexibly according to actual needs, and their specific types are not limited here; for example, the first detection model and the second detection model may be neural networks.

In some embodiments, the detection model training method can be applied to a distillation algorithm, where the first detection model is a student model and the second detection model is a teacher model.

The distillation algorithm may use one or more trained teacher models (also called Teacher models, which may be larger-scale models) to guide the training of a student model (also called a Student model, which may be a smaller-scale model). The flow of the distillation algorithm may be: training the Teacher model, training the Student model, and jointly training with the Teacher model and the Student model to improve the performance of the Student model. For example, the Teacher model and the Student model may each be trained on the sample images; after both have been trained, the parameters of the Teacher model are fixed, that is, the Teacher model only performs feature extraction and no longer updates its parameters, while the Student model continues with distillation training.

In the prior art, a small number of distillation techniques can be applied to detection models, but they are based on two-stage target detection technology and are not applicable to one-stage target detection. The embodiments of this application, however, can obtain the salient region corresponding to the target, and thereby obtain the salient region features of the first detection model (student model) and the second detection model (teacher model), so that the first detection model is trained based on the salient region features of both. This applies not only to distillation algorithms for two-stage target detection but also to distillation algorithms for one-stage target detection, which makes the approach more broadly practical and improves training efficiency.

In some embodiments, the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model. To improve the accuracy of training the first detection model, the pre-trained second detection model can be used to guide the training of the first detection model.

The sample image may be captured by a collection device such as a camera or video camera, or may be obtained from a preset local database or server, or may be generated by preprocessing, such as rotation or scaling, an acquired initial image. The sample image may contain a target, and the type of the target can be set flexibly according to actual needs; for example, the target may include objects such as a face, a vehicle, a ball, or a dog. It should be noted that there may be multiple sample images, the sizes of the sample images may be the same or different, one sample image may contain one or more targets of the same type, and one sample image may also contain multiple targets of different types, which is not specifically limited here.
In some embodiments, before the feature extraction is performed on the sample image through the first detection model, the detection model training method may further include: acquiring an initial image; extracting the region image where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining position information of the target in the sample image according to the preprocessed key points.

To enrich the sample images and broaden the model's learning range, the acquired initial images can be preprocessed to obtain abundant sample images, so that the first detection model can be trained with them, solving the problem that existing data resources are limited and insufficient for training. Specifically, the initial image may be captured by a collection device such as a camera or video camera, or obtained from a preset local database or server, and so on. The initial image may contain a target; for example, the target type may include objects such as a face, a vehicle, a ball, or a dog.

Then, the region image where the target is located can be extracted from the initial image. For example, as shown in FIG. 2, the region image where a user's face is located can be extracted from an initial image containing the user; for another example, the region image where a vehicle is located can be extracted from an initial image containing the vehicle. At this point, the key points of the target can be extracted from the region image; the number, shape, position, or size of the key points can be set flexibly according to actual needs and is not limited here. For example, key points such as the eyes, nose, mouth, and contour of a face can be extracted from the region image where the face is located; for another example, key points such as the wheels, lights, windows, and body of a vehicle can be extracted from the region image where the vehicle is located.

At this point, the initial image can be preprocessed to obtain the sample image, and the key points can be preprocessed to obtain the preprocessed key points. In some embodiments, preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points may include: rotating, translating, scaling, and/or adjusting the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the preprocessed key points.

The preprocessing can be set flexibly according to actual needs; for example, it may include operations such as rotation, cropping, flipping, translation, scaling, brightness reduction, and/or brightness enhancement. The preset angle can be set flexibly according to actual needs. It should be noted that the way the initial image is preprocessed and the way the key points are preprocessed may or may not be the same. For example, both the initial image and the key points can be rotated 90 degrees clockwise to obtain the sample image and the preprocessed key points; for another example, the initial image can be rotated 90 degrees clockwise to obtain the sample image, while the key points are rotated 45 degrees clockwise to obtain the preprocessed key points.

Finally, the position information of the target in the sample image can be determined according to the preprocessed key points. For example, the positions of the preprocessed key points in the sample image are determined, and the region of the target in the sample image is generated according to those positions; the region may be a rectangle or a square, and the position information of the target in the sample image is determined based on that region. The position information may be the pixel coordinates of the target, or the corner-vertex pixel coordinates of the target's region in the sample image, and so on.
As shown in FIG. 3, taking a face as the target as an example, the process of preprocessing the initial image and the key points of the face may include:

S11. Acquire the initial image image.

S12. Extract the face region image face_image from the initial image according to the known face frame.

S13. Extract the face key points face_landmarks of the face region image face_image.

S14. Rotate the initial image image and the face key points face_landmarks by an arbitrary random angle to obtain the rotated image rotate_image and the rotated face key points rotate_landmarks.

S15. Calculate the face frame rotate_box, i.e., the position information of the face, according to the rotated face key points rotate_landmarks.

S16. Save the rotated image rotate_image and the face frame rotate_box.
This realizes automatic preprocessing of the initial image and the key points (also called data augmentation), saving time and effort. It should be noted that the initial image and the key points can also be preprocessed manually.
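As a concrete illustration of steps S14 and S15, the following Python sketch rotates an image together with its key points and recomputes the face frame rotate_box from the rotated landmarks. OpenCV, NumPy, and every implementation detail here are assumptions made for this illustration; the application does not specify an implementation.

```python
import cv2
import numpy as np

def augment_by_rotation(image, landmarks, angle_deg):
    """Rotate an image and its key points by the same angle, then
    recompute the target's frame from the rotated key points.

    image:     H x W x C uint8 array (the initial image)
    landmarks: N x 2 array of (x, y) key-point coordinates
    angle_deg: rotation angle in degrees (e.g. drawn at random)
    """
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)

    # 2x3 affine matrix that rotates about the image center (S14).
    m = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    rotate_image = cv2.warpAffine(image, m, (w, h))

    # Apply the same affine transform to the key points.
    ones = np.ones((landmarks.shape[0], 1))
    pts = np.hstack([landmarks, ones])    # N x 3 homogeneous coordinates
    rotate_landmarks = pts @ m.T          # N x 2 rotated key points

    # S15: the new position information, an axis-aligned box around
    # the rotated key points (rotate_box in the flow above).
    x0, y0 = rotate_landmarks.min(axis=0)
    x1, y1 = rotate_landmarks.max(axis=0)
    rotate_box = (x0, y0, x1, y1)
    return rotate_image, rotate_landmarks, rotate_box
```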
After the sample image and the position information of the target are obtained, feature extraction can be performed on the sample image through the first detection model to obtain the first feature information, and feature extraction can be performed on the sample image through the second detection model to obtain the second feature information.

S102. Determine, based on the position information of the target in the sample image, the salient region corresponding to the target.

The salient region pos-anchors corresponding to the target can be determined through the first detection model based on the position information of the target in the sample image. The salient region may be a region that facilitates model learning; it may include only positive sample regions, or it may include both positive sample regions and negative sample regions, and so on.

In some embodiments, determining the salient region corresponding to the target based on the position information of the target in the sample image may include: acquiring multiple candidate regions; determining the target region of the target based on the position information of the target; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is greater than a first preset threshold to obtain positive sample regions; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold to obtain negative sample regions, where the preset range may be an interval smaller than the first preset threshold and greater than a second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.

To improve the reliability of the salient regions, and thereby the precision of model training and the performance of the model, salient regions containing both positive and negative sample regions can be obtained to train the model. Specifically, multiple candidate regions can first be acquired; in some embodiments, acquiring multiple candidate regions may include: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.

The shape, size, and so on of the candidate regions can be set flexibly according to actual needs. For example, as shown in FIG. 4, the sample image can be detected by the second detection model to generate multiple candidate regions; alternatively, multiple pre-labeled candidate regions can be acquired directly, and the pre-labeled candidate regions may be labeled manually or automatically.

Also, the target region of the target is determined based on the position information; for example, the target region can be determined based on the pixel coordinate positions of the four corner vertices of the quadrilateral in which the target is located. Then, the coincidence degree between each candidate region and the target region can be calculated separately; for example, the Intersection over Union (IOU) algorithm can be used: obtain the intersection area between the candidate region and the target region, obtain the union area between the candidate region and the target region, and calculate the coincidence degree between the candidate region and the target region from the intersection area and the union area.
The coincidence degree between a candidate region and the target region can be calculated as in formula (1):

IOU(A, B) = |A ∩ B| / |A ∪ B|    (1)

In formula (1), IOU(A, B) represents the coincidence degree between candidate region A and target region B, |A ∩ B| represents the intersection area between candidate region A and target region B, and |A ∪ B| represents the union area between candidate region A and target region B.
For each of the multiple candidate regions, the coincidence degree with the target region can be calculated by formula (1). When the sample image contains multiple targets, the coincidence degree can be calculated for each target separately.

Then, regions whose coincidence degree with the target region is greater than the first preset threshold can be selected from the multiple candidate regions to obtain the positive sample regions. The specific value of the first preset threshold can be set flexibly according to actual needs; if the coincidence degree between a candidate region and the target region is greater than the first preset threshold, the similarity between that candidate region and the target region is high.

Also, the classification probability value of each candidate region is calculated; its value range may be 0 to 1. For example, the classification probability value that a candidate region is a face region may be 0.6 or 0.9. At this point, regions whose coincidence degree with the target region is within the preset range and whose classification probability value is greater than the preset probability threshold can be selected from the multiple candidate regions to obtain the negative sample regions, where the preset range is an interval smaller than the first preset threshold and greater than the second preset threshold, and the specific value of the second preset threshold can be set flexibly according to actual needs. Finally, the positive sample regions and the negative sample regions can be set as the salient regions corresponding to the target.

In this embodiment of the invention, not only the positive sample regions but also the information of the negative sample regions is obtained to train the first detection model, so that the training is more thorough and the resulting first detection model is more accurate and reliable, solving the problem of insufficient existing training resources.
S103. Acquire the first salient region feature according to the first feature information and the salient region, and acquire the second salient region feature according to the second feature information and the salient region.

The first salient region feature and the second salient region feature can be set flexibly according to actual needs, and their specific content is not limited here. For example, the first salient region feature may be the features in the salient region that relate to the first feature information, and the second salient region feature may be the features in the salient region that relate to the second feature information.

To improve the accuracy of acquiring the first salient region feature and the second salient region feature, in some embodiments, acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, may include: acquiring the first feature information in the positive sample regions and the negative sample regions respectively to obtain the first salient region feature; and acquiring the second feature information of the positive sample regions and the negative sample regions respectively to obtain the second salient region feature.
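A minimal sketch of this step, assuming the feature information is a C x H x W feature map (a PyTorch tensor) and the salient regions are boxes in image coordinates; the stride mapping between image and feature-map coordinates is an assumption that depends on the backbone network.

```python
import torch

def salient_region_features(feature_map, regions, stride=8):
    """Collect the feature vectors falling inside each salient region.

    feature_map: C x H x W tensor (the first or second feature information)
    regions:     list of (x0, y0, x1, y1) boxes in image coordinates
    stride:      image-to-feature-map downsampling factor (an assumption)
    """
    channels = feature_map.shape[0]
    feats = []
    for x0, y0, x1, y1 in regions:
        # Map image coordinates onto the feature-map grid.
        fx0, fy0 = int(x0 // stride), int(y0 // stride)
        fx1 = max(fx0 + 1, int(x1 // stride))
        fy1 = max(fy0 + 1, int(y1 // stride))
        patch = feature_map[:, fy0:fy1, fx0:fx1]
        feats.append(patch.reshape(channels, -1))
    # One column per salient feature-map position; because the same salient
    # regions are used for both models, the first and second salient region
    # features align elementwise.
    return torch.cat(feats, dim=1)
```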
S104. Adjust the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.

In this embodiment, the first detection model is used to detect the type and position of the target.

In some embodiments, adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model may include: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.

To improve the reliability and accuracy of training the first detection model, the similarity between the first salient region feature and the second salient region feature can be acquired; the similarity can be characterized by the Euclidean distance.

In some embodiments, the similarity includes the Euclidean distance, and acquiring the similarity between the first salient region feature and the second salient region feature may include: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between them. For example, the Euclidean distance L2-loss (distill-loss) between the first salient region feature and the second salient region feature can be calculated; this Euclidean distance L2-loss is the similarity between the first salient region feature and the second salient region feature.

Also, the loss value loss obtained by the first detection model detecting the sample image is acquired; the parameters of the first detection model can then be adjusted according to the similarity L2-loss and the loss value loss to obtain the trained first detection model.

In some embodiments, adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model may include: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.

For example, the similarity L2-loss and the loss value loss can be added and averaged, giving the target loss value = (L2-loss + loss) / 2. For another example, a weight A can be set for the similarity L2-loss and a weight B for the loss value loss, where A + B = 1; in this case, the similarity L2-loss and the loss value loss are multiplied by their corresponding weights and summed, giving the target loss value = L2-loss * A + loss * B.

Then, the parameters of the first detection model can be adjusted according to the target loss value so that they are tuned to suitable values, yielding the trained first detection model. Thus, a high-precision trained first detection model that meets the requirements can be obtained under the constraint of limited computing resources, and, while achieving the same effect, a large amount of data collection can be saved, saving both time and resources.

In some embodiments, after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the detection model training method may further include: acquiring an image to be detected; and detecting the target in the image through the trained first detection model to obtain target position information of the target in the image.

After the trained first detection model is obtained, it can be used to accurately detect the target in an image. For example, the image to be detected may be captured by a collection device such as a camera or video camera, or obtained from a preset local database or server. The target in the image can then be detected by the trained first detection model to obtain the target position information of the target in the image. For example, the trained first detection model can detect a face in the image to obtain the target position information of the face in the image; the target position information may be the corner-vertex positions of a polygonal (for example, quadrilateral) face frame.
Embodiments of this application may perform feature extraction on a sample image through the first detection model to obtain first feature information, and perform feature extraction on the sample image through the second detection model to obtain second feature information. Then, the salient region corresponding to the target may be determined based on the position information of the target in the sample image, the first salient region feature may be acquired according to the first feature information and the salient region, and the second salient region feature may be acquired according to the second feature information and the salient region. At this point, the parameters of the first detection model may be adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model. This solution can use the second detection model to accurately train the smaller-scale first detection model, so that the trained first detection model can subsequently be applied on a mobile terminal to detect targets, which reduces the scale of the first detection model; moreover, training the first detection model based on the determined salient region corresponding to the target and its salient region features can improve the reliability and accuracy of training the first detection model, giving the first detection model a wide scope of application.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a detection model use method provided by an embodiment of this application. The detection model use method can be applied in computer equipment for accurately detecting a target in an image based on the trained first detection model. The computer equipment may include mobile terminals, UAVs, servers, cameras, and the like; the mobile terminals may include mobile phones, tablet computers, and the like. The detection model is the trained first detection model, which is a model obtained by training with the above detection model training method and is deployed in the computer equipment.

For example, as shown in FIG. 6, the process of training the first detection model may include:
S21. Acquire sample images.

S22. Train the Teacher model (T-model) on the sample images.

S23. Train the Student model (S-model) on the sample images.

S24. Fix the parameters of the Teacher model and extract the feature feature-T of the sample image through the Teacher model.

S25. Extract the feature feature-S of the sample image through the Student model, and extract the salient regions pos_anchors.

S26. Calculate the salient region feature pos_feat_T from the salient regions pos_anchors and the feature feature-T, and calculate the salient region feature pos_feat_S from the salient regions pos_anchors and the feature feature-S.

S27. Calculate the Euclidean distance L2-loss between the salient region feature pos_feat_T and the salient region feature pos_feat_S.

S28. Calculate the original loss value loss of the Student model.

S29. Calculate the weighted average of the Euclidean distance L2-loss and the original loss value loss, finetune the Student model again, and obtain and save the distill-S-model (the trained Student model).
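The finetune iteration of steps S24 to S29 can be sketched as follows in PyTorch. extract_features, salient_regions, and detection_loss are hypothetical hooks on the models (the application does not define their interfaces), salient_region_features is the helper sketched earlier, and the 0.5/0.5 weighting corresponds to the simple averaging described above; this is an illustrative sketch under those assumptions, not the application's implementation.

```python
import torch
import torch.nn.functional as F

def distill_finetune_step(t_model, s_model, optimizer, image, targets):
    """One finetune iteration following steps S24 to S29 (a sketch)."""
    t_model.eval()                                     # S24: Teacher is fixed
    with torch.no_grad():
        feature_T = t_model.extract_features(image)    # S24

    feature_S = s_model.extract_features(image)        # S25
    pos_anchors = s_model.salient_regions(image, targets)         # S25

    pos_feat_T = salient_region_features(feature_T, pos_anchors)  # S26
    pos_feat_S = salient_region_features(feature_S, pos_anchors)  # S26

    l2_loss = F.mse_loss(pos_feat_S, pos_feat_T)       # S27: L2 distance
    loss = s_model.detection_loss(image, targets)      # S28: Student loss

    total = 0.5 * l2_loss + 0.5 * loss                 # S29: weighted average
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total
```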
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the detailed description of the detection model training method above, which will not be repeated here.

Specifically, as shown in FIG. 5, the detection model use method may include steps S201 to S202 and so on.

S201. Acquire an image to be detected.

S202. Detect the target in the image through the trained first detection model to obtain target position information of the target in the image.

For example, the image to be detected may be captured by a collection device such as a camera or video camera, or obtained from a preset local database or server. The target in the image can then be detected by the trained first detection model to obtain the target position information of the target in the image. For example, the trained first detection model can detect a face in the image to obtain the target position information of the face in the image; the target position information may be the corner-vertex positions of a polygonal (for example, quadrilateral) face frame. Accurate detection of the target in the image by the trained first detection model is thereby achieved.
Referring to FIG. 7, FIG. 7 is a schematic block diagram of a detection model training apparatus provided by an embodiment of this application. The detection model training apparatus 11 may include a processor 111 and a memory 112, where the processor 111 and the memory 112 are connected through a bus, such as an I2C (Inter-Integrated Circuit) bus.

Specifically, the processor 111 may be a micro-controller unit (MCU), a central processing unit (CPU), a digital signal processor (DSP), or the like.

Specifically, the memory 112 may be a Flash chip, a read-only memory (ROM) disk, an optical disk, a USB flash drive, a removable hard disk, or the like, and can be used to store a computer program.

The processor 111 is configured to call the computer program stored in the memory 112 and, when executing the computer program, implement the detection model training method provided by the embodiments of this application; for example, the following steps may be performed:

performing feature extraction on a sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through the second detection model to obtain second feature information; determining, based on position information of a target in the sample image, a salient region corresponding to the target; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.
In some embodiments, when adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor 111 is configured to execute: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.

In some embodiments, when adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model, the processor 111 is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.

In some embodiments, the similarity includes the Euclidean distance, and when acquiring the similarity between the first salient region feature and the second salient region feature, the processor 111 is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.

In some embodiments, when determining the salient region corresponding to the target based on the position information of the target in the sample image, the processor 111 is configured to execute: acquiring multiple candidate regions; determining the target region of the target based on the position information; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is greater than the first preset threshold to obtain positive sample regions; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is within the preset range and whose classification probability value is greater than the preset probability threshold to obtain negative sample regions, where the preset range is an interval smaller than the first preset threshold and greater than the second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.

In some embodiments, when acquiring multiple candidate regions, the processor 111 is configured to execute: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.

In some embodiments, when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor 111 is configured to execute: acquiring the first feature information in the positive sample regions and the negative sample regions respectively to obtain the first salient region feature; and acquiring the second feature information of the positive sample regions and the negative sample regions respectively to obtain the second salient region feature.

In some embodiments, after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor 111 is further configured to execute: acquiring an image to be detected; and detecting the target in the image through the trained first detection model to obtain target position information of the target in the image.

In some embodiments, before the feature extraction is performed on the sample image through the first detection model, the processor 111 is configured to execute: acquiring an initial image; extracting the region image where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target in the sample image according to the preprocessed key points.

In some embodiments, when preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points, the processor 111 is configured to execute: rotating, translating, scaling, and/or adjusting the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the preprocessed key points.

In some embodiments, the target includes a face.

In some embodiments, the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model.

In some embodiments, the detection model training method is applied to a distillation algorithm, where the first detection model is a student model and the second detection model is a teacher model.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the detailed description of the detection model training method above, which will not be repeated here.
Embodiments of this application further provide a storage medium; the storage medium is a computer-readable storage medium storing a computer program, the computer program includes program instructions, and a processor executes the program instructions to implement the detection model training method provided by the embodiments of this application. For example, the processor may execute:

performing feature extraction on a sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through the second detection model to obtain second feature information; determining, based on position information of a target in the sample image, a salient region corresponding to the target; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.

In some embodiments, when adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is configured to execute: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.

In some embodiments, when adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model, the processor is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.

In some embodiments, the similarity includes the Euclidean distance, and when acquiring the similarity between the first salient region feature and the second salient region feature, the processor is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.

In some embodiments, when determining the salient region corresponding to the target based on the position information of the target in the sample image, the processor is configured to execute: acquiring multiple candidate regions; determining the target region of the target based on the position information; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is greater than the first preset threshold to obtain positive sample regions; selecting, from the multiple candidate regions, regions whose coincidence degree with the target region is within the preset range and whose classification probability value is greater than the preset probability threshold to obtain negative sample regions, where the preset range is an interval smaller than the first preset threshold and greater than the second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.

In some embodiments, when acquiring multiple candidate regions, the processor is configured to execute: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.

In some embodiments, when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor is configured to execute: acquiring the first feature information in the positive sample regions and the negative sample regions respectively to obtain the first salient region feature; and acquiring the second feature information of the positive sample regions and the negative sample regions respectively to obtain the second salient region feature.

In some embodiments, after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is further configured to execute: acquiring an image to be detected; and detecting the target in the image through the trained first detection model to obtain target position information of the target in the image.

In some embodiments, before the feature extraction is performed on the sample image through the first detection model, the processor is configured to execute: acquiring an initial image; extracting the region image where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target in the sample image according to the preprocessed key points.

In some embodiments, when preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points, the processor is configured to execute: rotating, translating, scaling, and/or adjusting the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the preprocessed key points.

In some embodiments, the target includes a face.

In some embodiments, the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model.

In some embodiments, the storage medium is applied to a distillation algorithm, where the first detection model is a student model and the second detection model is a teacher model.
The descriptions of the above embodiments each have their own emphasis; for parts not detailed in a given embodiment, reference may be made to the detailed description of the detection model training method above, which is not repeated here.
The storage medium may be an internal storage unit of the detection model training apparatus described in any of the foregoing embodiments, for example a hard disk or memory of the detection model training apparatus. The storage medium may also be an external storage device of the detection model training apparatus, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the detection model training apparatus.
Since the computer program stored in the storage medium can execute any of the detection model training methods provided by the embodiments of the present application, it can achieve the beneficial effects achievable by any of those methods; for details, see the foregoing embodiments, which are not repeated here.
It should be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, herein, the terms "include", "comprise" or any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or system that includes that element.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and such modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (29)

  1. A detection model training method, comprising:
    performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information;
    determining, based on position information of a target object in the sample image, a salient region corresponding to the target object;
    obtaining a first salient region feature according to the first feature information and the salient region, and obtaining a second salient region feature according to the second feature information and the salient region; and
    adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  2. The detection model training method according to claim 1, wherein the adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model comprises:
    obtaining a similarity between the first salient region feature and the second salient region feature;
    obtaining a loss value from detection of the sample image by the first detection model; and
    adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  3. The detection model training method according to claim 2, wherein the adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model comprises:
    performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and
    adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  4. The detection model training method according to claim 2, wherein the similarity comprises a Euclidean distance, and the obtaining a similarity between the first salient region feature and the second salient region feature comprises:
    determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
  5. The detection model training method according to claim 1, wherein the determining, based on position information of a target object in the sample image, a salient region corresponding to the target object comprises:
    obtaining a plurality of candidate regions;
    determining a target region of the target object based on the position information;
    selecting, from the plurality of candidate regions, regions whose overlap with the target region is greater than a first preset threshold to obtain positive sample regions;
    selecting, from the plurality of candidate regions, regions whose overlap with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold to obtain negative sample regions, the preset range being an interval smaller than the first preset threshold and greater than a second preset threshold; and
    setting the positive sample regions and the negative sample regions as the salient region corresponding to the target object.
  6. The detection model training method according to claim 5, wherein the obtaining a plurality of candidate regions comprises:
    generating the plurality of candidate regions based on the second detection model, or obtaining a plurality of pre-labeled candidate regions.
  7. The detection model training method according to claim 5, wherein the obtaining a first salient region feature according to the first feature information and the salient region, and obtaining a second salient region feature according to the second feature information and the salient region comprises:
    obtaining the first feature information within the positive sample regions and the negative sample regions, respectively, to obtain the first salient region feature; and
    obtaining the second feature information of the positive sample regions and the negative sample regions, respectively, to obtain the second salient region feature.
  8. The detection model training method according to claim 1, wherein after the adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model, the detection model training method further comprises:
    obtaining an image to be detected; and
    detecting the target object in the image through the trained first detection model to obtain target position information of the target object in the image.
  9. The detection model training method according to claim 1, wherein before the performing feature extraction on a sample image through a first detection model, the detection model training method further comprises:
    obtaining an initial image;
    extracting, from the initial image, a region image where the target object is located;
    extracting key points of the target object from the region image;
    preprocessing the initial image and the key points to obtain the sample image and preprocessed key points; and
    determining the position information of the target object in the sample image according to the preprocessed key points.
  10. The detection model training method according to claim 9, wherein the preprocessing the initial image and the key points to obtain the sample image and preprocessed key points comprises:
    performing rotation by a preset angle, translation, scaling and/or brightness adjustment on the initial image and the key points to obtain the sample image and the preprocessed key points.
  11. The detection model training method according to claim 1, wherein the target object comprises a human face.
  12. The detection model training method according to any one of claims 1 to 11, wherein the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model.
  13. The detection model training method according to any one of claims 1 to 11, wherein the detection model training method is applied to a distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
  14. A detection model training apparatus, comprising a processor and a memory, wherein the memory stores a computer program, and the processor, when calling the computer program in the memory, performs the detection model training method according to any one of claims 1 to 13.
  15. A detection model use method, applied to a computer device, wherein the detection model is a trained first detection model, the trained first detection model being a model obtained by training with the detection model training method according to any one of claims 1 to 13 and deployed in the computer device; the detection model use method comprising:
    obtaining an image to be detected; and
    detecting a target object in the image through the trained first detection model to obtain target position information of the target object in the image.
  16. The detection model use method according to claim 15, wherein the computer device comprises a mobile terminal, an unmanned aerial vehicle, and a camera.
  17. A storage medium for storing a computer program, the computer program being loaded by a processor to perform:
    performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information;
    determining, based on position information of a target object in the sample image, a salient region corresponding to the target object;
    obtaining a first salient region feature according to the first feature information and the salient region, and obtaining a second salient region feature according to the second feature information and the salient region; and
    adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  18. The storage medium according to claim 17, wherein when adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is configured to perform:
    obtaining a similarity between the first salient region feature and the second salient region feature;
    obtaining a loss value from detection of the sample image by the first detection model; and
    adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  19. The storage medium according to claim 18, wherein when adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model, the processor is configured to perform:
    performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and
    adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  20. The storage medium according to claim 18, wherein the similarity comprises a Euclidean distance, and when obtaining the similarity between the first salient region feature and the second salient region feature, the processor is configured to perform:
    determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
  21. The storage medium according to claim 17, wherein when determining, based on the position information of the target object in the sample image, the salient region corresponding to the target object, the processor is configured to perform:
    obtaining a plurality of candidate regions;
    determining a target region of the target object based on the position information;
    selecting, from the plurality of candidate regions, regions whose overlap with the target region is greater than a first preset threshold to obtain positive sample regions;
    selecting, from the plurality of candidate regions, regions whose overlap with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold to obtain negative sample regions, the preset range being an interval smaller than the first preset threshold and greater than a second preset threshold; and
    setting the positive sample regions and the negative sample regions as the salient region corresponding to the target object.
  22. The storage medium according to claim 21, wherein when obtaining the plurality of candidate regions, the processor is configured to perform:
    generating the plurality of candidate regions based on the second detection model, or obtaining a plurality of pre-labeled candidate regions.
  23. The storage medium according to claim 21, wherein when obtaining the first salient region feature according to the first feature information and the salient region, and obtaining the second salient region feature according to the second feature information and the salient region, the processor is configured to perform:
    obtaining the first feature information within the positive sample regions and the negative sample regions, respectively, to obtain the first salient region feature; and
    obtaining the second feature information of the positive sample regions and the negative sample regions, respectively, to obtain the second salient region feature.
  24. The storage medium according to claim 17, wherein after adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is further configured to perform:
    obtaining an image to be detected; and
    detecting the target object in the image through the trained first detection model to obtain target position information of the target object in the image.
  25. The storage medium according to claim 17, wherein before performing feature extraction on the sample image through the first detection model, the processor is configured to perform:
    obtaining an initial image;
    extracting, from the initial image, a region image where the target object is located;
    extracting key points of the target object from the region image;
    preprocessing the initial image and the key points to obtain the sample image and preprocessed key points; and
    determining the position information of the target object in the sample image according to the preprocessed key points.
  26. The storage medium according to claim 25, wherein when preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points, the processor is configured to perform:
    performing rotation by a preset angle, translation, scaling and/or brightness adjustment on the initial image and the key points to obtain the sample image and the preprocessed key points.
  27. The storage medium according to claim 17, wherein the target object comprises a human face.
  28. The storage medium according to any one of claims 17 to 27, wherein the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model.
  29. The storage medium according to any one of claims 17 to 27, wherein the storage medium is applied to a distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
PCT/CN2020/104973 2020-07-27 2020-07-27 Detection model training method and apparatus, detection model use method, and storage medium WO2022021029A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/104973 WO2022021029A1 (zh) 2020-07-27 2020-07-27 Detection model training method and apparatus, detection model use method, and storage medium
CN202080015995.2A CN113490947A (zh) 2020-07-27 2020-07-27 Detection model training method and apparatus, detection model use method, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/104973 WO2022021029A1 (zh) 2020-07-27 2020-07-27 Detection model training method and apparatus, detection model use method, and storage medium

Publications (1)

Publication Number Publication Date
WO2022021029A1 (zh)

Family

ID=77933700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104973 WO2022021029A1 (zh) 2020-07-27 2020-07-27 Detection model training method and apparatus, detection model use method, and storage medium

Country Status (2)

Country Link
CN (1) CN113490947A (zh)
WO (1) WO2022021029A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580533A (zh) * 2022-03-04 2022-06-03 Tencent Technology (Shenzhen) Co., Ltd. Training method, apparatus, device, medium, and program product for a feature extraction model
WO2024113242A1 (zh) * 2022-11-30 2024-06-06 BOE Technology Group Co., Ltd. Dress code compliance determination method, and pedestrian re-identification model training method and apparatus
CN116071608B (zh) * 2023-03-16 2023-06-06 Zhejiang Zhuoyun Intelligent Technology Co., Ltd. Target detection method, apparatus, device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019143946A1 (en) * 2018-01-19 2019-07-25 Visa International Service Association System, method, and computer program product for compressing neural network models
CN108898168A (zh) * 2018-06-19 2018-11-27 Tsinghua University Compression method and system for convolutional neural network models used for target detection
CN110245662A (zh) * 2019-06-18 2019-09-17 Tencent Technology (Shenzhen) Co., Ltd. Detection model training method and apparatus, computer device, and storage medium
CN110599503A (zh) * 2019-06-18 2019-12-20 Tencent Technology (Shenzhen) Co., Ltd. Detection model training method and apparatus, computer device, and storage medium
CN110674714A (zh) * 2019-09-13 2020-01-10 Southeast University Joint detection method for faces and facial key points based on transfer learning
CN111382870A (zh) * 2020-03-06 2020-07-07 SenseTime Group Limited Method and apparatus for training neural networks

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897069A (zh) * 2022-05-09 2022-08-12 Daqing Lineng Electric Power Machinery Equipment Co., Ltd. Intelligent regulation and energy-saving protection device for pumping units
CN114897069B (zh) * 2022-05-09 2023-04-07 Daqing Lineng Electric Power Machinery Equipment Co., Ltd. Intelligent regulation and energy-saving protection device for pumping units
CN115687914A (zh) * 2022-09-07 2023-02-03 China Telecom Corporation Limited Model distillation method and apparatus, electronic device, and computer-readable medium
CN115687914B (zh) * 2022-09-07 2024-01-30 China Telecom Corporation Limited Model distillation method and apparatus, electronic device, and computer-readable medium
CN115908982A (zh) * 2022-12-01 2023-04-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Image processing method, model training method, apparatus, device, and storage medium
CN115761529A (zh) * 2023-01-09 2023-03-07 Alibaba (China) Co., Ltd. Image processing method and electronic device
CN115761529B (zh) * 2023-01-09 2023-05-30 Alibaba (China) Co., Ltd. Image processing method and electronic device

Also Published As

Publication number Publication date
CN113490947A (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
WO2022021029A1 (zh) Detection model training method and apparatus, detection model use method, and storage medium
US20200160040A1 (en) Three-dimensional living-body face detection method, face authentication recognition method, and apparatuses
WO2019128646A1 (zh) Face detection method, training method for convolutional neural network parameters, apparatus, and medium
WO2021190171A1 (zh) Image recognition method and apparatus, terminal, and storage medium
CN108229509B (zh) Method and apparatus for recognizing object categories, and electronic device
WO2019232866A1 (zh) Human eye model training method, human eye recognition method, apparatus, device, and medium
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
WO2019232862A1 (zh) Mouth model training method, mouth recognition method, apparatus, device, and medium
WO2017096753A1 (zh) Facial key point tracking method, terminal, and non-volatile computer-readable storage medium
CN109165589B (zh) Vehicle re-identification method and apparatus based on deep learning
US10534957B2 (en) Eyeball movement analysis method and device, and storage medium
WO2019033572A1 (zh) Face occlusion detection method, apparatus, and storage medium
US9626553B2 (en) Object identification apparatus and object identification method
US7869632B2 (en) Automatic trimming method, apparatus and program
WO2021136528A1 (zh) Instance segmentation method and apparatus
EP2697775A1 (en) Method of detecting facial attributes
WO2020151299A1 (zh) Yellow no-parking line recognition method and apparatus, computer device, and storage medium
WO2020155790A1 (zh) Claim settlement information extraction method and apparatus, and electronic device
WO2023284182A1 (en) Training method for recognizing moving target, method and device for recognizing moving target
JP2022133378A (ja) Face liveness detection method, apparatus, electronic device, and storage medium
WO2019033567A1 (zh) Eyeball movement capture method, apparatus, and storage medium
WO2024077935A1 (zh) Vehicle positioning method and apparatus based on visual SLAM
CN112613471A (zh) Face liveness detection method and apparatus, and computer-readable storage medium
CN109635649B (zh) High-speed detection method and system for unmanned aerial vehicle reconnaissance targets

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 20947718
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 20947718
    Country of ref document: EP
    Kind code of ref document: A1
Kind code of ref document: A1