WO2021179498A1 - Target detection method, training method and apparatus for its model, and electronic device - Google Patents

Target detection method, training method and apparatus for its model, and electronic device

Info

Publication number
WO2021179498A1
WO2021179498A1 PCT/CN2020/100704 CN2020100704W WO2021179498A1 WO 2021179498 A1 WO2021179498 A1 WO 2021179498A1 CN 2020100704 W CN2020100704 W CN 2020100704W WO 2021179498 A1 WO2021179498 A1 WO 2021179498A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
area
actual
point
target detection
Prior art date
Application number
PCT/CN2020/100704
Other languages
English (en)
French (fr)
Inventor
宋涛
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Priority to KR1020217034041A (published as KR20210141650A)
Priority to JP2021563131A (published as JP2022529838A)
Publication of WO2021179498A1


Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis; G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/00 Image analysis; G06T 7/10 Segmentation; Edge detection; G06T 7/11 Region-based segmentation
    • G06T 7/00 Image analysis; G06T 7/60 Analysis of geometric attributes; G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/20 Special algorithmic details; G06T 2207/20081 Training; Learning

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a target detection method, a training method and apparatus for its model, and an electronic device.
  • Existing neural network models generally perform target detection based on anchor-matching or anchor-free strategies.
  • However, the existing strategies still have a high false-detection rate in actual use.
  • Therefore, the embodiments of the present application provide a target detection method, and a training method, apparatus, and electronic device for its model.
  • An embodiment of the application provides a method for training a target detection model, including: obtaining a sample image, where the sample image is annotated with actual position information of the actual area where a target is located; taking several points in the sample image as detection points, and determining at least one detection point as a positive sample point of the target based on the distance between each detection point and a preset point of the actual area; performing target detection on the sample image with the target detection model to obtain prediction area information corresponding to each positive sample point; determining a loss value of the target detection model using the actual position information and the prediction area information; and adjusting parameters of the target detection model based on the loss value of the target detection model.
  • In some embodiments, the sample image contains multiple targets; taking several points in the sample image as detection points and determining at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area includes: down-sampling the sample image to obtain multiple feature maps with different resolutions; grouping the actual areas of the multiple targets with the multiple feature maps based on the size of the actual area of each target, where an actual area with a larger size and a feature map with a smaller resolution are placed in the same group; and, for each group of a feature map and the actual area of a target, taking each point in the feature map as a detection point and determining at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area.
  • In some embodiments, there are m feature maps; grouping the actual areas of the multiple targets with the multiple feature maps based on the size of the actual area of each target includes: calculating the area of the actual area of each target, dividing the range between the maximum and minimum of the areas into m intervals sorted from small to large, arranging the m feature maps in order of resolution from large to small, and placing the actual area of a target whose area belongs to the i-th interval in the same group as the i-th feature map, where i and m are positive integers and i takes a value between 0 and m.
  • In some embodiments, determining at least one detection point as a positive sample point of the target includes: obtaining the distance between each detection point and the preset point of the actual area, and selecting at least one detection point whose distance from the preset point meets a preset condition as a positive sample point of the target.
  • In some embodiments, determining at least one detection point whose distance from the preset point meets the preset condition as a positive sample point of the target includes: determining the first several detection points closest to the preset point as positive sample points of the target.
  • In some embodiments, the prediction area information includes the predicted position information of the prediction area corresponding to the positive sample point and the prediction confidence of the prediction area; determining the loss value of the target detection model using the actual position information and the prediction area information includes: obtaining a position loss value using the actual position information and the predicted position information of each target; obtaining a confidence loss value using the prediction confidence; and determining the loss value of the target detection model based on the position loss value and the confidence loss value.
  • In some embodiments, the actual position information includes the actual area size of the actual area, and the predicted position information includes the predicted area size of the prediction area; obtaining the position loss value using the actual position information and the predicted position information of each target includes: obtaining an area-size loss value using the actual area size and the predicted area size of each target, and determining the position loss value based on the area-size loss value.
  • In some embodiments, the actual position information further includes the preset point position of the actual area, and the predicted position information further includes predicted offset information between the positive sample point of the prediction area and the preset point of the actual area; obtaining the position loss value using the actual position information and the predicted position information of each target further includes: calculating actual offset information between the preset point position of the actual area of the target and the corresponding positive sample point position, and obtaining an offset loss value using the actual offset information and the predicted offset information; determining the position loss value based on the area-size loss value then includes: determining the position loss value based on the area-size loss value and the offset loss value.
  • In some embodiments, the method further includes: taking the remaining detection points as negative sample points; using the target detection model to perform target detection on the sample image then includes: obtaining the prediction area information corresponding to each positive sample point and the prediction area information corresponding to each negative sample point; and obtaining the confidence loss value using the prediction confidence includes: obtaining the confidence loss value using the prediction confidence corresponding to the positive sample points and the prediction confidence corresponding to the negative sample points.
  • In some embodiments, the sample image is a two-dimensional image or a three-dimensional image, the actual area is an actual bounding box, and the prediction area is a predicted bounding box.
  • setting the sample image as a two-dimensional image can achieve target detection on the two-dimensional image
  • setting the sample image as a three-dimensional image can achieve target detection on the three-dimensional image
  • An embodiment of the present application provides a target detection method, including: acquiring an image to be tested; and using a target detection model to perform target detection on the image to be tested to obtain target area information corresponding to the target in the image to be tested, where the target detection model is obtained by the training method of the target detection model in the above first aspect.
  • the embodiment of the application provides a training device for a target detection model, including an image acquisition module, a sample selection module, a target detection module, a loss determination module, and a parameter adjustment module.
  • The image acquisition module is configured to acquire sample images;
  • the sample selection module is configured to take several points in the sample image as detection points and determine at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area;
  • the target detection module is configured to use the target detection model to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point;
  • the loss determination module is configured to determine the loss value of the target detection model using the actual position information and the prediction area information;
  • the parameter adjustment module is configured to adjust the parameters of the target detection model based on the loss value of the target detection model.
  • the embodiment of the application provides a target detection device, which includes an image acquisition module and a target detection module.
  • The image acquisition module is configured to acquire the image to be tested;
  • the target detection module is configured to use the target detection model to perform target detection on the image to be tested to obtain target area information corresponding to the target in the image to be tested, where the target detection model is obtained by the above-mentioned training device for the target detection model.
  • An embodiment of the present application provides an electronic device including a memory and a processor coupled to each other, and the processor is configured to execute program instructions stored in the memory to implement the above-mentioned target detection model training method or the above-mentioned target detection method.
  • An embodiment of the present application provides a computer-readable storage medium storing program instructions.
  • When the program instructions are executed by a processor, the training method of the above-mentioned target detection model, or the above-mentioned target detection method, is implemented.
  • An embodiment of the present application provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes it to implement the training method of the target detection model described in any one of the above, or the target detection method described above.
  • The target detection method, the training method of its model, the apparatus, and the electronic device provided by the embodiments of the present application take several points in a sample image as detection points and determine at least one detection point as a positive sample point of the target based on the distance between each detection point and a preset point of the actual area; the target detection model is then used to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point, and the loss value of the target detection model is determined using the actual position information of the actual area where the target is located in the sample image and the predicted position information included in the prediction area information.
  • The parameters of the target detection model are adjusted based on this loss value, so the model can be trained on the predicted position information corresponding to the multiple matched positive sample points, which ensures the recall rate without designing anchor boxes.
  • Moreover, because the parameters of the target detection model are adjusted based on a loss value related to the position information, the precision can also be ensured, thereby improving the accuracy of target detection.
  • FIG. 1 is a schematic diagram of a network architecture for training a target detection model and its application according to an embodiment of the application;
  • FIG. 2 is a schematic flowchart of a method for training a target detection model provided by an embodiment of the application
  • FIG. 3 is a schematic diagram of the implementation process of step S22 in the training method of the target detection model provided by the embodiment of the application;
  • FIG. 4 is a schematic flowchart of a target detection method provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of some predicted area information obtained by the target detection method provided by an embodiment of the application.
  • FIG. 6 is a schematic flowchart of another target detection method provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of a training device for a target detection model provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the application.
  • The terms "system" and "network" in this document are often used interchangeably.
  • The term "and/or" in this document merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone.
  • The character "/" in this document generally indicates that the associated objects before and after it are in an "or" relationship.
  • "Multiple" in this document means two or more.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the application.
  • the network architecture includes a CT machine 11, a computer device 12, and a server 13, where the CT machine 11 is used to collect original images.
  • the CT machine 11 establishes a communication connection with the computer device 12, the CT machine 11 can send the obtained original image to the computer device 12, and the computer device 12 performs processing such as marking the original image to obtain a sample image.
  • the server 13 stores the sample image
  • the computer device 12 also establishes a communication connection with the server 13, and the computer device 12 can directly obtain the sample image from the server 13.
  • After the computer device 12 obtains the sample image, it adjusts the parameters of the target detection model based on the sample image.
  • The computer device 12 then receives the image to be tested and obtains target area information corresponding to the target in the image to be tested based on the target detection model.
  • Alternatively, after the server 13 obtains the sample image, it adjusts the parameters of the target detection model stored in the server 13 based on the sample image.
  • In this case, the computer device 12 sends the image to be tested to the server 13, so that the server 13 can obtain target area information corresponding to the target in the image to be tested based on the target detection model; after obtaining the target area information, the server 13 returns it to the computer device 12.
  • With reference to the schematic diagram of the application scenario shown in FIG. 1, the following describes various embodiments of the target detection method, the training method of its model, the apparatus, and the electronic device.
  • An embodiment of the application provides a method for training a target detection model.
  • the method is applied to a training device for a target detection model.
  • the training device for the target detection model may be a computer device 12 as shown in FIG. 1.
  • It may also be the server 13 in FIG. 1.
  • the method provided in the embodiment of the present application may be implemented by a computer program, and when the computer program is executed, each step in the training method of the target detection model provided in the embodiment of the present application is completed.
  • the computer program may be executed by the processor of the training device of the target detection model.
  • FIG. 2 is a schematic flowchart of a method for training a target detection model provided by an embodiment of the application. As shown in FIG. 2, the method for training a target detection model may include the following steps:
  • Step S21 Obtain a sample image.
  • the sample image is marked with the actual position information of the actual area where the target is located.
  • the actual area may be an actual bounding box (Bounding Box), for example, the actual bounding box of the target, and the actual bounding box may be a rectangular box, which is not limited here.
  • The actual position information may include the position information of a preset point of the actual area (for example, the center point of the actual area) and the size of the actual area (for example, the length and width of the actual bounding box).
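  • For illustration only, such an annotation could be organized as a simple structure like the following sketch; the field names and values are hypothetical and merely reflect one possible format in which the preset point is the center of the actual bounding box and the size is given as length and width.

```python
# Hypothetical annotation structure for one sample image; the field names are
# illustrative only. Each target carries the preset point (here: the center) of
# its actual bounding box, the actual area size, and a category label.
sample_annotation = {
    "image": "sample_001.png",        # 2D sample image (a 3D volume is also possible)
    "targets": [
        {
            "center": (38.0, 37.5),   # preset point of the actual area
            "size": (10.0, 15.0),     # actual area size (length, width)
            "category": "hematoma",   # e.g. a diseased tissue
        },
    ],
}
print(sample_annotation["targets"][0]["center"])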
  • In some implementation scenarios, in order to implement target detection on a two-dimensional image, the sample image may be a two-dimensional image; in other implementation scenarios, in order to implement target detection on a three-dimensional image, the sample image may be a three-dimensional image, which is not limited here.
  • In some implementation scenarios, in order to apply target detection to the field of medical imaging, the sample image may be a medical image, for example a CT (Computed Tomography) image or an MR (Magnetic Resonance) image, which is not limited here.
  • The target in the sample image may be a biological organ, for example the pituitary gland or the pancreas; or the target in the sample image may be a diseased tissue, for example a lacunar infarction or a hematoma, which is not limited here.
  • Other cases can be deduced by analogy and are not enumerated here one by one.
  • Step S22 Taking several points in the sample image as detection points, at least one detection point is determined as the positive sample point of the target based on the distance between each detection point and the preset point of the actual area.
  • Specifically, the distance between each detection point and the preset point of the actual area can be obtained, and at least one detection point whose distance from the preset point meets a preset condition is determined as a positive sample point of the target.
  • For example, at least part of the detection points whose distance from the preset point is less than a preset distance threshold can be selected as positive sample points of the target, for instance detection points whose distance from the preset point is less than 5 pixels, or detection points whose distance from the preset point is less than 8 pixels, which is not limited here.
  • Alternatively, the first several detection points closest to the preset point may be determined as the positive sample points of the target.
  • For example, these may be the first 10 detection points, the first 20 detection points, or the first 30 detection points, which is not limited here.
  • By determining at least one detection point as a positive sample point of the target in this way, each actual area is matched to the same number of positive sample points, which helps ensure the gradient balance between targets of different sizes and in turn helps improve the accuracy of target detection.
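  • As a minimal illustrative sketch (not the claimed implementation), selecting the first k detection points closest to the preset point could be written as follows, assuming the detection-point coordinates in the sample image and the preset point are already known.

```python
import numpy as np

def select_positive_samples(detection_points, preset_point, k=10):
    """Return indices of the k detection points closest to the preset point.

    detection_points: (N, 2) array of detection-point coordinates in the sample image.
    preset_point: (2,) coordinate of the preset point (e.g. the center) of the actual area.
    k: number of positive sample points per target (e.g. 10, 20 or 30).
    """
    detection_points = np.asarray(detection_points, dtype=float)
    preset_point = np.asarray(preset_point, dtype=float)
    distances = np.linalg.norm(detection_points - preset_point, axis=1)
    return np.argsort(distances)[:k]

# A distance-threshold variant would instead keep every point closer than, say,
# 5 pixels: np.where(distances < 5)[0].
```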
  • Step S23 Use the target detection model to perform target detection on the sample image to obtain prediction area information corresponding to each positive sample point.
  • the prediction area information corresponding to each positive sample point includes the prediction position information of the prediction area corresponding to the positive sample point.
  • In some embodiments, in order to clarify the range of the prediction area, the prediction area may be a predicted bounding box, and the predicted bounding box may be a rectangle, which is not limited here.
  • In some embodiments, in order to uniquely represent a predicted bounding box, the prediction area information may include the position information of a preset point of the prediction area (for example, the center point of the prediction area) and the size of the prediction area (for example, the length and width of the predicted bounding box).
  • Step S24 Use the actual position information and the predicted area information to determine the loss value of the target detection model.
  • In some embodiments, the prediction area information may also include the prediction confidence of the prediction area; the prediction confidence indicates the reliability of the prediction area, and the higher the prediction confidence, the more reliable the prediction area.
  • The actual position information and the predicted position information of each target are used to obtain the position loss value, the prediction confidence is used to obtain the confidence loss value, and the loss value of the target detection model is obtained based on the position loss value and the confidence loss value.
  • At least one of the binary cross-entropy loss function, the mean-square-error loss function, and the L1 loss function may be used to calculate the loss value, which is not limited here.
  • The L1 loss function, also known as Least Absolute Deviation (LAD) or Least Absolute Error (LAE), may be written as L1 = sum_{i=1..m} |y(i) - ŷ(i)|, where m represents the number of positive sample points, y(i) is the target value, ŷ(i) is the estimated value, and L1 is the resulting loss value.
  • The L2 loss function, also known as Least Square Error (LSE), may be written as L2 = sum_{i=1..m} (y(i) - ŷ(i))^2, where m, y(i), and ŷ(i) have the same meanings as above.
  • In some embodiments, the actual position information may also include the actual area size of the actual area, and the prediction area information may also include the predicted area size of the prediction area; in this case, the actual area size and the predicted area size of each target may be used to obtain the area-size loss value, and the position loss value is determined based on the area-size loss value.
  • In some embodiments, a position loss weight corresponding to the position loss value and a confidence loss weight corresponding to the confidence loss value can be preset, and the position loss value and the confidence loss value are weighted with the position loss weight and the confidence loss weight to obtain the loss value of the target detection model.
  • In some embodiments, the actual position information may also include the preset point position of the actual area, and the predicted position information may also include the predicted offset information between the positive sample point of the prediction area and the preset point of the actual area; in this way, the actual offset information between the preset point position of the actual area of the target and the corresponding positive sample point position can be calculated, the offset loss value can be obtained using the actual offset information and the predicted offset information, and the position loss value can then be determined based on the area-size loss value and the offset loss value.
  • Specifically, the actual area size and the predicted area size of each target can be processed with the IoU (Intersection over Union) loss function or the L1 loss function to obtain the area-size loss value, and the L1 loss function can be used to obtain the offset loss value.
  • Here, IoU is the ratio between the intersection and the union of the actual area and the prediction area, and the L1 loss function is used to calculate the length difference between the predicted bounding box and the actual bounding box and/or the width difference between the predicted bounding box and the actual bounding box; reference may be made to the foregoing related steps.
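  • The following sketch only illustrates how the loss terms described above could be combined; it assumes axis-aligned boxes given as (center-x, center-y, width, height), uses binary cross-entropy for the confidence term, and the helper names and weighting scheme are illustrative rather than the exact formulation of the embodiments.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def detection_loss(pred_box, gt_box, pred_offset, gt_offset,
                   pred_conf, gt_conf, w_pos=1.0, w_conf=1.0):
    """Area-size loss via the IoU loss, offset loss via L1, confidence loss via BCE."""
    size_loss = 1.0 - iou(pred_box, gt_box)
    offset_loss = float(np.abs(np.subtract(pred_offset, gt_offset)).sum())  # L1 loss
    position_loss = size_loss + offset_loss
    eps = 1e-7  # binary cross-entropy on the prediction confidence
    conf_loss = -(gt_conf * np.log(pred_conf + eps)
                  + (1.0 - gt_conf) * np.log(1.0 - pred_conf + eps))
    return w_pos * position_loss + w_conf * conf_loss   # weighted total loss
```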
  • For example, suppose the preset point (such as the center point) of the actual area of a target is located at (38, 37.5), the category of the target is human, the position of one positive sample point is (37.5, 37.5), the size of the prediction area predicted by the detection model is 10*15, the predicted offset information is (offset-x, offset-y), the prediction confidence for the human category is 0.9 and the prediction confidence for another category is 0.2; the confidence of the target can then be calculated accordingly.
  • The actual offset information between the preset point position of the actual area and the corresponding positive sample point position is (0.5, 0.1).
  • If the target is a small target and the size of its actual area is, for example, 0.02*0.04, the above offset is larger than the size of the actual area itself, which would lead to a large deviation in target detection; therefore, performing loss calculation and training on the offset can make the predicted offset close to or equal to the actual offset.
  • In some embodiments, in order to further improve the accuracy of the confidence loss value and thereby improve the accuracy of target detection, the detection points other than the positive sample points may be taken as negative sample points; the target detection model is used to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point and the prediction area information corresponding to each negative sample point, and the confidence loss value is then obtained using the prediction confidence corresponding to the positive sample points and the prediction confidence corresponding to the negative sample points.
  • Step S25 Adjust the parameters of the target detection model based on the loss value of the target detection model.
  • the parameters of the target detection model can be adjusted.
  • the parameters of the target detection model may include, but are not limited to: the weight of the convolutional layer of the target detection model.
  • the above-mentioned step S23 and subsequent steps may be executed again until the loss value meets the preset training end condition.
  • The preset training end condition may include: the loss value of the target detection model being less than a preset loss threshold, or the loss value of the target detection model no longer decreasing.
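  • A schematic training loop consistent with steps S21 to S25 might look like the following sketch; the model methods and data structures are placeholders introduced only for illustration, and the stopping criteria mirror the preset training end conditions mentioned above.

```python
def train_detection_model(model, samples, loss_threshold=1e-3, max_epochs=100):
    """Schematic loop over steps S22-S25; `model` and its methods are placeholders."""
    previous_loss = float("inf")
    for _ in range(max_epochs):
        total_loss = 0.0
        for image, annotation in samples:                                   # step S21: labelled sample images
            positives = model.select_positive_samples(image, annotation)    # step S22
            predictions = model.predict(image, positives)                   # step S23
            loss = model.compute_loss(predictions, annotation)              # step S24
            model.update_parameters(loss)                                   # step S25
            total_loss += loss
        # preset training end condition: loss below a threshold, or no longer decreasing
        if total_loss < loss_threshold or total_loss >= previous_loss:
            break
        previous_loss = total_loss
    return model
```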
  • In this way, the target detection model is used to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point, the loss value of the target detection model is determined using the actual position information of the actual area where the target is located in the sample image and the predicted position information included in the prediction area information, and the parameters of the target detection model are adjusted based on this loss value; the model is thus trained on the predicted position information corresponding to the multiple matched positive sample points, so the recall rate can be ensured without designing anchor boxes.
  • Moreover, because the parameters of the target detection model are adjusted based on a loss value related to the position information, the precision can be ensured, and the accuracy of target detection can be improved.
  • FIG. 3 is a schematic flowchart of step S22 in the method for training a target detection model provided by an embodiment of the application.
  • the sample image may include multiple targets, and step S22 may be implemented through the following steps:
  • Step S221 down-sampling the sample image to obtain multiple feature maps corresponding to different resolutions.
  • A Feature Pyramid Network (FPN) may be used to down-sample the sample image to obtain multiple feature maps with different resolutions.
  • The above-mentioned FPN may be part of the target detection model, so that multiple feature maps with different resolutions can be obtained by inputting the sample image into the target detection model. Taking a 128*128 sample image as an example, down-sampling it can produce a feature map with a 4*4 resolution, a feature map with an 8*8 resolution, a feature map with a 16*16 resolution, and so on, which is not limited here.
  • each feature point in the 4*4 resolution feature map corresponds to the 32*32 pixel area of the sample image
  • each feature point in the 8*8 resolution feature map corresponds to 16*16 pixels of the sample image Area
  • each feature point in the 16*16 resolution feature map corresponds to the 8*8 pixel area of the sample image.
  • the feature maps of other resolutions can be deduced by analogy, so we will not give examples one by one here.
  • Step S222 Based on the size of the actual area of the target, group the actual areas of the multiple targets with multiple feature maps.
  • the actual area with a larger size and the feature map with a smaller resolution are regarded as the same group.
  • For example, if the actual area sizes of multiple targets in the sample image are 16*32, 11*22, 10*20, and 5*10, then the actual area with size 16*32 and the feature map with a resolution of 4*4 can be placed in the same group, the actual areas with sizes 11*22 and 10*20 and the feature map with a resolution of 8*8 can be placed in the same group, and the actual area with size 5*10 and the feature map with a resolution of 16*16 can be placed in the same group, which is not limited here.
  • In some embodiments, in order to group the actual areas of multiple targets with multiple feature maps accurately, the area of the actual area of each target can be calculated, the range between the maximum and minimum of the areas can be divided into m intervals sorted from small to large (where m is the number of feature maps), the m feature maps are arranged in order of resolution from large to small, and the actual area of a target whose area belongs to the i-th interval is placed in the same group as the i-th feature map, where i and m are positive integers and i takes a value between 0 and m.
  • For example, if the number m of feature maps with different resolutions is 3 and the actual area sizes of multiple targets in the sample image are 16*32, 11*22, 10*20, and 5*10, the areas are 512, 242, 200, and 50, respectively, and the range between the maximum value 512 and the minimum value 50 is divided into 3 intervals: 50 to 204, 204 to 358, and 358 to 512.
  • The feature maps with resolutions 4*4, 8*8, and 16*16 are sorted from large resolution to small resolution as: the 16*16 feature map, the 8*8 feature map, and the 4*4 feature map. The actual areas of targets whose areas belong to the first interval (50 to 204) are the 10*20 actual area and the 5*10 actual area, so these two are placed in the same group as the first feature map (the 16*16 feature map); the actual area of the target whose area belongs to the second interval (204 to 358) is the 11*22 actual area, so it is placed in the same group as the second feature map (the 8*8 feature map); and the actual area of the target whose area belongs to the third interval (358 to 512) is the 16*32 actual area, so it is placed in the same group as the third feature map (the 4*4 feature map).
  • Other sample images can be handled by analogy and are not described here one by one.
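  • A small sketch of this grouping rule, using the numbers of the example above (areas 512, 242, 200, and 50 with m = 3 feature maps), could look as follows; the assumption that the range between the minimum and maximum areas is split into equal-width intervals is illustrative.

```python
def group_targets_with_feature_maps(areas, feature_map_resolutions):
    """Assign each actual area to a feature map: larger areas go to smaller resolutions.

    areas: actual-area sizes, e.g. [512, 242, 200, 50].
    feature_map_resolutions: e.g. [(4, 4), (8, 8), (16, 16)].
    """
    m = len(feature_map_resolutions)
    lo, hi = min(areas), max(areas)
    width = (hi - lo) / m                        # m intervals sorted from small to large
    maps_desc = sorted(feature_map_resolutions,  # resolutions from large to small
                       key=lambda r: r[0] * r[1], reverse=True)
    groups = {}
    for area in areas:
        i = min(int((area - lo) // width), m - 1)   # interval index of this area
        groups[area] = maps_desc[i]                 # i-th interval -> i-th feature map
    return groups

# With the example numbers this returns
# {512: (4, 4), 242: (8, 8), 200: (16, 16), 50: (16, 16)}, matching the grouping above.
print(group_targets_with_feature_maps([512, 242, 200, 50], [(4, 4), (8, 8), (16, 16)]))
```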
  • Step S223: For each group of a feature map and the actual area of a target, each point in the feature map is taken as a detection point, and at least one detection point is selected as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area.
  • The position coordinates of a detection point in the sample image can be determined according to its position coordinates in the feature map and the resolution of the feature map, so that the distance between the detection point and the preset point of the actual area can be calculated.
  • Still taking the 128*128 sample image as an example, each feature point in the 4*4 feature map is taken as a detection point; because each feature point of the feature map with a resolution of 4*4 corresponds to a 32*32 pixel area of the sample image, the detection point (1,1) corresponds to (16,16) in the sample image, the detection point (1,2) corresponds to (16,48), the detection point (1,3) corresponds to (16,80), and the detection point (1,4) corresponds to (16,112).
  • Similarly, the detection point (2,1) corresponds to (48,16) in the sample image, the detection point (2,2) corresponds to (48,48), the detection point (2,3) corresponds to (48,80), and the detection point (2,4) corresponds to (48,112).
  • The Euclidean distance can be used to calculate the distance between each detection point and the preset point of the actual area; the distances of the above points are, respectively: 16, 16, 48, 80, 35.78, 35.78, 57.69, and 86.16.
  • Other detection points can be deduced by analogy, so we will not give examples one by one here.
  • For the target whose actual area size is 16*32, the positive sample points can therefore be the feature points (1,1), (1,2), (2,1), and (2,2) in the feature map with a resolution of 4*4. Other cases can be deduced by analogy and are not enumerated here one by one.
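  • The mapping from feature-map points to sample-image coordinates and the resulting distances can be sketched as follows; the preset point (16, 32) is an assumed value chosen only to be consistent with the distances listed above.

```python
import math

def feature_point_to_image(row, col, stride):
    """Map a 1-indexed feature-map point to the center of its pixel block in the sample image."""
    return ((row - 1) * stride + stride // 2, (col - 1) * stride + stride // 2)

stride = 32                      # 128*128 sample image with a 4*4 feature map
preset_point = (16, 32)          # assumed preset (center) point of the actual area
for row in (1, 2):
    for col in (1, 2, 3, 4):
        x, y = feature_point_to_image(row, col, stride)
        d = math.hypot(x - preset_point[0], y - preset_point[1])
        print((row, col), (x, y), round(d, 2))
# Row 1 yields distances 16, 16, 48, 80 and row 2 yields 35.78, 35.78, 57.69, 86.16,
# so with k = 4 the positive sample points are (1,1), (1,2), (2,1) and (2,2).
```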
  • In the above way, multiple feature maps with different resolutions are obtained, the actual areas of the multiple targets are grouped with the multiple feature maps based on the size of the actual area of each target, and an actual area with a larger size and a feature map with a smaller resolution are placed in the same group; for each group of a feature map and the actual area of a target, each point of the feature map is taken as a detection point, and at least one detection point is selected as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area.
  • On the one hand, high-resolution feature maps can be responsible for small targets and low-resolution feature maps for large targets; on the other hand, every point in the feature map of each group can serve as a detection point for the selection of positive sample points, which helps generate as many positive sample points as possible, thereby helping to ensure the recall rate and in turn to improve the accuracy of target detection.
  • An embodiment of the application provides a target detection method.
  • the method is applied to a target detection device.
  • the target detection device may be a computer device.
  • The method provided in the embodiment of the application may be implemented by a computer program; when the computer program is executed, each step in the target detection method provided in the embodiment of the present application is completed.
  • the computer program may be executed by the processor of the target detection device.
  • FIG. 4 is a schematic flow chart of the target detection method provided in an embodiment of this application. As shown in FIG. 4, the target detection method may include the following steps:
  • Step S41 Obtain an image to be tested.
  • In some implementation scenarios, in order to implement target detection on a two-dimensional image, the image to be tested may be a two-dimensional image; in other implementation scenarios, in order to implement target detection on a three-dimensional image, the image to be tested may be a three-dimensional image, which is not limited here.
  • The image to be tested may be a medical image, for example a CT (Computed Tomography) image or an MR (Magnetic Resonance) image.
  • The target in the image to be tested can be a biological organ, such as the pituitary gland or the pancreas; or the target in the image to be tested can be a diseased tissue, such as a lacunar infarction or a hematoma, which is not limited here.
  • Other cases can be deduced by analogy and are not enumerated here one by one.
  • Step S42 Use the target detection model to perform target detection on the image to be tested, and obtain target area information corresponding to the target in the image to be tested.
  • The target detection model is obtained through the steps in any of the above-mentioned embodiments of the training method of the target detection model, to which reference may be made.
  • In some embodiments, the target detection model is used to perform target detection on the image to be tested to obtain prediction area information corresponding to each detection point, where the prediction area information corresponding to each detection point includes the prediction confidence and the predicted position information of the prediction area corresponding to that detection point; based on the prediction confidence and predicted position information of the prediction area corresponding to each detection point, non-maximum suppression (NMS) is applied to obtain the target area information corresponding to the target in the image to be tested.
  • FIG. 5 is a schematic diagram of several prediction area information obtained by the target detection method provided by an embodiment of the application.
  • As shown in FIG. 5, prediction area 01 to prediction area 05 are the prediction areas corresponding to the detection points, and the detection yields a prediction confidence of 0.6 for prediction area 01, 0.9 for prediction area 02, 0.8 for prediction area 03, 0.9 for prediction area 04, and 0.8 for prediction area 05.
  • The prediction areas are arranged in ascending order of prediction confidence as: prediction area 01, prediction area 03, prediction area 05, prediction area 02, prediction area 04. The prediction area 04 with the highest prediction confidence is selected, and the predicted position information is used to determine whether the IoU of prediction area 01, prediction area 03, prediction area 05, and prediction area 02 with prediction area 04 is greater than a preset intersection-over-union threshold (for example, 60%); if so, the corresponding prediction area is discarded.
  • Because the IoU between prediction area 05 and prediction area 04 is relatively large, say 85%, prediction area 05 is discarded, while the IoU of prediction area 01 and prediction area 03 with prediction area 04 is 0, so they are kept.
  • Prediction area 04 is taken as a target area corresponding to the target; then the prediction area 02 with the highest prediction confidence is selected from the remaining prediction areas 01 to 03, and the predicted position information is used to determine whether the IoU of prediction area 01 and prediction area 03 with prediction area 02 is greater than the preset intersection-over-union threshold (for example, 60%); if so, the corresponding prediction area is discarded.
  • For example, if the IoU of prediction area 01 with prediction area 02 is 65% and that of prediction area 03 with prediction area 02 is 70%, prediction area 01 and prediction area 03 are discarded, and prediction area 02 is kept as another target area corresponding to the target.
  • Other situations can be deduced by analogy and are not enumerated here one by one.
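  • A compact sketch of the non-maximum suppression operation described above is given below; the boxes are assumed to be given as (x1, y1, x2, y2) with an IoU threshold of 60%, and this is a generic greedy NMS rather than the exact procedure of the embodiments.

```python
def nms(boxes, scores, iou_threshold=0.6):
    """Greedy NMS: keep the highest-confidence box, drop boxes overlapping it, repeat.

    boxes: list of (x1, y1, x2, y2); scores: list of prediction confidences.
    Returns the indices of the kept boxes (the target areas).
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                  # highest remaining prediction confidence
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[i], boxes[best]) <= iou_threshold]
    return keep
```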
  • In some embodiments, during the training process of the target detection model, in order to improve the accuracy of the model, and especially the detection accuracy for small targets, the predicted position information may also include the predicted offset information between the positive sample point of the prediction area and the preset point of the actual area; in this way, the actual offset information between the preset point position of the actual area of the target and the corresponding positive sample point position can be calculated, the offset loss value can be obtained from the actual offset information and the predicted offset information, and the position loss value can then be obtained based on the area-size loss value and the offset loss value.
  • The position loss value can be used to adjust the parameters of the target detection model.
  • Correspondingly, the obtained target area information may also include the offset information (offset-x, offset-y) between the target area and the detection point (x0, y0), so the position of the target in the image to be tested can be expressed as (x0+offset-x, y0+offset-y); the category of the target is determined based on the detected category confidence. For example, if the category confidence that the target is human is 0.9 and the category confidence that the target is a cat is 0.1, the detected target can be determined to be human.
  • the target area information may also include the size (for example, length and width) of the target area.
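  • The post-processing described above, namely adding the predicted offset to the detection point and picking the category with the highest category confidence, could be sketched as follows; the category names and values simply mirror the example in the text.

```python
def decode_target(detection_point, predicted_offset, class_confidences):
    """Add the predicted offset to the detection point and pick the most confident category."""
    x0, y0 = detection_point
    offset_x, offset_y = predicted_offset
    position = (x0 + offset_x, y0 + offset_y)      # target position in the image to be tested
    category = max(class_confidences, key=class_confidences.get)
    return position, category

# Mirroring the example in the text: human confidence 0.9, cat confidence 0.1.
print(decode_target((37.5, 37.5), (0.5, 0.1), {"human": 0.9, "cat": 0.1}))
# -> ((38.0, 37.6), 'human')
```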
  • the target detection method provided in the embodiments of the present application can improve the accuracy of target detection by using the target detection model obtained by the target detection model training method in each of the foregoing embodiments to perform target detection on the image to be tested.
  • an embodiment of the present application further provides a target detection method, the method includes:
  • Step S61 Pass the acquired image to be tested through the FPN network to obtain feature maps of different resolutions.
  • Step S62: The feature maps of different resolutions are grouped.
  • Each gt box (equivalent to the actual area in the above embodiments) is grouped with the feature maps of different resolutions according to the size of the gt box.
  • The feature maps with higher resolutions are responsible for detecting small targets, and the feature maps with lower resolutions are responsible for detecting large targets.
  • When calculating the loss function, the detection points are first sorted for each gt box according to their distance to the center point of the gt box, the first k detection points are selected as the positive sample points of that gt box, and the remaining points are its negative sample points.
  • The IoU loss is used to regress the height (H) and width (W) of the box from the corresponding positive sample points, and the L1 loss function is used to regress the offset of the corresponding positive sample points.
  • Step S63: In the inference process, based on the grouping, an NMS operation is used to remove duplicate detection boxes.
  • The method provided in the embodiments of the present application has enough positive samples to ensure the recall rate; at the same time, since each gt box is matched to the same number of positive samples, the gradient balance between targets of different sizes in the classification loss can be guaranteed.
  • In addition, the IoU loss is used to regress the H and W of the gt box, and the L1 loss is used to calculate the offset from the positive sample point to the actual gt box center point, so as to obtain more accurate position information.
  • FIG. 6 is a schematic diagram of the processing of an image to be tested in the medical imaging field according to an embodiment of this application. As shown in FIG. 6, the acquired image to be tested is passed through the FPN network to obtain feature maps of different resolutions, the feature maps of different resolutions are grouped to obtain groups 602, and, based on each group 602, an NMS operation is used to remove duplicate detection boxes to obtain an image 603 of the disease location. In this way, the detection accuracy is improved and false positives are reduced.
  • FIG. 7 is a schematic structural diagram of a training device for a target detection model provided by an embodiment of the present application.
  • As shown in FIG. 7, the training device 70 for a target detection model includes: an image acquisition module 71, a sample selection module 72, a target detection module 73, a loss determination module 74, and a parameter adjustment module 75.
  • The image acquisition module 71 is configured to acquire a sample image, where the sample image is annotated with the actual position information of the actual area where the target is located;
  • the sample selection module 72 is configured to take several points in the sample image as detection points and determine at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area;
  • the target detection module 73 is configured to use the target detection model to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point;
  • the loss determination module 74 is configured to determine the loss value of the target detection model using the actual position information and the prediction area information;
  • the parameter adjustment module 75 is configured to adjust the parameters of the target detection model based on the loss value of the target detection model.
  • The above training device for the target detection model takes several points in the sample image as detection points and determines at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area;
  • the target detection model is then used to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point, and the loss value of the target detection model is determined using the actual position information of the actual area where the target is located in the sample image and the prediction area information;
  • the target detection model can thus be trained on the predicted position information corresponding to the multiple matched positive sample points, which ensures the recall rate without designing anchor boxes.
  • Moreover, because the parameters of the target detection model are adjusted based on a loss value related to the position information, the precision can be ensured, and the accuracy of target detection can be improved.
  • the sample image contains multiple targets
  • The sample selection module 72 includes a down-sampling sub-module configured to down-sample the sample image to obtain multiple feature maps with different resolutions.
  • The sample selection module 72 also includes a grouping sub-module configured to group the actual areas of the multiple targets with the multiple feature maps based on the size of the actual area of each target, where an actual area with a larger size and a feature map with a smaller resolution are placed in the same group.
  • The sample selection module 72 also includes a sample selection sub-module configured to, for each group of a feature map and the actual area of a target, take each point in the feature map as a detection point and perform the step of determining at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area.
  • In the above way, the actual areas of the multiple targets are grouped with the multiple feature maps based on the size of the actual area of each target, and an actual area with a larger size and a feature map with a smaller resolution are placed in the same group; for each group of a feature map and the actual area of a target, each point of the feature map is taken as a detection point, and the step of selecting at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area is performed.
  • On the one hand, high-resolution feature maps can be responsible for small targets and low-resolution feature maps for large targets; on the other hand, each point of the feature map of each group can be used as a detection point for the selection of positive sample points, which helps generate as many positive sample points as possible, thereby helping to ensure the recall rate and in turn to improve the accuracy of target detection.
  • In some embodiments, the grouping sub-module includes an interval division part configured to calculate the area of the actual area of each target and divide the range between the maximum and minimum of the areas into m intervals sorted from small to large.
  • The grouping sub-module also includes a group division part configured to arrange the m feature maps in order of resolution from large to small and to place the actual area of a target whose area belongs to the i-th interval in the same group as the i-th feature map, where i and m are positive integers and i takes a value between 0 and m.
  • In the above way, the range between the maximum and minimum of the areas is divided into m intervals sorted from small to large, where m equals the number of feature maps, the m feature maps are sorted in descending order of resolution, and the actual area of a target whose area belongs to the i-th interval and the i-th feature map are placed in the same group, so that an actual area with a larger size and a feature map with a smaller resolution form the same group, which helps achieve multi-scale target detection and thus helps improve the accuracy of target detection.
  • In some embodiments, the sample selection module 72 further includes a distance calculation sub-module configured to obtain the distance between each detection point and the preset point of the actual area, and a distance judgment sub-module configured to determine at least one detection point whose distance from the preset point meets the preset condition as a positive sample point of the target.
  • In some embodiments, the distance judgment sub-module is configured to take the first several detection points closest to the preset point as the positive sample points of the target.
  • In this way, each actual area can be matched to the same number of positive sample points, which helps ensure the gradient balance between targets of different sizes and thereby helps improve the accuracy of target detection.
  • the prediction area information includes the prediction location information of the prediction area corresponding to the positive sample point and the prediction confidence of the prediction area
  • The loss determination module 74 includes a position loss value calculation sub-module configured to obtain the position loss value using the actual position information and the predicted position information of each target.
  • The loss determination module 74 also includes a confidence loss value calculation sub-module configured to obtain the confidence loss value using the prediction confidence.
  • The loss determination module 74 also includes a model loss value calculation sub-module configured to determine the loss value of the target detection model based on the position loss value and the confidence loss value.
  • In the above way, the actual position information and the predicted position information of each target are used to obtain the position loss value, and the prediction confidence is used to obtain the confidence loss value, so that the loss value of the target detection model is obtained based on the position loss value and the confidence loss value; this ensures the accuracy of the loss value calculation during training and thereby helps improve the accuracy of target detection.
  • the actual location information includes the actual area size of the actual area
  • the predicted location information includes the predicted area size of the predicted area
  • The position loss value calculation sub-module includes an area-size loss value calculation part configured to obtain the area-size loss value using the actual area size and the predicted area size of each target.
  • The position loss value calculation sub-module also includes a position loss value calculation part configured to determine the position loss value based on the area-size loss value.
  • In the above way, the actual area size and the predicted area size of each target are used to obtain the area-size loss value, and the position loss value is obtained based on the area-size loss value, which improves the accuracy of the loss value, further ensures the accuracy of the loss value calculation during training, and in turn helps improve the accuracy of target detection.
  • In some embodiments, the actual position information further includes the preset point position of the actual area, and the predicted position information further includes the predicted offset information between the positive sample point of the prediction area and the preset point of the actual area; the area-size loss value calculation part is also configured to calculate the actual offset information between the preset point position of the actual area of the target and the corresponding positive sample point position and to obtain the offset loss value using the actual offset information and the predicted offset information, and the position loss value calculation part is further configured to determine the position loss value based on the area-size loss value and the offset loss value.
  • the sample selection module 72 further includes a negative sample selection sub-module configured to use the remaining detection points as negative sample points
  • The target detection module 73 is configured to use the target detection model to perform target detection on the sample image to obtain the prediction area information corresponding to each positive sample point and the prediction area information corresponding to each negative sample point.
  • The confidence loss value calculation sub-module is configured to obtain the confidence loss value using the prediction confidence corresponding to the positive sample points and the prediction confidence corresponding to the negative sample points.
  • In this way, the prediction area information corresponding to each positive sample point and the prediction area information corresponding to each negative sample point are used to obtain the confidence loss value, which helps improve the accuracy of the confidence loss value and is in turn conducive to improving the accuracy of target detection.
  • In some embodiments, the sample image is a two-dimensional image or a three-dimensional image, the actual area is an actual bounding box, and the prediction area is a predicted bounding box.
  • setting the sample image as a two-dimensional image enables target detection on two-dimensional images
  • setting the sample image as a three-dimensional image enables target detection on three-dimensional images
  • FIG. 8 is a schematic structural diagram of a target detection device provided by an embodiment of the application.
  • the target detection device 80 includes an image acquisition module 81 and a target detection module 82.
  • the image acquisition module 81 is configured to acquire an image to be tested;
  • the target detection module 82 is configured to use the target detection model to perform target detection on the image to be tested, and to obtain target area information corresponding to the target in the image to be tested; wherein the target detection model is obtained by the training device for the target detection model in any of the foregoing training device embodiments.
  • the target detection device provided by the embodiment of the present application performs target detection on the image to be tested using the target detection model obtained by the above-mentioned training device, and can thereby improve the accuracy of target detection.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • the electronic device 90 includes a memory 91, a processor 92, and a communication bus 93 that are coupled to each other, and the processor 92 is configured to execute the program instructions stored in the memory 91, so as to implement the steps of any of the foregoing target detection model training method embodiments, or the steps of any of the foregoing target detection method embodiments.
  • the electronic device 90 may include, but is not limited to, a microcomputer or a server.
  • the electronic device 90 may also include mobile devices such as a notebook computer and a tablet computer, which are not limited herein.
  • the processor 92 is configured to control itself and the memory 91 to implement the steps of any of the foregoing target detection model training method embodiments, or implement the steps of any of the foregoing target detection method embodiments.
  • the communication bus 93 is configured to connect the memory 91 and the processor 92.
  • the processor 92 may also be referred to as a CPU (Central Processing Unit).
  • the processor 92 may be an integrated circuit chip with signal processing capability.
  • the processor 92 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • in addition, the processor 92 may be implemented jointly by integrated circuit chips.
  • the above solution can train the target detection model based on the predicted position information corresponding to the multiple positive sample points obtained by matching, so that the recall rate can be ensured without the need to design anchor boxes; in addition, adjusting the parameters of the target detection model based on the loss value related to the position information can ensure the accuracy rate, and thereby the accuracy of target detection can be improved.
  • FIG. 10 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the application.
  • the computer-readable storage medium 100 stores program instructions 101 that can be executed by a processor, and the program instructions 101 are configured to implement the steps of any of the foregoing target detection model training method embodiments, or the steps of any of the foregoing target detection method embodiments.
  • the above solution can train the target detection model based on the predicted position information corresponding to the multiple positive sample points obtained by matching, so that the recall rate can be ensured without the need to design anchor boxes; in addition, adjusting the parameters of the target detection model based on the loss value related to the position information can ensure the accuracy rate, and thereby the accuracy of target detection can be improved.
  • the disclosed method and device can be implemented in other ways.
  • the device implementations described above are only illustrative; for example, the division of modules or units is only a logical function division, and there may be other divisions in actual implementation, for example, units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
  • the embodiment of the application discloses a target detection method and its model training method, device and electronic equipment.
  • the training method of the target detection model includes: obtaining a sample image, wherein the sample image is annotated with actual position information of the actual area where the target is located; taking several points in the sample image as detection points, and selecting at least one detection point as a positive sample point of the target based on the distance between each detection point and the preset point of the actual area; using the target detection model to perform target detection on the sample image to obtain prediction area information corresponding to each positive sample point, where the prediction area information corresponding to each positive sample point includes the predicted position information of the prediction area corresponding to that positive sample point; using the actual position information and the prediction area information to determine the loss value of the target detection model; and adjusting the parameters of the target detection model based on the loss value of the target detection model.
  • performing target detection based on this model can improve the accuracy of target detection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

本申请公开了一种目标检测方法及其模型的训练方法及相关装置、设备,其中,目标检测模型的训练方法包括:获取样本图像,其中,样本图像标注有目标所在的实际区域的实际位置信息;以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,选择至少一个检测点作为目标的正样本点;利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息,其中,每个正样本点对应的预测区域信息包括正样本点对应的预测区域的预测位置信息;利用实际位置信息与预测区域信息,确定目标检测模型的损失值;基于目标检测模型的损失值,调整目标检测模型的参数。

Description

目标检测方法及其模型的训练方法、装置及电子设备
相关申请的交叉引用
本申请基于申请号为202010167104.7、申请日为2020年03月11日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请涉及人工智能技术领域,特别是涉及一种目标检测方法及其模型的训练方法、装置及电子设备。
背景技术
随着神经网络、深度学习等人工智能技术的发展,对神经网络模型进行训练,并利用经训练的神经网络模型完成目标检测等任务的方式,逐渐受到人们的青睐。
目前,现有的神经网络模型一般是基于锚框(anchor)匹配或者无锚框(anchor free)策略,以实现目标检测,然而现有策略在实际使用中仍然存在误检率较高的问题。
发明内容
本申请实施例提供一种目标检测方法及其模型的训练方法、装置及电子设备。
本申请实施例提供了一种目标检测模型的训练方法,包括:获取样本图像,其中,样本图像标注有目标所在的实际区域的实际位置信息;以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点;利用目标检测模型对样本图像进行目标检测,确定每个正样本点对应的预测区域信息;利用实际位置信息与预测区域信息,确定目标检测模型的损失值;基于目标检测模型的损失值,调整目标检测模型的参数。
其中,样本图像中包含多个目标;以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点,包括:对样本图像进行降采样,得到对应不同分辨率的多个特征图;基于目标的实际区域的尺寸,将多个目标的实际区域与多个特征图进行分组;其中,尺寸越大的实际区域与分辨率越小的特征图作为同一分组;对于同一分组的特征图和目标的实际区域,确定特征图中的每个点为检测点;基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点。
其中,特征图为m个;基于目标的实际区域的尺寸,将多个目标的实际区域与多个特征图进行分组,包括:计算每个目标的实际区域的面积,将面积的最大值和最小值之间的范围划分为从小到大排序的m个区间;将m个特征图按照分辨率从大到小排列,并将面积属于第i个区间的目标的实际区域与第i个特征图划分至同一分组;其中,i和m为正整数,且i为0至m之间的值。
其中,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点,包括:获得每个检测点与实际区域的预设点之间的距离;选择与预设点之间的距离满足预设条件的至少一个检测点作为目标的正样本点。
其中,将与预设点之间的距离满足预设条件的至少一个检测点确定为目标的正样本点,包括:将与预设点之间的距离最近的前若干个检测点确定为目标的正样本点。
其中,预测区域信息包括所述正样本点对应的预测区域的预测位置信息和所述预测区域的预测置信度;利用实际位置信息与预测区域信息,确定目标检测模型的损失值,包括:利用每个目标的实际位置信息与预测位置信息,得到位置损失值;利用预测置信度,得到置信度损失值;基于位置损失值和置信度损失值,确定目标检测模型的损失值。
其中,实际位置信息包括实际区域的实际区域尺寸,预测位置信息包括预测区域的预测区域尺寸;利用每个目标的实际位置信息与预测位置信息,得到位置损失值,包括:利用每个目标的实际区域尺寸和预测区域尺寸,得到区域尺寸损失值;基于区域尺寸损失值,确定位置损失值。
其中,实际位置信息还包括实际区域的预设点位置;预测位置信息还包括预测区域的正样本点与实际区域的预设点之间的预测偏移信息;利用每个目标的实际位置信息与预测位置信息,得到位置损失值,还包括:计算目标的实际区域的预设点位置与对应的正样本点位置之间的实际偏移信息;利用实际偏移信息和预测偏移信息,得到偏移损失值;基于区域尺寸损失值,确定位置损失值,包括:基于区域尺寸损失值和偏移损失值,确定位置损失值。
其中,在基于每个检测点与实际区域的预设点之间的距离,选择至少一个检测点作为目标的正样本点之后,还包括:将剩余的检测点作为负样本点;利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息,包括:利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息和每个负样本点对应的预测区域信息;利用预测置信度,得到置信度损失值,包括:利用正样本点对应的预测置信度和负样本点对应的预测置信度,得到置信度损失值。
其中,样本图像为二维图像或三维图像,实际区域为实际边界框,预测区域为预测边界框。
因此,将样本图像设置为二维图像,能够实现对二维图像进行目标检测,将样本图像设置为三维图像,能够实现对三维图像进行目标检测。
本申请实施例提供了一种目标检测方法,包括:获取待测图像;利用目标检测模型对待测图像进行目标检测,得到与待测图像中的目标对应的目标区域信息;其 中,目标检测模型是通过上述第一方面中的目标检测模型的训练方法得到的。
本申请实施例提供了一种目标检测模型的训练装置,包括图像获取模块、样本选取模块、目标检测模块和损失确定模块、参数调整模块,图像获取模块配置为获取样本图像;样本选取模块配置为以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点;目标检测模块配置为利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息;损失确定模块配置为利用实际位置信息与预测区域信息,确定目标检测模型的损失值;参数调整模块配置为基于目标检测模型的损失值,调整目标检测模型的参数。
本申请实施例提供了一种目标检测装置,包括图像获取模块和目标检测模块,图像获取模块配置为获取待测图像;目标检测模块配置为利用目标检测模型对待测图像进行目标检测,得到与待测图像中的目标对应的目标区域信息;其中,目标检测模型是通过上述目标检测模型的训练装置得到的。
本申请实施例提供了一种电子设备,包括相互耦接的存储器和处理器,处理器配置为执行存储器中存储的程序指令,以实现上述目标检测模型的训练方法,或实现上述目标检测方法。
本申请实施例提供了一种计算机可读存储介质,计算机可读存储介质存储有程序指令,程序指令被处理器执行时实现上述目标检测模型的训练方法,或实现目标检测方法。
本申请实施例提供一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行配置为实现上述任一项所述的目标检测模型的训练方法,或者所述的目标检测方法。
本申请实施例提供的目标检测方法及其模型的训练方法、装置及电子设备,通过将样本图像中的若干点作为检测点,并基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点,从而利用目标监测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息,并利用样本图像中目标所在的实际区域的实际位置信息和预测区域信息所包括的预测位置信息,确定目标检测模型的损失值,从而基于目标检测模型的损失值,调整目标检测模型的参数,能够基于匹配得到的多个正样本点所对应的预测位置信息进行目标检测模型的训练,从而能够在无需设计锚框的前提下,确保召回率,此外,通过基于与位置信息相关的损失值调整目标检测模型的参数,能够确保准确率,进而能够提高目标检测的准确性。
附图说明
图1为本申请实施例提供一种目标检测模型的训练及其应用的网络架构的示意图;
图2为本申请实施例提供的一种目标检测模型的训练方法的流程示意图;
图3为本申请实施例提供的目标检测模型的训练方法中步骤S22的实现流程示意图;
图4为本申请实施例提供的目标检测方法的流程示意图;
图5为本申请实施例提供的目标检测方法得到的若干预测区域信息的示意图;
图6为本申请实施例提供的另一种目标检测方法的流程示意图;
图7为本申请实施例提供的目标检测模型的训练装置的结构示意图;
图8为本申请实施例提供的目标检测装置的结构示意图;
图9为本申请实施例提供的电子设备的结构示意图;
图10为本申请实施例提供的计算机可读存储介质的结构示意图。
具体实施方式
下面结合说明书附图,对本申请实施例的方案进行详细说明。
以下描述中,为了说明而不是为了限定,提出了诸如特定***结构、接口、技术之类的具体细节,以便透彻理解本申请。
本文中术语“***”和“网络”在本文中常被可互换使用。本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。此外,本文中的“多”表示两个或者多于两个。
图1为本申请实施例提供的一种网络架构的示意图,如图1所示,在该网络架构中包括CT机11、计算机设备12和服务器13,其中,CT机11用于采集原始图像。CT机11与计算机设备12建立有通信连接,CT机11可以将得到的原始图像发送给计算机设备12,计算机设备12对原始图像进行标记等处理得到样本图像。在一些实施例中,服务器13中存储有样本图像,计算机设备12与服务器13同样建立有通信连接,计算机设备12可以从服务器13处直接获取样本图像。当计算机设备12获取到样本图像后,基于样本图像调整目标检测模型的参数。本申请实施例中,当CT机11获取到待测图像后,计算机设备12接收到待测图像,计算机设备12基于目标检测模型得到与待测图像中的目标对应的目标区域信息。
在一些实施例中，也可以是服务器13获取到样本图像后，基于样本图像调整自身存储的目标检测模型的参数。本申请实施例中，CT机11采集到待测图像后，通过计算机设备12将待测图像发送给服务器13，以使服务器13基于目标检测模型得到与待测图像中的目标对应的目标区域信息，服务器13在得到目标区域信息后，将目标区域信息返回给计算机设备12。
结合图1所示的应用场景示意图,以下对目标检测方法及其模型的训练方法、装置及电子设备的各实施例进行说明。
本申请实施例提供的一种目标检测模型的训练方法,所述方法应用于目标检测模型的训练设备,所述目标检测模型的训练设备可以是如图1中的计算机设备12,在一些实施例中,也可以是如图1中的服务器13。本申请实施例提供的方法可以通过计算机程序来实现,该计算机程序在执行的时候,完成本申请实施例提供的目标检测模型的训练方法中各个步骤。在一些实施例中,该计算机程序可以被目标检测模型的训练设备的处理器执行。图2为本申请实施例提供的一种目标检测模型的训练方法的流程示意图,如图2所示,目标检测模型的训练方法可以包括如下步骤:
步骤S21:获取样本图像。
本申请实施例中,样本图像标注有目标所在的实际区域的实际位置信息。在本申请的一些实施例中,为了明确实际区域的具体范围,实际区域可以是实际边界框(Bounding Box),例如,目标的实际边界框,实际边界框可以是矩形框,在此不做限定。在一个实施场景中,为了能够唯一表示一个实际边界框,实际位置信息可以包括实际区域的预设点(例如,实际区域的中心点)的位置信息和实际区域的尺寸(例如,实际边界框的长度和宽度)。
在本申请的一些实施例中,为了实现对二维图像进行目标检测,样本图像可以是二维图像。在另一些实施场景中,为了实现对三维图像进行目标检测,样本图像可以是三维图像,在此不做限定。
在本申请的一些实施例中，为了使目标检测应用于医学图像领域，样本图像可以是医学图像，医学图像可以是CT(Computed Tomography,计算机断层扫描)图像、MR(Magnetic Resonance,核磁共振)图像，在此不做限定。当所述样本图像是医学图像时，样本图像中的目标可以是生物器官等，例如，脑垂体、胰腺等；或者，样本图像中的目标还可以是病变组织等，例如，腔梗、血肿等，在此不做限定。当应用于其他领域时，可以以此类推，在此不再一一举例。
步骤S22:以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点。
在本申请的一些实施例中,为了提高目标检测模型的准确性,以及后续进行目标检测的准确性,可以获取每个检测点与实际区域的预设点之间的距离,从而将与预设点之间的距离满足预设条件的至少一个检测点确定为目标的正样本点。可以选取与预设点之间的距离小于一预设距离阈值的至少部分检测点,作为目标的正样本点,例如,将与预设点之间的距离小于5个像素点的至少部分检测点,或者,将与预设点之间的距离小于8个像素点的至少部分检测点,作为目标的正样本点,在此 不做限定。在本申请的一些实施例中,为了确保不同大小的目标之间的梯度均衡,还可以将与预设点之间的距离最近的前若干个检测点确定为目标的正样本点,前若干个检测点可以是前10个检测点、前20个检测点、前30个检测点等等,在此不做限定,通过以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点,使得每个实际区域均匹配到数量相同的正样本点,从而能够有利于确保不同大小的目标之间的梯度均衡,进而能够有利于提高目标检测的准确性。
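As a rough sketch of the positive-sample selection described above (taking the k detection points nearest to the preset point of the actual area), the following Python snippet mirrors the paragraph's examples; k, the function name and the Euclidean metric are illustrative choices:

```python
import numpy as np

def select_positive_samples(det_points, preset_point, k=10):
    """Pick the k detection points closest to the actual area's preset point.

    det_points:   (N, 2) detection point coordinates in the sample image
    preset_point: (2,)   e.g. the centre point of the actual bounding box
    Returns the indices of the k positive sample points; the rest are negatives.
    """
    dists = np.linalg.norm(det_points - np.asarray(preset_point), axis=1)
    return np.argsort(dists)[:k]

# toy usage: 4 detection points, preset point (16, 32), keep the 2 closest
pts = np.array([[16, 16], [16, 48], [48, 16], [48, 112]], dtype=float)
print(select_positive_samples(pts, (16.0, 32.0), k=2))   # -> indices of (16,16) and (16,48)
```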
步骤S23:利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息。
本申请实施例中,每个正样本点对应的预测区域信息包括正样本点对应的预测区域的预测位置信息。在一些实施场景中,为了明确预测区域的范围,预测区域可以是预测边界框,预测边界框可以是矩形,在此不做限定。在一些实施场景中,为了能够唯一表示一个预测边界框,预测区域信息可以包括预测区域的预设点(例如,预测区域的中心点)的位置信息和预测区域的尺寸(例如,预测边界框的长度和宽度)。
步骤S24:利用实际位置信息与预测区域信息,确定目标检测模型的损失值。
在本申请的一些实施例中,为了提高损失计算的准确性,从而提高目标检测模型的准确性,进而提高后续目标检测的准确性,预测区域信息还可以包括预测区域的预测置信度,预测置信度可以表示预测区域的可信度,预测置信度越高,表明预测区域的可信度越高,从而利用每个目标的实际位置信息与预测位置信息,得到位置损失值,并利用预测置信度,得到置信度损失值,基于位置损失值和置信度损失值,得到目标检测模型的损失值。
在本申请的一些实施例中，可以采用二分类交叉熵损失函数、均方误差损失函数、L1损失函数中的至少一种计算损失值，在此不做限定。其中，L1损失函数，也被称为最小绝对值偏差(Least Absolute Deviation,LAD)或最小绝对值误差(Least Absolute Error,LAE)，总的来说就是把目标值 $y^{(i)}$ 和估计值 $\hat{y}^{(i)}$ 的绝对差值的总和最小化，可以参见公式(1)：

$$L_1=\sum_{i=1}^{m}\left|y^{(i)}-\hat{y}^{(i)}\right| \tag{1}$$

在公式(1)中，m表示正样本点的数量，$y^{(i)}$ 为目标值，$\hat{y}^{(i)}$ 为估计值，$L_1$ 为损失函数。

此外，还可以采用L2损失函数，也被称为最小平方误差(Least Square Error,LSE)，总的来说，就是把m个正样本点的目标值 $y^{(i)}$ 和估计值 $\hat{y}^{(i)}$ 的差值平方和最小化，可以参见公式(2)：

$$L_2=\sum_{i=1}^{m}\left(y^{(i)}-\hat{y}^{(i)}\right)^{2} \tag{2}$$

在公式(2)中，m表示正样本点的数量，$y^{(i)}$ 为目标值，$\hat{y}^{(i)}$ 为估计值，$L_2$ 为损失函数。
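Formulas (1) and (2) translate directly into code; a minimal NumPy transcription over the m positive sample points:

```python
import numpy as np

def l1_loss(y, y_hat):
    # Formula (1): sum of absolute differences over the m positive sample points.
    return np.sum(np.abs(y - y_hat))

def l2_loss(y, y_hat):
    # Formula (2): sum of squared differences over the m positive sample points.
    return np.sum((y - y_hat) ** 2)

y = np.array([1.0, 2.0, 3.0]); y_hat = np.array([1.5, 1.5, 3.5])
print(l1_loss(y, y_hat), l2_loss(y, y_hat))   # 1.5  0.75
```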
在本申请的一些实施例中,实际位置信息还可以包括实际区域的实际区域尺寸,预测区域信息还可以包括预测区域的预测区域尺寸,为了进一步提高后续区域尺寸预测的准确性,还可以利用每个目标的实际区域尺寸和预测区域尺寸,得到区域尺寸损失值,并基于区域尺寸损失值,确定位置损失值。
在本申请的一些实施例中,基于位置损失值和置信度损失值在计算目标检测模型的损失值时,可以预先设置与位置损失值对应的位置损失权重和与置信度损失值对应的置信度损失权重,并分别利用位置损失权重和置信度损失权重对位置损失值和置信度损失值进行加权处理,从而得到目标检测模型的损失值。
在本申请的一些实施例中,为了提高目标检测模型的准确性,特别是提高对于小目标的检测准确性,实际位置信息还可以包括实际区域的预设点位置,预测位置还可以包括预测区域的正样本点与实际区域的预设点之间的预测偏移信息,从而可以计算目标的实际区域的预设点位置与对应的正样本点位置之间的实际偏移信息,并利用实际偏移信息和预测偏移信息得到偏移损失值,进而可以基于区域尺寸损失值和偏移损失值,确定位置损失值。示例性地,可以利用IoU(Intersection over Union,交并比)损失函数或L 1损失函数对每个目标的实际区域尺寸和预测区域尺寸进行计算,得到区域尺寸损失值,并利用L 1损失函数对实际偏移信息和预测偏移信息进行计算,得到偏移损失值。其中,IoU为实际区域和预测区域之间的交集与并集之间的比值;利用L 1损失函数,计算预测边界框的实际边界框之间的长度差,和/或,预测边界框和实际边界框之间的宽度差,可以参阅前述相关步骤。
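A hedged sketch of the two regression terms mentioned above: an IoU-style loss on predicted versus actual region sizes (computed here as if the two boxes shared a centre, using 1 − IoU; other IoU-loss variants such as −log IoU are equally possible) and an L1 loss on the centre-point offsets. Function names and shapes are illustrative assumptions:

```python
import numpy as np

def iou_loss_wh(pred_wh, gt_wh):
    """IoU-style loss on box sizes, assuming predicted and actual boxes share a centre.

    pred_wh, gt_wh: (N, 2) widths and heights. Returns the mean of (1 - IoU).
    """
    inter = np.minimum(pred_wh[:, 0], gt_wh[:, 0]) * np.minimum(pred_wh[:, 1], gt_wh[:, 1])
    union = pred_wh[:, 0] * pred_wh[:, 1] + gt_wh[:, 0] * gt_wh[:, 1] - inter
    iou = inter / np.maximum(union, 1e-7)
    return np.mean(1.0 - iou)

def offset_l1_loss(pred_offsets, gt_offsets):
    """L1 loss between the predicted offsets and the actual offsets from each
    positive sample point to the preset point of the actual area."""
    return np.mean(np.abs(pred_offsets - gt_offsets))

pred_wh = np.array([[10.0, 15.0]]); gt_wh = np.array([[16.0, 32.0]])
pred_off = np.array([[0.4, 0.2]]);  gt_off = np.array([[0.5, 0.1]])
print(iou_loss_wh(pred_wh, gt_wh), offset_l1_loss(pred_off, gt_off))
```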
以样本图像的尺寸是100*100为例，实际区域的预设点(如中心点)位置为(38,37.5)，类别为人，某一正样本点的位置为(37.5,37.5)，利用目标检测模型预测得到的预测区域的尺寸为10*15，预测偏移信息为(offset-x,offset-y)，类别为人的置信度为0.9，类别为猫的置信度为0.2，可以计算目标的实际区域的预设点位置与对应的正样本点位置之间的实际偏移信息为(0.5,0.1)，若目标为小目标，其对应的实际区域的尺寸为0.02*0.04，则上述偏移量大于实际区域的尺寸，从而导致目标检测的偏差很大，故对偏移量进行损失计算，并进行训练，能够使得预测出来的偏移量趋近于或等于实际的偏移量。
在本申请的一些实施例中,为了进一步提高置信度损失值的准确性,进而提高目标检测的准确性,还可以将除正样本点之外的检测点作为负样本点,并利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息和每个 负样本点对应的预测区域信息,进而利用正样本点对应的预测置信度和负样本点对应的预测置信度,得到置信度损失值。
步骤S25:基于目标检测模型的损失值,调整目标检测模型的参数。
基于计算得到的目标检测模型的损失值,可以对目标检测模型的参数进行调整。目标检测模型的参数可以包括但不限于:目标检测模型的卷积层的权重。
在本申请的一些实施例中,在对目标检测模型的参数进行调整之后,还可以重新执行上述步骤S23以及后续步骤,直至损失值满足预设训练结束条件为止。在本申请的一些实施例中,预设训练结束条件可以包括:目标检测模型的损失值小于一预设损失阈值,且目标检测模型的损失值不再减小。
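A possible outer training loop implied by the steps above (compute the loss, adjust the parameters, repeat until the loss is below a preset threshold and no longer decreasing). The PyTorch optimizer, the placeholder `model`, `loader` and `loss_fn`, and the epoch budget are all assumptions of this sketch, not part of the described embodiments:

```python
import torch

def train(model, loader, loss_fn, epochs=50, loss_threshold=0.05):
    """Sketch of the loop around steps S23-S25 with the described stop condition."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    best = float("inf")
    for epoch in range(epochs):
        total = 0.0
        for images, targets in loader:
            preds = model(images)            # prediction area info per sample point
            loss = loss_fn(preds, targets)   # position loss + confidence loss
            opt.zero_grad()
            loss.backward()                  # adjust model parameters (e.g. conv weights)
            opt.step()
            total += loss.item()
        avg = total / max(len(loader), 1)
        if avg < loss_threshold and avg >= best:   # below threshold and no longer decreasing
            break
        best = min(best, avg)
    return model
```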
本申请实施例提供的目标检测模型的训练方法,通过将样本图像中的若干点作为检测点,并基于每个检测点与实际区域的预设点之间的距离,选择至少一个检测点作为目标的正样本点,从而利用目标监测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息,并利用样本图像中目标所在的实际区域的实际位置信息和预测区域信息所包括的预测位置信息,确定目标检测模型的损失值,从而基于目标检测模型的损失值,调整目标检测模型的参数,能够基于匹配得到的多个正样本点所对应的预测位置信息进行目标检测模型的训练,从而能够在无需设计锚框的前提下,确保召回率,此外,通过基于与位置信息相关的损失值调整目标检测模型的参数,能够确保准确率,进而能够提高目标检测的准确性。
请参阅图3,图3为本申请实施例提供的目标检测模型的训练方法中步骤S22的流程示意图。本申请实施例中,样本图像中可以包括多个目标,步骤S22可以通过以下步骤实现:
步骤S221:对样本图像进行降采样,得到对应不同分辨率的多个特征图。
在本申请的一些实施例中,可以采用FPN(Feature Pyramid Networks,特征金字塔网络)对样本图像进行降采样,从而得到对应不同分辨率的多个特征图。在本申请的一些实施例中,上述FPN可以为目标检测模型的一部分,从而将样本图像输入目标检测模型即可得到对应不同分辨率的多个特征图。以128*128的样本图像为例,对其进行降采样可以得到对应4*4分辨率的特征图、对应8*8的特征图、对应16*16的特征图等等,在此不做限定。在此基础上,4*4分辨率的特征图中每个特征点对应样本图像的32*32像素区域,而8*8分辨率的特征图中每个特征点对应样本图像的16*16像素区域,而16*16分辨率的特征图中每个特征点对应样本图像的8*8像素区域。其他分辨率的特征图可以以此类推,在此不再一一举例。
步骤S222:基于目标的实际区域的尺寸,将多个目标的实际区域与多个特征图进行分组。
本申请实施例中,尺寸越大的实际区域与分辨率越小的特征图作为同一分组。 实际区域的尺寸越大,说明目标越大,反之,说明目标越小,故可采用小分辨率的特征图负责检测大目标,而采用大分辨率的特征图负责检测小目标。仍以上述128*128的样本图像为例,样本图像中多个目标的实际区域的尺寸分别是16*32、11*22、10*20、5*10,故可以将尺寸为16*32的实际区域与分辨率为4*4的特征图分为同一分组,将尺寸为11*22的实际区域和尺寸为10*20的实际区域与分辨率为8*8的特征图分为同一分组,将尺寸为5*10的实际区域与分辨率为16*16的特征图分为同一分组,在此不做限定。
在本申请的一些实施例中,为了准确地将多个目标的实际区域与多个特征图进行分组,还可以计算每个目标的实际区域的面积,将面积的最大值和最小值间的范围划分为从小到大排序的m个区间,其中,m为特征图的数量,将m个特征图按照分辨率从大到小的顺序排列,并将面积属于第i个区间的目标的实际区域与第i个特征图划分至同一分组,其中,i和m为正整数,且i为0至m之间的值。仍以上述128*128的样本图像为例,不同分辨率的特征图的数量m为3,样本图像中多个目标的实际区域的尺寸分别是16*32、11*22、10*20、5*10,面积分别为512、242、200、50,其最大值512和最小值50之间划分3个区间,分别为50~204、204~358、358~512,将4*4分辨率的特征图、8*8的特征图、16*16的特征图按照分辨率从大到小排序为:16*16分辨率的特征图、8*8分辨率的特征图、4*4分辨率的特征图,面积属于第1个区间(即50~204)的目标的实际区域为10*20的实际区域和5*10的实际区域,故将两者与第1个特征图(即分辨率为16*16的特征图)划分至同一分组;面积属于第2个区间(即204~358)的目标的实际区域为11*22的实际区域,故将其与第2个特征图(即分辨率为8*8的特征图)划分至同一分组;面积属于第3个区间(即358~512)的目标的实际区域为16*32的实际区域,故将其与第3个特征图(即分辨率为4*4的特征图)划分至同一分组。其他样本图像可以以此类推,在此不再一一举例。
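The interval-based grouping in this paragraph can be sketched as follows; it reproduces the worked example (areas 512, 242, 200 and 50 split into three intervals between the minimum and maximum area), with the function name and NumPy usage being illustrative choices:

```python
import numpy as np

def group_boxes_by_area(box_sizes, num_maps):
    """Assign each actual area to one of `num_maps` feature-map groups by area.

    box_sizes: (N, 2) widths and heights of the actual areas.
    Group 0 pairs with the highest-resolution feature map (smallest targets),
    group num_maps-1 with the lowest-resolution one (largest targets).
    """
    areas = box_sizes[:, 0] * box_sizes[:, 1]
    lo, hi = areas.min(), areas.max()
    edges = np.linspace(lo, hi, num_maps + 1)            # m intervals, smallest to largest
    idx = np.searchsorted(edges, areas, side="right") - 1
    return np.clip(idx, 0, num_maps - 1)

sizes = np.array([[16, 32], [11, 22], [10, 20], [5, 10]], dtype=float)
print(group_boxes_by_area(sizes, 3))   # -> [2 1 0 0]: 512 to 3rd map, 242 to 2nd, 200 and 50 to 1st
```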
步骤S223:对于同一分组的特征图和目标的实际区域,以特征图中的每个点为检测点,基于每个检测点与实际区域的预设点之间的距离,选择至少一个检测点作为目标的正样本。
本申请实施例中,可以根据检测点在特征图中的位置坐标和特征图的分辨率,确定检测点在样本图像中的位置坐标,从而根据检测点在样本图像中的位置坐标计算检测点与实际区域的预设点之间的距离。以16*32的实际区域和分辨率为4*4的特征图为例,将4*4特征图中的每个特征点分别作为检测点,由于分辨率为4*4的特征图每个特征点对应128*128样本图像中的32*32,故检测点(1,1)对应于样本图像中的(16,16),检测点(1,2)对应于样本图像中的(16,48),检测点(1,3)对应于样本图像中的(16,80),检测点(1,4)对应于样本图像中的(16,112)检测点 (2,1)对应于样本图像中的(48,16),检测点(2,2)对应于样本图像中的(48,48),检测点(2,3)对应于样本图像中的(48,80),检测点(2,4)对应于样本图像中的(48,112),若16*32的实际区域的预设点在样本图像中位置为(16,32),利用欧氏距离,可以计算上述检测点距离实际区域的预设点的距离分别为:16、16、48、80、35.78、35.78、57.69、86.16,其他检测点以此类推,在此不再一一举例。当选择与预设点之间的距离最近的前若干个检测点作为目标的正样本点时,若上述前若干个检测点为4个检测点,则实际区域的尺寸为16*32的目标的正样本点可以是分辨率为4*4的特征图中的特征点(1,1)、(1,2)和(2,1)、(2,2),其他情况可以以此类推,在此不再一一举例。
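A small sketch of the coordinate mapping and distance computation used in this example: cells of a 4*4 feature map on a 128*128 image are mapped to detection points at a 32-pixel stride (cell centres 16, 48, 80, 112), then ranked by Euclidean distance to the preset point (16, 32). Names are illustrative:

```python
import numpy as np

def detection_point_coords(fmap_h, fmap_w, image_size):
    """Map each cell of a feature map to its detection point in the sample image.

    For a 4x4 map on a 128x128 image every cell covers a 32x32 region, so the
    cell-centre coordinates along each axis are 16, 48, 80 and 112.
    """
    stride_y = image_size / fmap_h
    stride_x = image_size / fmap_w
    ys, xs = np.meshgrid(np.arange(fmap_h), np.arange(fmap_w), indexing="ij")
    return np.stack([(xs + 0.5) * stride_x, (ys + 0.5) * stride_y], axis=-1)

pts = detection_point_coords(4, 4, 128).reshape(-1, 2)
preset = np.array([16.0, 32.0])                 # preset point of the 16*32 actual area
dists = np.linalg.norm(pts - preset, axis=1)
print(np.round(np.sort(dists)[:4], 2))          # -> [16. 16. 35.78 35.78], the four nearest points
```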
区别于前述实施例,通过对样本图像进行降采样,得到对应不同分辨率的多个特征图,从而基于目标的实际区域的尺寸,将多个目标的实际区域与多个特征图进行分组,且尺寸越大的实际区域和分辨率越小的特征图作为同一分组,从而对同一分组的特征图和目标的实际区域,以特征图的每个点为检测点,执行基于每个检测点与实际区域的预设点之间的距离,选择至少一个检测点作为目标的正样本点的步骤,一方面能够使得分辨率高的特征图对应于小尺寸的目标,而分辨率低的特征图对应于大尺寸的目标,从而有利于实现多尺度的目标检测,另一方面能够以每个分组的特征图的每个点为检测点进行正样本点的选取,从而能够有利于确保产生尽可能多的正样本点,进而有利于确保召回率,进而有利于提高目标检测的准确性。
本申请实施例提供的一种目标检测方法,所述方法应用于目标检测设备,所述目标检测设备可以是计算机设备,本申请实施例提供的方法可以通过计算机程序来实现,该计算机程序在执行的时候,完成本申请实施例提供的目标检测方法中各个步骤。在一些实施例中,该计算机程序可以被目标检测设备的处理器执行,图4为本申请实施例提供的目标检测方法的流程示意图,如图4所示,目标检测方法可以包括如下步骤:
步骤S41:获取待测图像。
在本申请的一些实施例中,为了实现对二维图像进行目标检测,待测图像可以是二维图像。在另一些实施场景中,为了实现对三维图像进行目标检测,待测图像可以是三维图像,在此不做限定。
在本申请的一些实施例中,为了使目标检测应用于医学图像领域,待测图像可以是医学图像,例如,CT(Computed Tomography,计算机断层扫描)图像、MR(Magnetic Resonance,核磁共振)图像,在此不做限定。对应的,待测图像中的目标可以是生物器官等,例如,脑垂体、胰腺等;或者,待测图像中的目标还可以是病变组织等,例如,腔梗、血肿等,在此不做限定。当应用于其他领域时,可以以 此类推,在此不再一一举例。
步骤S42:利用目标检测模型对待测图像进行目标检测,得到与待测图像中的目标对应的目标区域信息。
本申请实施例中,目标检测模型是通过上述任一目标检测模型的训练方法实施例中的步骤得到的。可以参阅前述任一目标检测模型的训练方法实施例中的步骤。
在本申请的一些实施例中,为了提高目标检测的准确性,可以以待测图像中的若干点为检测点,并利用目标检测模型对待测图像进行目标检测,得到每个检测点对应的预测区域信息,其中,每个检测点对应的预测区域信息包括检测点对应的预测区域的预测置信度和预测区域位置信息,并基于每个检测点对应的预测区域的预测置信度和预测区域位置信息,采用非极大值抑制(Non-Maximum Suppression,NMS)得到与待测图像中的目标对应的目标区域信息。图5为本申请实施例提供的目标检测方法得到的若干预测区域信息的示意图,如图5所示,预测区域01~预测区域05分别是与每个检测点对应的预测区域,且检测得到预测区域01的预测置信度为0.6、预测区域02的预测置信度为0.9、预测区域03的预测置信度为0.8、预测区域04的置信度为0.9、预测区域05的置信度为0.8,将上述预测区域按照预测置信度从小到大排列为:预测区域01、预测区域03、预测区域05、预测区域02、预测区域04,选取预测置信度最大的预测区域04,利用预测位置信息,分别判断预测区域01、预测区域03、预测区域05、预测区域02与预测区域04的IoU是否大于一预设交并比阈值(例如,60%),若是,则丢弃,如图5所示,预测区域05与预测区域04的交并比较大,假设为85%,则将预测区域05丢弃,而预测区域01~预测区域03与预测区域04的交并比为0,故保留,此时将预测区域04作为与目标对应的目标区域,从剩下的预测区域01~预测区域03中选取预测置信度最大的预测区域02,并基于预测位置信息,判断预测区域01和预测区域03与预测区域02的IoU是否大于一预设交并比阈值(例如,60%),若是,则丢弃,假设预测区域01和预测区域03与预测区域02的IoU分别为65%、70%,则将预测区域01和预测区域03丢弃,并保留预测区域02作为与目标对应的目标区域。其他情况可以以此类推,在此不再一一举例。
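A conventional greedy NMS sketch matching the procedure walked through above (keep the highest-confidence prediction area, drop areas whose IoU with it exceeds the threshold, repeat on the remainder); the 0.6 threshold mirrors the 60% example, and the (x1, y1, x2, y2) box format is an assumption:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.6):
    """Greedy non-maximum suppression over prediction areas.

    boxes:  (N, 4) as (x1, y1, x2, y2); scores: (N,) predicted confidences.
    Returns indices of the kept boxes, i.e. the target areas.
    """
    order = np.argsort(scores)[::-1]             # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / np.maximum(area_i + area_r - inter, 1e-7)
        order = rest[iou <= iou_thresh]          # drop areas overlapping the kept one
    return keep
```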
在本申请的一些实施例中,为了实现对待测图像的多尺度检测,从而尽可能全面地检测出待测图像中的目标,特别是小目标,还可以对待测图像进行降采样,得到对应不同分辨率的多个特征图,并将多个特征图中的若干特征点作为检测点,并利用目标检测模型对待测图像进行目标检测,得到每个检测点对应的预测区域信息,并基于每个检测点对应的预测区域的预测置信度和预测区域位置信息,采用非极大值抑制(Non-Maximum Suppression,NMS)从若干检测点对应的预测区域信息中,确定得到与待测图像中的目标对应的目标区域信息。可以参阅前述相关步骤。
在本申请的一些实施例中,在目标检测模型的训练过程中,为了提高目标检测模型的准确性,特别是提高对于小目标的检测准确性,预测位置还可以包括预测区域的正样本点与实际区域的预设点之间的预测偏移信息,从而可以计算目标的实际区域的预设点位置与对应的正样本点位置之间的实际偏移信息,并利用实际偏移信息和预测偏移信息得到偏移损失值,进而可以基于区域尺寸损失值和偏移损失值,得到位置损失值,以利用位置损失值对目标检测模型的参数进行调整,故在对待测图像进行目标检测时所得到的目标区域信息还可以包括目标区域与检测点(x0,y0)之间的偏移信息(offset-x,offset-y),故目标在待测图像中的位置可以表示为(x0+offset-x,y0+offset-y),并基于检测得到的类别置信度确定目标的类别,例如,检测到目标为人的类别置信度为0.9,目标为猫的类别置信度为0.1,故可以确定检测到的目标为人。此外,目标区域信息还可以包括目标区域的尺寸(例如,长度和宽度)。
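A minimal sketch of decoding one detection point's outputs into a final target, following the example above (position = detection point plus predicted offset, class = the one with the highest predicted confidence); all names here are illustrative:

```python
import numpy as np

def decode_detection(det_point, offset, size, class_conf, class_names):
    """Turn one detection point's raw outputs into a target description.

    det_point: (x0, y0); offset: (offset-x, offset-y); size: (w, h);
    class_conf: per-class confidences; class_names: matching labels.
    """
    x0, y0 = det_point
    cx, cy = x0 + offset[0], y0 + offset[1]          # (x0+offset-x, y0+offset-y)
    cls = class_names[int(np.argmax(class_conf))]    # highest class confidence wins
    return {"center": (cx, cy), "size": tuple(size), "class": cls,
            "score": float(np.max(class_conf))}

print(decode_detection((37.5, 37.5), (0.5, 0.1), (10, 15),
                       np.array([0.9, 0.1]), ["person", "cat"]))
```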
本申请实施例提供的目标检测方法,通过利用上述各个实施例中的目标检测模型的训练方法得到的目标检测模型对待测图像进行目标检测,能够提高目标检测的准确性。
基于前述的各个实施例,本申请实施例再提供一种目标检测方法,所述方法包括:
步骤S61,将获取的待测图像通过FPN网络得到不同分辨率的特征图。
步骤S62,将不同分辨率的特征图进行分组处理。
根据gt box(同上述各个实施例中的实际区域的面积)的大小和不同分辨率下的特征图进行分组，分辨率较高的特征图负责检测小目标，分辨率较低的特征图负责检测较大的目标。计算损失函数时，先对每一个gt box根据检测点到其gt box中心点的距离排序，选择前k个检测点作为该gt box的正样本点，其余点为该gt box的负样本点。根据相应的正样本利用IoU损失回归其gt box的高(H,Height)和宽(W,Width)大小，并对相应的正样本点的偏移量用L1损失函数进行回归。
步骤S63,基于该分组在推断的过程中采用NMS操作去除重复的检出框。
本申请实施例提供的方法，具备了足够多的正样本来保证召回率(recall)，同时由于每个gt box匹配到相同数量的正样本，可以保证分类loss中不同大小的目标之间的梯度平衡。采用IoU loss来回归gt box的H和W，同时采用L1 loss计算正样本点到实际gt box中心点的偏移值(offset)，得到更精确的位置信息。
对于医疗影像中的疾病位置检测，图6为本申请实施例基于医疗影像中待测图片处理过程示意图，如图6所示，将待测图像601通过FPN网络得到不同分辨率的特征图，将不同分辨率的特征图进行分组处理得到各个分组602，基于各个分组602采用NMS操作去除重复的检出框，得到疾病位置的图像603。如此，提高了检测精度，降低了假阳性。
图7是本申请实施例提供的目标检测模型的训练装置的结构示意图,如图7所示,目标检测模型的训练装置70包括:图像获取模块71、样本选取模块72、目标检测模块73、损失确定模块74和参数调整模块75,图像获取模块71配置为获取样本图像,其中,样本图像标注有目标所在的实际区域的实际位置信息;样本选取模块72配置为以样本图像中的若干点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点;目标检测模块73配置为利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息;损失确定模块74配置为利用实际位置信息与预测区域信息,确定目标检测模型的损失值;参数调整模块75配置为基于目标检测模型的损失值,调整目标检测模型的参数。
本申请实施例提供的目标检测模型的训练装置,通过将样本图像中的若干点作为检测点,并基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点,从而利用目标监测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息,并利用样本图像中目标所在的实际区域的实际位置信息和预测区域信息所包括的预测位置信息,确定目标检测模型的损失值,从而基于目标检测模型的损失值,调整目标检测模型的参数,能够基于匹配得到的多个正样本点所对应的预测位置信息进行目标检测模型的训练,从而能够在无需设计锚框的前提下,确保召回率,此外,通过基于与位置信息相关的损失值调整目标检测模型的参数,能够确保准确率,进而能够提高目标检测的准确性。
在本申请的一些实施例中,样本图像中包含多个目标,样本选取模块72包括降采样子模块,配置为对样本图像进行降采样,得到对应不同分辨率的多个特征图,样本选取模块72还包括分组子模块,配置为基于目标的实际区域的尺寸,将多个目标的实际区域与多个特征图进行分组;其中,尺寸越大的实际区域与分辨率越小的特征图作为同一分组,样本选取模块72还包括样本选取子模块,配置为对于同一分组的特征图和目标的实际区域,确定特征图中的每个点为检测点,基于每个检测点与实际区域的预设点之间的距离,将至少一个检测点确定为目标的正样本点的步骤。
区别于前述实施例,通过对样本图像进行降采样,得到对应不同分辨率的多个特征图,从而基于目标的实际区域的尺寸,将多个目标的实际区域与多个特征图进行分组,且尺寸越大的实际区域和分辨率越小的特征图作为同一分组,从而对同一分组的特征图和目标的实际区域,以特征图的每个点为检测点,执行基于每个检测点与实际区域的预设点之间的距离,选择至少一个检测点作为目标的正样本点的步骤,一方面能够使得分辨率高的特征图负责小尺寸的目标,而分辨率低的特征图负责大尺寸的目标,从而有利于实现多尺度的目标检测,另一方面能够以每个分组的 特征图的每个点为检测点进行正样本点的选取,从而能够有利于确保产生尽可能多的正样本点,进而有利于确保召回率,进而有利于提高目标检测的准确性。
在本申请的一些实施例中,特征图为m个,分组子模块包括区间划分部分,配置为计算每个目标的实际区域的面积,将面积的最大值和最小值之间的范围划分为从小到大排序的m个区间,分组子模块包括分组划分部分,配置为将m个特征图按照分辨率从大到小排列,并将面积属于第i个区间的目标的实际区域与第i个特征图划分至同一分组;其中,i和m为正整数,且i为0至m之间的值。
区别于前述实施例,通过计算每个目标的实际区域的面积,将面积的最大值和最小值之间的范围划分为从小到大排序的m个区间,且m与特征图的数量相同,并将m个特征图按照分辨率从大到小排序,将面积属于第i个区间的目标的实际区域与第i个特征图划分至同一分组,能够使得尺寸越大的实际区域与分辨率越小的特征图作为同一分组,从而能够有利于实现多尺度的目标检测,进而能够有利于提高目标检测的准确性。
在本申请的一些实施例中,样本选取模块72还包括距离计算子模块,配置为获得每个检测点与实际区域的预设点之间的距离,样本选取模块72还包括距离判断子模块,配置为将与预设点之间的距离满足预设条件的至少一个检测点确定为目标的正样本点。
区别于前述实施例,通过获取每个检测点与实际区域的预设点之间的距离,并选择与预设点之间的距离满足预设条件的至少一个检测点作为目标的正样本点,能够有利于确保产生尽可能多的正样本点,进而有利于确保召回率,进而有利于提高目标检测的准确性。
在本申请的一些实施例中,距离判断子模块,配置为将与预设点之间的距离最近的前若干个检测点作为目标的正样本点。
区别于前述实施例,通过选择与预设点之间的距离最近的前若干个检测点作为目标的正样本点,能够使得每个实际区域均匹配到数量相同的正样本点,从而能够有利于确保不同大小的目标之间的梯度均衡,进而能够有利于提高目标检测的准确性。
在本申请的一些实施例中,预测区域信息包括所述正样本点对应的预测区域的预测位置信息和所述预测区域的预测置信度,损失确定模块74包括位置损失值计算子模块,配置为利用每个目标的实际位置信息与预测位置信息,得到位置损失值,损失确定模块74还包括置信度损失值计算子模块,配置为利用预测置信度,得到置信度损失值,损失确定模块74还包括模型损失值计算子模块,配置为基于位置损失值和置信度损失值,确定目标检测模型的损失值。
区别于前述实施例,通过每个目标的实际位置信息与预测位置信息,得到位置 损失值,并利用预测置信度得到置信度损失值,从而基于位置损失值和置信度损失值,得到目标检测模型的损失值,能够确保训练过程中损失值计算的准确性,进而能够有利于提高目标检测的准确性。
在本申请的一些实施例中,实际位置信息包括实际区域的实际区域尺寸,预测位置信息包括预测区域的预测区域尺寸,位置损失值计算子模块包括区域尺寸损失值计算部分,配置为利用每个目标的实际区域尺寸和预测区域尺寸,得到区域尺寸损失值,位置损失值计算子模块包括位置损失值计算部分,配置为基于区域尺寸损失值,确定位置损失值。
区别于前述实施例,利用每个目标的实际区域尺寸和预测区域尺寸,得到区域尺寸损失值,并基于区域尺寸损失值,得到位置损失值,能够提高损失值的准确性,能够进一步确保训练过程中损失值计算的准确性,进而能够有利于提高目标检测的准确性。
在本申请的一些实施例中,实际位置信息还包括实际区域的预设点位置;预测位置信息还包括预测区域的正样本点与实际区域的预设点之间的预测偏移信息,区域尺寸损失值计算部分还配置为计算目标的实际区域的预设点位置与对应的正样本点位置之间的实际偏移信息,并利用实际偏移信息和预测偏移信息,得到偏移损失值,位置损失值计算部分还配置为基于区域尺寸损失值和偏移损失值,确定位置损失值。
区别于前述实施例,基于预测区域的正样本点与实际区域的预设点之间的预测偏移信息,以及实际区域的预设点位置与对应的正样本点位置之间的实际偏移信息,得到偏移损失值,并基于区域尺寸损失值和偏移损失值,确定位置损失值,能够提高位置损失值的准确性,进而能够提高目标检测的准确性,特别是能够提高小目标的检测准确性。
在本申请的一些实施例中,样本选取模块72还包括负样本选取子模块,配置为将剩余的检测点作为负样本点,目标检测模块73配置为利用目标检测模型对样本图像进行目标检测,得到每个正样本点对应的预测区域信息和每个负样本点对应的预测区域信息,置信度损失值计算子模块配置为利用正样本点对应的预测置信度和负样本点对应的预测置信度,得到置信度损失值。
区别于前述实施例,利用每个正样本点对应的预测区域信息和每个负样本点对应的预测区域信息,得到置信度损失值,能够有利于提高置信度损失值的准确性,进而能够有利于提高目标检测的准确性。
在本申请的一些实施例中,样本图像为二维图像或三维图像,实际区域为实际边界框,预测区域为预测边界框。
区别于前述实施例,将样本图像设置为二维图像,能够实现对二维图像进行目 标检测,将样本图像设置为三维图像,能够实现对三维图像进行目标检测。
图8为本申请实施例提供的目标检测装置的结构示意图,如图8所示,目标检测装置80包括图像获取模块81和目标检测模块82,图像获取模块81配置为获取待测图像;目标检测模块82配置为利用目标检测模型对待测图像进行目标检测,得到与待测图像中的目标对应的目标区域信息;其中,目标检测模型是通过上述任一目标检测模型的训练装置实施例中的目标检测模型的训练装置得到的。
本申请实施例提供的目标检测装置,通过利用上述任一目标检测模型的训练装置实施例中的目标检测模型的训练装置得到的目标检测模型对待测图像进行目标检测,能够提高目标检测的准确性。
图9为本申请实施例提供的电子设备的结构示意图,如图9所示,电子设备90包括相互耦接的存储器91、处理器92和通信总线93,处理器92配置为执行存储器91中存储的程序指令,以实现上述任一目标检测模型的训练方法实施例的步骤,或实现上述任一目标检测方法实施例中的步骤。在一些实施场景中,电子设备90可以包括但不限于:微型计算机、服务器,此外,电子设备90还可以包括笔记本电脑、平板电脑等移动设备,在此不做限定。
处理器92配置为控制其自身以及存储器91以实现上述任一目标检测模型的训练方法实施例的步骤,或实现上述任一目标检测方法实施例中的步骤。通信总线93配置为连接存储器91和处理器92。处理器92还可以称为CPU(Central Processing Unit,中央处理单元)。处理器92可能是一种集成电路芯片,具有信号的处理能力。处理器92还可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。另外,处理器92可以由集成电路芯片共同实现。
上述方案,能够基于匹配得到的多个正样本点所对应的预测位置信息进行目标检测模型的训练,从而能够在无需设计锚框的前提下,确保召回率,此外,通过基于与位置信息相关的损失值调整目标检测模型的参数,能够确保准确率,进而能够提高目标检测的准确性。
图10为本申请实施例提供的计算机可读存储介质的结构示意图,如图10所示,计算机可读存储介质100存储有能够被处理器运行的程序指令101,程序指令101配置为实现上述任一目标检测模型的训练方法实施例的步骤,或实现上述任一目标检测方法实施例中的步骤。
上述方案,能够基于匹配得到的多个正样本点所对应的预测位置信息进行目标检测模型的训练,从而能够在无需设计锚框的前提下,确保召回率,此外,通过基 于与位置信息相关的损失值调整目标检测模型的参数,能够确保准确率,进而能够提高目标检测的准确性。
在本申请所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施方式仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施方式方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本申请各个实施方式方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
工业实用性
本申请实施例公开了一种目标检测方法及其模型的训练方法、装置及电子设备，其中，目标检测模型的训练方法包括：获取样本图像，其中，样本图像标注有目标所在的实际区域的实际位置信息；以样本图像中的若干点为检测点，基于每个检测点与实际区域的预设点之间的距离，选择至少一个检测点作为目标的正样本点；利用目标检测模型对样本图像进行目标检测，得到每个正样本点对应的预测区域信息，其中，每个正样本点对应的预测区域信息包括正样本点对应的预测区域的预测位置信息；利用实际位置信息与预测区域信息，确定目标检测模型的损失值；基于目标检测模型的损失值，调整目标检测模型的参数。在基于该模型进行目标检测时，能够提高目标检测的准确性。

Claims (25)

  1. 一种目标检测模型的训练方法,包括:
    获取样本图像,其中,所述样本图像标注有目标所在的实际区域的实际位置信息;
    以所述样本图像中的若干点为检测点,基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点;
    利用目标检测模型对所述样本图像进行目标检测,确定每个所述正样本点对应的预测区域信息;
    利用所述实际位置信息与所述预测区域信息,确定所述目标检测模型的损失值;
    基于所述目标检测模型的损失值,调整所述目标检测模型的参数。
  2. 根据权利要求1所述的训练方法,所述样本图像中包含多个所述目标;
    所述以所述样本图像中的若干点为检测点,基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点,包括:
    对所述样本图像进行降采样,得到对应不同分辨率的多个特征图;
    基于所述目标的实际区域的尺寸,将所述多个目标的实际区域与所述多个特征图进行分组;其中,尺寸越大的所述实际区域与分辨率越小的所述特征图作为同一分组;
    对于同一分组的特征图和所述目标的实际区域,将所述特征图中的每个点确定为检测点;
    基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点。
  3. 根据权利要求2所述的训练方法,所述特征图为m个;
    所述基于所述目标的实际区域的尺寸,将所述多个目标的实际区域与所述多个特征图进行分组,包括:
    计算每个所述目标的实际区域的面积,将所述面积的最大值和最小值之间的范围划分为从小到大排序的m个区间;
    将所述m个特征图按照分辨率从大到小排列,并将面积属于第i个区间的所述目标的实际区域与第i个特征图划分至同一分组;其中,i和m为正整数,且i为0至m之间的值。
  4. 根据权利要求1至3任一项所述的训练方法,所述基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点,包括:
    获得每个所述检测点与所述实际区域的预设点之间的距离;
    将与所述预设点之间的距离满足预设条件的至少一个所述检测点确定为所述目标的正样本点。
  5. 根据权利要求4所述的训练方法,所述将与所述预设点之间的距离满足预设条件的至少一个所述检测点确定为所述目标的正样本点,包括:
    将与所述预设点之间的距离最近的前若干个检测点确定为所述目标的正样本点。
  6. 根据权利要求1所述的训练方法,预测区域信息包括所述正样本点对应的预测区域的预测位置信息和所述预测区域的预测置信度,
    所述利用所述实际位置信息与所述预测区域信息,确定所述目标检测模型的损失值,包括:
    利用每个目标的所述实际位置信息与所述预测位置信息,得到位置损失值;
    利用所述预测置信度,得到置信度损失值;
    基于所述位置损失值和所述置信度损失值,确定所述目标检测模型的损失值。
  7. 根据权利要求6所述的训练方法,所述实际位置信息包括所述实际区域的实际区域尺寸,所述预测位置信息包括所述预测区域的预测区域尺寸;
    所述利用每个目标的所述实际位置信息与所述预测位置信息,得到位置损失值,包括:
    利用每个所述目标的实际区域尺寸和预测区域尺寸,得到区域尺寸损失值;
    基于所述区域尺寸损失值,确定位置损失值。
  8. 根据权利要求7所述的训练方法,所述实际位置信息还包括所述实际区域的预设点位置;所述预测位置信息还包括所述预测区域的正样本点与所述实际区域的预设点之间的预测偏移信息;
    所述利用每个目标的所述实际位置信息与所述预测位置信息,得到位置损失值,还包括:
    计算所述目标的实际区域的预设点位置与对应的所述正样本点位置之间的实际偏移信息;
    利用所述实际偏移信息和所述预测偏移信息,得到偏移损失值;
    所述基于所述区域尺寸损失值,确定位置损失值,包括:
    基于所述区域尺寸损失值和所述偏移损失值,确定位置损失值。
  9. 根据权利要求6所述的训练方法,在所述基于每个所述检测点与所述实际区域的预设点之间的距离,选择至少一个所述检测点作为所述目标的正样本点之后,还包括:
    将剩余的所述检测点作为负样本点;
    所述利用目标检测模型对所述样本图像进行目标检测,得到每个所述正样本点 对应的预测区域信息,包括:
    利用目标检测模型对所述样本图像进行目标检测,得到每个所述正样本点对应的预测区域信息和每个所述负样本点对应的预测区域信息;
    所述利用所述预测置信度,得到置信度损失值,包括:
    利用所述正样本点对应的预测置信度和所述负样本点对应的预测置信度,得到置信度损失值。
  10. 根据权利要求1所述的训练方法,所述样本图像为二维图像或三维图像,所述实际区域为实际边界框,所述预测区域为预测边界框。
  11. 一种目标检测方法,包括:
    获取待测图像;
    利用目标检测模型对所述待测图像进行目标检测,得到与所述待测图像中的目标对应的目标区域信息;
    其中,所述目标检测模型是通过权利要求1至10任一项所述的目标检测模型的训练方法得到的。
  12. 一种目标检测模型的训练装置,包括:
    图像获取模块,配置为获取样本图像,其中,所述样本图像标注有目标所在的实际区域的实际位置信息;
    样本选取模块,配置为以所述样本图像中的若干点为检测点,基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点;
    目标检测模块,配置为利用目标检测模型对所述样本图像进行目标检测,确定每个所述正样本点对应的预测区域信息;
    损失确定模块,配置为利用所述实际位置信息与所述预测区域信息,确定所述目标检测模型的损失值;
    参数调整模块,配置为基于所述目标检测模型的损失值,调整所述目标检测模型的参数。
  13. 根据权利要求12所述的目标检测模型的训练装置,所述样本图像中包含多个所述目标;所述样本选取模块包括:
    降采样子模块,配置为对所述样本图像进行降采样,得到对应不同分辨率的多个特征图;
    分组子模块,配置为基于所述目标的实际区域的尺寸,将所述多个目标的实际区域与所述多个特征图进行分组;其中,尺寸越大的所述实际区域与分辨率越小的所述特征图作为同一分组;
    选取子模块,配置为对于同一分组的特征图和所述目标的实际区域,将所述特 征图中的每个点确定为检测点;基于每个所述检测点与所述实际区域的预设点之间的距离,将至少一个所述检测点确定为所述目标的正样本点。
  14. 根据权利要求13所述的目标检测模型的训练装置,所述特征图为m个;分组子模块包括:
    区间划分部分,配置为计算每个所述目标的实际区域的面积,将所述面积的最大值和最小值之间的范围划分为从小到大排序的m个区间;
    分组划分部分,配置为将所述m个特征图按照分辨率从大到小排列,并将面积属于第i个区间的所述目标的实际区域与第i个特征图划分至同一分组;其中,i和m为正整数,且i为0至m之间的值。
  15. 根据权利要求12至14任一项所述的目标检测模型的训练装置,所述样本选取模块还包括:
    距离计算子模块,配置为获得每个所述检测点与所述实际区域的预设点之间的距离;
    距离判断子模块,配置为将与所述预设点之间的距离满足预设条件的至少一个所述检测点确定为所述目标的正样本点。
  16. 根据权利要求15所述的目标检测模型的训练装置,所述距离判断子模块还配置为将与所述预设点之间的距离最近的前若干个检测点确定为所述目标的正样本点。
  17. 根据权利要求12所述的目标检测模型的训练装置,预测区域信息包括所述正样本点对应的预测区域的预测位置信息和所述预测区域的预测置信度,所述损失确定模块,包括:
    位置损失值计算子模块,配置为利用每个目标的所述实际位置信息与所述预测位置信息,得到位置损失值;
    置信度损失值计算子模块,配置为利用所述预测置信度,得到置信度损失值;
    模型损失值计算子模块,配置为基于所述位置损失值和所述置信度损失值,确定所述目标检测模型的损失值。
  18. 根据权利要求17所述的目标检测模型的训练装置,所述实际位置信息包括所述实际区域的实际区域尺寸,所述预测位置信息包括所述预测区域的预测区域尺寸;所述位置损失值计算子模块,包括:
    区域尺寸损失值计算部分,配置为利用每个所述目标的实际区域尺寸和预测区域尺寸,得到区域尺寸损失值;
    位置损失值计算部分,配置为基于所述区域尺寸损失值,确定位置损失值。
  19. 根据权利要求18所述的目标检测模型的训练装置,所述实际位置信息还包括所述实际区域的预设点位置;所述预测位置信息还包括所述预测区域的正样本点 与所述实际区域的预设点之间的预测偏移信息;
    区域尺寸损失值计算部分,还配置为计算所述目标的实际区域的预设点位置与对应的所述正样本点位置之间的实际偏移信息;利用所述实际偏移信息和所述预测偏移信息,得到偏移损失值;
    位置损失值计算部分,还配置为基于所述区域尺寸损失值和所述偏移损失值,确定位置损失值。
  20. 根据权利要求19所述的目标检测模型的训练装置,样本选取模块还包括:
    负样本选取子模块,配置为将剩余的所述检测点作为负样本点;
    目标检测模块配置为利用目标检测模型对所述样本图像进行目标检测,得到每个所述正样本点对应的预测区域信息和每个所述负样本点对应的预测区域信息;
    置信度损失值计算子模块,配置为利用所述正样本点对应的预测置信度和所述负样本点对应的预测置信度,得到置信度损失值。
  21. 根据权利要求12所述的目标检测模型的训练装置,所述样本图像为二维图像或三维图像,所述实际区域为实际边界框,所述预测区域为预测边界框。
  22. 一种目标检测装置,包括:
    图像获取模块,配置为获取待测图像;
    目标检测模块,配置为利用目标检测模型对所述待测图像进行目标检测,得到与所述待测图像中的目标对应的目标区域信息;
    其中,所述目标检测模型是通过权利要求12所述的目标检测模型的训练装置得到的。
  23. 一种电子设备,包括相互耦接的存储器和处理器,所述处理器配置为执行所述存储器中存储的程序指令,以实现权利要求1至10任一项所述的目标检测模型的训练方法,或实现权利要求11所述的目标检测方法。
  24. 一种计算机可读存储介质,其上存储有程序指令,所述程序指令被处理器执行时实现权利要求1至10任一项所述的目标检测模型的训练方法,或实现权利要求11所述的目标检测方法。
  25. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行配置为实现权利要求1至10任一项所述的目标检测模型的训练方法,或者权利要求11所述的目标检测方法。
PCT/CN2020/100704 2020-03-11 2020-07-07 目标检测方法及其模型的训练方法、装置及电子设备 WO2021179498A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020217034041A KR20210141650A (ko) 2020-03-11 2020-07-07 타깃 검출 방법 및 타깃 검출 모델의 트레이닝 방법, 장치 및 전자 기기
JP2021563131A JP2022529838A (ja) 2020-03-11 2020-07-07 ターゲット検出方法及びそのモデルの訓練方法、装置並びに電子機器

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010167104.7A CN111508019A (zh) 2020-03-11 2020-03-11 目标检测方法及其模型的训练方法及相关装置、设备
CN202010167104.7 2020-03-11

Publications (1)

Publication Number Publication Date
WO2021179498A1 true WO2021179498A1 (zh) 2021-09-16

Family

ID=71863905

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100704 WO2021179498A1 (zh) 2020-03-11 2020-07-07 目标检测方法及其模型的训练方法、装置及电子设备

Country Status (5)

Country Link
JP (1) JP2022529838A (zh)
KR (1) KR20210141650A (zh)
CN (1) CN111508019A (zh)
TW (1) TW202135006A (zh)
WO (1) WO2021179498A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663731A (zh) * 2022-05-25 2022-06-24 杭州雄迈集成电路技术股份有限公司 车牌检测模型的训练方法及***、车牌检测方法及***
CN115205555A (zh) * 2022-07-12 2022-10-18 北京百度网讯科技有限公司 确定相似图像的方法、训练方法、信息确定方法及设备

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132206A (zh) * 2020-09-18 2020-12-25 青岛商汤科技有限公司 图像识别方法及相关模型的训练方法及相关装置、设备
CN112328715B (zh) * 2020-10-16 2022-06-03 浙江商汤科技开发有限公司 视觉定位方法及相关模型的训练方法及相关装置、设备
CN112232431A (zh) * 2020-10-23 2021-01-15 携程计算机技术(上海)有限公司 水印检测模型训练方法、水印检测方法、***、设备及介质
CN112348892A (zh) * 2020-10-29 2021-02-09 上海商汤智能科技有限公司 点定位方法及相关装置、设备
CN112669293A (zh) * 2020-12-31 2021-04-16 上海商汤智能科技有限公司 图像检测方法和检测模型的训练方法及相关装置、设备
CN113435260A (zh) * 2021-06-07 2021-09-24 上海商汤智能科技有限公司 图像检测方法和相关训练方法及相关装置、设备及介质
CN113256622A (zh) * 2021-06-28 2021-08-13 北京小白世纪网络科技有限公司 基于三维图像的目标检测方法、装置及电子设备
CN113642431B (zh) * 2021-07-29 2024-02-06 北京百度网讯科技有限公司 目标检测模型的训练方法及装置、电子设备和存储介质
CN113705672B (zh) * 2021-08-27 2024-03-26 国网浙江省电力有限公司双创中心 图像目标检测的阈值选取方法、***、装置及存储介质
US11967137B2 (en) * 2021-12-02 2024-04-23 International Business Machines Corporation Object detection considering tendency of object location
WO2024118670A1 (en) * 2022-11-29 2024-06-06 Merck Sharp & Dohme Llc 3d segmentation of lesions in ct images using self-supervised pretraining with augmentation
CN117557788B (zh) * 2024-01-12 2024-03-26 国研软件股份有限公司 一种基于运动预测的海上目标检测方法及***

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697460A (zh) * 2018-12-05 2019-04-30 华中科技大学 对象检测模型训练方法、目标对象检测方法
US20190294177A1 (en) * 2018-03-20 2019-09-26 Phantom AI, Inc. Data augmentation using computer simulated objects for autonomous control systems
CN110598764A (zh) * 2019-08-28 2019-12-20 杭州飞步科技有限公司 目标检测模型的训练方法、装置及电子设备
CN110599503A (zh) * 2019-06-18 2019-12-20 腾讯科技(深圳)有限公司 检测模型训练方法、装置、计算机设备和存储介质
CN110827253A (zh) * 2019-10-30 2020-02-21 北京达佳互联信息技术有限公司 一种目标检测模型的训练方法、装置及电子设备

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6431302B2 (ja) * 2014-06-30 2018-11-28 キヤノン株式会社 画像処理装置、画像処理方法及びプログラム
JP2017059207A (ja) * 2015-09-18 2017-03-23 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America 画像認識方法
KR101879207B1 (ko) * 2016-11-22 2018-07-17 주식회사 루닛 약한 지도 학습 방식의 객체 인식 방법 및 장치
CN108304761A (zh) * 2017-09-25 2018-07-20 腾讯科技(深圳)有限公司 文本检测方法、装置、存储介质和计算机设备
CN108229307B (zh) * 2017-11-22 2022-01-04 北京市商汤科技开发有限公司 用于物体检测的方法、装置和设备
CN108710868B (zh) * 2018-06-05 2020-09-04 中国石油大学(华东) 一种基于复杂场景下的人体关键点检测***及方法
CN110084253A (zh) * 2019-05-05 2019-08-02 厦门美图之家科技有限公司 一种生成物体检测模型的方法
CN110298298B (zh) * 2019-06-26 2022-03-08 北京市商汤科技开发有限公司 目标检测及目标检测网络的训练方法、装置及设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190294177A1 (en) * 2018-03-20 2019-09-26 Phantom AI, Inc. Data augmentation using computer simulated objects for autonomous control systems
CN109697460A (zh) * 2018-12-05 2019-04-30 华中科技大学 对象检测模型训练方法、目标对象检测方法
CN110599503A (zh) * 2019-06-18 2019-12-20 腾讯科技(深圳)有限公司 检测模型训练方法、装置、计算机设备和存储介质
CN110598764A (zh) * 2019-08-28 2019-12-20 杭州飞步科技有限公司 目标检测模型的训练方法、装置及电子设备
CN110827253A (zh) * 2019-10-30 2020-02-21 北京达佳互联信息技术有限公司 一种目标检测模型的训练方法、装置及电子设备

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663731A (zh) * 2022-05-25 2022-06-24 杭州雄迈集成电路技术股份有限公司 车牌检测模型的训练方法及***、车牌检测方法及***
CN115205555A (zh) * 2022-07-12 2022-10-18 北京百度网讯科技有限公司 确定相似图像的方法、训练方法、信息确定方法及设备
CN115205555B (zh) * 2022-07-12 2023-05-26 北京百度网讯科技有限公司 确定相似图像的方法、训练方法、信息确定方法及设备

Also Published As

Publication number Publication date
CN111508019A (zh) 2020-08-07
KR20210141650A (ko) 2021-11-23
TW202135006A (zh) 2021-09-16
JP2022529838A (ja) 2022-06-24

Similar Documents

Publication Publication Date Title
WO2021179498A1 (zh) 目标检测方法及其模型的训练方法、装置及电子设备
US11049014B2 (en) Learning apparatus, detecting apparatus, learning method, and detecting method
WO2020215672A1 (zh) 医学图像病灶检测定位方法、装置、设备及存储介质
WO2021128825A1 (zh) 三维目标检测及模型的训练方法及装置、设备、存储介质
WO2021000423A1 (zh) 一种生猪体重测量方法及装置
US20180025249A1 (en) Object Detection System and Object Detection Method
US9330336B2 (en) Systems, methods, and media for on-line boosting of a classifier
CN110738235B (zh) 肺结核判定方法、装置、计算机设备及存储介质
WO2023155494A1 (zh) 图像检测及训练方法、相关装置、设备、介质和程序产品
CN112614133B (zh) 一种无锚点框的三维肺结节检测模型训练方法及装置
WO2023138190A1 (zh) 目标检测模型的训练方法及对应的检测方法
CN110610472A (zh) 实现肺结节图像分类检测的计算机装置及方法
KR20200062589A (ko) 뇌 mri 영상의 뇌 영역별 분할을 통한 치매 예측 장치 및 방법
WO2022257314A1 (zh) 图像检测方法和相关训练方法及相关装置、设备及介质
CN109448854A (zh) 一种肺结核检测模型的构建方法及应用
US20160171717A1 (en) State estimation apparatus, state estimation method, integrated circuit, and non-transitory computer-readable storage medium
CN110533120B (zh) 器官结节的图像分类方法、装置、终端及存储介质
WO2023092959A1 (zh) 图像分割方法及其模型的训练方法及相关装置、电子设备
CN113240699B (zh) 图像处理方法及装置,模型的训练方法及装置,电子设备
CN112488178B (zh) 网络模型的训练方法及装置、图像处理方法及装置、设备
JP7484492B2 (ja) レーダーに基づく姿勢認識装置、方法及び電子機器
CN113192085A (zh) 三维器官图像分割方法、装置及计算机设备
JP7239002B2 (ja) 物体数推定装置、制御方法、及びプログラム
CN116912258B (zh) 一种肺部ct图像病灶参数自效估计方法
WO2023226793A1 (zh) 二尖瓣开口间距检测方法、电子设备和存储介质

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 20217034041

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021563131

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924072

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924072

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 28.03.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20924072

Country of ref document: EP

Kind code of ref document: A1