WO2023035558A1 - Image processing method, apparatus, device and medium based on anchor point cutting - Google Patents

Image processing method, apparatus, device and medium based on anchor point cutting Download PDF

Info

Publication number
WO2023035558A1
WO2023035558A1 (PCT/CN2022/078357)
Authority
WO
WIPO (PCT)
Prior art keywords
image
anchor point
sliding window
information
target
Prior art date
Application number
PCT/CN2022/078357
Other languages
English (en)
French (fr)
Inventor
李晓川
李仁刚
赵雅倩
张润泽
范宝余
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Publication of WO2023035558A1 publication Critical patent/WO2023035558A1/zh

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 - Determining position or orientation of objects or cameras using feature-based methods involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Definitions

  • the present application relates to the technical field of computer vision, and in particular to an image processing method, apparatus, device and computer-readable storage medium based on anchor point cutting.
  • Object detection is one of the most important research directions in the field of computer vision. Owing to its strong practicality and deployment prospects, a large number of researchers have joined the community working on detection algorithm optimization. Object detection algorithms are gradually maturing, and various algorithms and models keep emerging. At this stage, thanks to the popularity of deep learning and the general increase in computing power, cutting-edge object detection algorithms such as Yolo v5 and the CenterNet model have achieved good performance in both efficiency and accuracy. Among them, Yolo v5 is a fast and compact open-source object detection model; compared with other networks it performs better at the same size and is very stable, and it is an end-to-end neural network that predicts object categories and bounding boxes.
  • the training of a model depends on datasets. The COCO database is a large image dataset designed for object detection, segmentation, human keypoint detection, semantic segmentation and caption generation.
  • however, in real scenes, the images collected by cameras are much larger than the standard sizes of current public datasets: most COCO images are within 1,000 pixels per side, whereas images collected in real scenes by cameras, mobile phones and other capture devices are mostly above 3,000 pixels per side, and there are even ultra-large-pixel images such as the PANDA dataset, with 15,000 to 32,000 pixels per side.
  • the purpose of the embodiments of the present application is to provide an image processing method, apparatus, device and computer-readable storage medium based on anchor point cutting, which can realize the detection and analysis of very large images.
  • an embodiment of the present application provides an image processing method based on anchor point cutting, including:
  • acquiring anchor point information of an image based on annotation information of the image; wherein the anchor point information includes center point information and size information of an anchor point identification frame;
  • determining sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
  • cropping, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
  • training an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
  • optionally, after the initial depth detection network model is trained with the cut image to obtain the depth detection network model for detecting the position of the target object in an image, the method further includes:
  • when a new image is acquired, segmenting the new image according to multiple set sizes;
  • scaling each slice image obtained by segmentation according to the size information required by the depth detection network model;
  • analyzing the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
  • a soft-nms algorithm is used to delete redundant detection frames from the detection frame set.
  • determining the sliding window size information according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame includes calculating the sliding window size information S according to the following formula:
  • S = max(mn, min(mx, λw))
  • where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
  • optionally, the method further includes:
  • the position of the target object includes the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
  • an embodiment of the present application further provides an image processing apparatus based on anchor point cutting, including an acquisition unit, a determination unit, an interception unit and a training unit;
  • the acquisition unit is configured to acquire anchor point information of an image based on annotation information of the image; wherein the anchor point information includes center point information and size information of an anchor point identification frame;
  • the determination unit is configured to determine sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
  • the interception unit is configured to crop, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
  • the training unit is configured to train an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
  • optionally, a segmentation unit, a scaling unit and an analysis unit are also included;
  • the segmentation unit is configured to segment a new image according to multiple set sizes when the new image is acquired;
  • the scaling unit is configured to scale each slice image obtained by segmentation according to the size information required by the depth detection network model;
  • the analysis unit is configured to analyze the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
  • optionally, a deletion unit is also included;
  • the deletion unit is configured to delete redundant detection frames from the detection frame set using the soft-nms algorithm.
  • optionally, the determination unit is configured to calculate the sliding window size information S according to the following formula:
  • S = max(mn, min(mx, λw))
  • where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
  • optionally, a calculation unit and an application unit are also included;
  • the calculation unit is configured to calculate the intersection-over-union (IoU) of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information;
  • the application unit is configured to use the cut images whose IoU is greater than or equal to a preset threshold as training samples, and to train the initial depth detection network model with the training samples.
  • optionally, an erasing unit is also included;
  • the erasing unit is configured to erase the cut images whose IoU is smaller than the preset threshold.
  • optionally, the position of the target object includes the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
  • an embodiment of the present application further provides an image processing device based on anchor point cutting, including:
  • a memory for storing a computer program;
  • a processor for executing the computer program to implement the steps of the image processing method based on anchor point cutting described in any one of the above.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image processing method based on anchor point cutting described in any one of the above.
  • it can be seen from the above technical solution that anchor point information of an image is acquired based on annotation information of the image; the anchor point information may include center point information and size information of an anchor point identification frame. The annotation information can be used to indicate the position of the target object in the image, but that position is not precise, which is why the anchor point information of the image needs to be acquired.
  • according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information for cropping the target object can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing the target object can be cropped from the image, and the initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image.
  • in this technical solution, the sliding window size information for cropping the target object is derived from the position of the target object in the image, so a cut image containing the target object can be cropped from the image.
  • for a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.
  • FIG. 1 is a flow chart of an image processing method based on anchor point cutting provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of cropping a cut image from an image provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an image processing apparatus based on anchor point cutting provided by an embodiment of the present application;
  • FIG. 4 is a structural diagram of an image processing device based on anchor point cutting provided by another embodiment of the present application.
  • FIG. 1 is a flow chart of an image processing method based on anchor point cutting provided by an embodiment of the present application, and the method includes:
  • S101 Obtain anchor point information of the image based on the annotation information of the image.
  • the anchor point information may include center point information and size information of the anchor point identification frame.
  • the annotation information of the image may be the position of the area where the target object is located in the image. According to the position of the area where the target object is located, the area where the target object is located can be roughly determined, and the center point information of the area where the target object is located and the size information of the frame of the area can be used as the anchor point information of the image.
  • the size information of the anchor point identification frame may include a width value and a height value of the anchor point identification frame.
  • the target object may be a pedestrian, a building, an animal, etc., and the specific form of the target object is not limited.
  • S102 Determine the sliding window size information according to the set sliding window maximum value, sliding window minimum value, size multiple, and size information of the anchor point identification frame.
  • an image may contain multiple target objects, and each target object is processed in the same way; the subsequent description therefore takes the processing of one target object in the image as an example.
  • the size information of the sliding window may include a width value and a height value of the sliding window.
  • the maximum value of the sliding window, the minimum value of the sliding window and the size multiple can be set according to the type of target to be detected.
  • in a specific implementation, the sliding window size information S can be calculated according to the following formula:
  • S = max(mn, min(mx, λw))
  • where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
  • when w is the width value of the anchor point identification frame, the calculated S is the width value of the sliding window; when w is the height value of the anchor point identification frame, the calculated S is the height value of the sliding window.
  • S103 Based on the size information of the sliding window and the center point information, intercept a cut image including the target object from the image.
  • based on the operations of S101 and S102, the sliding window size information corresponding to each target object in the image can be determined.
  • the center point information can be used to identify the center point of the region where the target object is located. Once the center point corresponding to the target object and the sliding window size information have been determined, the cut image containing the target object can be cropped from the image.
  • FIG. 2 is a schematic diagram of cropping a cut image from an image provided by an embodiment of the present application.
  • in FIG. 2, the black ellipse represents the target object, and the dashed box represents the border of the cut image to be cropped.
  • in the embodiment of the present application, the corresponding sliding window size information is calculated for each target object, which ensures that each cropped cut image contains the complete target object as far as possible.
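  • The application does not spell out the cropping routine itself; the following is a plausible sketch, assuming the window never exceeds the image and is shifted back inside the borders when the target sits near an edge:

```python
import numpy as np

def crop_around_center(image: np.ndarray, cx: int, cy: int,
                       win_w: int, win_h: int) -> np.ndarray:
    # Center a win_h x win_w window on the anchor center point (cx, cy),
    # shifting it back inside the picture if it would cross a border.
    h, w = image.shape[:2]
    x0 = max(0, min(cx - win_w // 2, w - win_w))
    y0 = max(0, min(cy - win_h // 2, h - win_h))
    return image[y0:y0 + win_h, x0:x0 + win_w]

cut = crop_around_center(np.zeros((4000, 6000, 3), np.uint8),
                         cx=5900, cy=100, win_w=800, win_h=512)
print(cut.shape)  # (512, 800, 3)
```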
  • S104 Train the initial depth detection network model with the cut images to obtain a depth detection network model for detecting the position of the target object in the image.
  • cropping cut images from the image turns an ultra-large-pixel image into multiple small-pixel images that the processing capacity of the initial depth detection network model can handle. Moreover, each cut image basically contains a complete target object, so training the initial depth detection network model with the cut images gives the trained depth detection network model a high detection accuracy.
  • the position of the target object is the output information of the depth detection network model.
  • the types of output information of the depth detection network model can be set in advance.
  • the position of the target object may include the horizontal and vertical coordinates of the center point of the target object, and the width and height values of the target sliding frame corresponding to the target object.
  • the location of the object may also include the probability that the object contained in the target sliding frame belongs to the set item and the category to which the target object contained in the target sliding frame belongs.
  • in a specific implementation, the obtained cut images can be input into the network's input stage (Input) as training samples for normalization and conversion into a matrix; the processed matrix is input into the initial depth network model to extract high-dimensional features, and the features are then mapped into spatial features by convolution, where each region of the spatial features represents the feature vector of the corresponding region of the original image.
  • during training, a number of frames are produced on the spatial features according to rules; each sliding window can correspond to one output result containing six parameters: the horizontal and vertical coordinates of the center point of the target object, the width and height values of the corresponding target sliding frame, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
  • by performing maximum-likelihood fitting of these six parameters against the input annotation information, the gradient is calculated and backpropagated, and the weight parameters in the network model are optimized so that the gradient descends and the network converges.
  • the above model training process follows an existing, relatively mature training pipeline and is not described further here.
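  • For concreteness, the six-parameter output attached to each sliding window can be pictured as the following record (a sketch; the field names are ours, not the application's):

```python
from dataclasses import dataclass

@dataclass
class WindowOutput:
    cx: float       # horizontal coordinate of the target's center point
    cy: float       # vertical coordinate of the target's center point
    w: float        # width of the corresponding target sliding frame
    h: float        # height of the corresponding target sliding frame
    prob: float     # probability that the contained target is a set item
    category: int   # class index of the contained target object
```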
  • in practical applications, the model may be able to recognize multiple types of target objects. In the model output, the probability that the target object contained in the target sliding frame belongs to a set item can be the probability that the target object belongs to each of the target types to be recognized; the target type with the highest probability is the category to which the target object belongs.
  • for example, a set item may be a target object to be recognized, such as a pedestrian, an animal or a building. Assuming that the target object contained in a target sliding frame in the image has a 90% probability of being a pedestrian, a 10% probability of being an animal and a 0% probability of being a building, the category of the target object contained in that target sliding frame is pedestrian.
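  • Picking the category is then a plain argmax over the per-type probabilities, as in this one-line Python illustration:

```python
probs = {"pedestrian": 0.90, "animal": 0.10, "building": 0.00}
category = max(probs, key=probs.get)  # -> "pedestrian"
```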
  • it can be seen from the above technical solution that anchor point information of an image is acquired based on annotation information of the image; the anchor point information may include center point information and size information of an anchor point identification frame. The annotation information can be used to indicate the position of the target object in the image, but that position is not precise, which is why the anchor point information of the image needs to be acquired.
  • according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information for cropping the target object can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing the target object can be cropped from the image, and the initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image.
  • in this technical solution, the sliding window size information for cropping the target object is derived from the position of the target object in the image, so a cut image containing the target object can be cropped from the image.
  • for a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.
  • after the training of the depth detection network model is completed, new images can be processed with the model. Before a new image is input into the depth detection network model, it needs to be segmented and scaled so that the scaled images meet the model's size requirements for input images.
  • in a specific implementation, when a new image is acquired, the new image may be segmented according to multiple set sizes.
  • since the system cannot know in advance the annotation information of the target objects contained in the new image, the new image can be segmented according to several different sizes to ensure that some of the segmented images contain complete target objects.
  • for example, in practical applications, the new image can be segmented according to the three grids 1*1, 2*2 and 3*5.
  • the depth detection network model generally has size requirements for input images, so after the new image has been segmented according to the multiple set sizes, each slice image obtained by segmentation can be scaled according to the size information required by the model; the scaled slice images are then analyzed with the depth detection network model to obtain a set of detection frames containing the target objects.
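  • This segment-scale-detect loop might look as follows; the grids come from the example above, while the 640x640 input size and the detect stub are assumptions standing in for the trained model:

```python
import cv2
import numpy as np

def tile(image: np.ndarray, rows: int, cols: int) -> list:
    # Split the image into a rows x cols grid of slice images.
    h, w = image.shape[:2]
    return [image[r * h // rows:(r + 1) * h // rows,
                  c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]

def detect(slice_img: np.ndarray) -> list:
    # Placeholder for inference with the trained depth detection network
    # model; a real implementation would return detection frames here.
    return []

new_image = np.zeros((4000, 6000, 3), np.uint8)  # stand-in very large image
all_boxes = []
for rows, cols in [(1, 1), (2, 2), (3, 5)]:      # the set segmentation sizes
    for s in tile(new_image, rows, cols):
        all_boxes.extend(detect(cv2.resize(s, (640, 640))))
```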
  • considering that in practice the same target object may appear in several slice images because the new image was segmented at different sizes, the resulting detection frame set may contain redundant detection frames; in the embodiment of the present application, the soft-nms algorithm may therefore be used to delete redundant detection frames from the detection frame set.
  • the soft-nms algorithm can be used to remove duplicate frames and reduce false detections; its implementation principle can be found in the prior art and is not repeated here.
  • in the embodiment of the present application, multiple sizes are used to segment the new image, which ensures that some of the segmented images contain complete target objects, so that the detection network model can fairly accurately obtain the detection frame set containing the target objects; redundancy is then removed from the detection frame set, making the final detection frame set more concise.
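  • A compact sketch of the Gaussian variant of Soft-NMS is given below; the sigma and score-threshold values are common defaults, not values taken from the application:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    # Gaussian Soft-NMS: instead of discarding every overlapping frame,
    # decay its score by exp(-iou^2 / sigma) and drop frames whose score
    # falls below score_thresh. Returns the indices of the kept frames.
    scores = np.asarray(scores, dtype=float).copy()
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])
        keep.append(m)
        remaining.remove(m)
        for i in remaining:
            scores[i] *= np.exp(-iou(boxes[m], boxes[i]) ** 2 / sigma)
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return keep
```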
  • the quality of the training samples is an important factor affecting the training result of the depth detection network model. Therefore, in the embodiment of the present application, in order to improve the detection accuracy of the model, after the cut image containing the target object is cropped from the image based on the sliding window size information and the center point information, the intersection-over-union (IoU) of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information is calculated; the cut images whose IoU is greater than or equal to a preset threshold are used as training samples to train the initial depth detection network model.
  • the training samples include positive samples and negative samples: positive samples may be images containing the target object, and negative samples may be images not containing the target object.
  • in general, cut images whose IoU is greater than or equal to the preset threshold are used as positive samples, and cut images whose IoU is smaller than the preset threshold are used as negative samples. However, the cut images whose IoU is smaller than the preset threshold may contain incomplete target objects and are not suitable as negative samples; to prevent them from affecting the model training, they can be erased. In practical applications, the effect of erasing a cut image can be achieved by filling it with gray.
  • in practical applications, the cut images whose IoU is greater than or equal to the preset threshold can be used as positive samples, and the gray-filled cut images, or preset images that do not contain target objects, can be used as negative samples.
  • by calculating the IoU with the target frame, positive samples suitable for model training can be effectively selected; gray-filling reduces the influence that low-IoU cut images would have on model training, ensuring the accuracy of training.
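  • Sample selection plus gray-fill erasure can be sketched as follows (the 0.5 threshold and the gray value 128 are illustrative assumptions; iou() is the helper from the Soft-NMS sketch above):

```python
IOU_THRESHOLD = 0.5  # assumed preset threshold

def select_training_cuts(cuts, windows, target_frames):
    # Keep a cut image as a positive sample when its sliding window's IoU
    # with the annotated target frame clears the threshold; otherwise erase
    # the cut in place by filling it with mid-gray, so an incomplete target
    # cannot act as a misleading negative sample.
    positives = []
    for cut, win, tgt in zip(cuts, windows, target_frames):
        if iou(win, tgt) >= IOU_THRESHOLD:
            positives.append(cut)
        else:
            cut[...] = 128  # gray fill erases the low-IoU cut image
    return positives
```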
  • FIG. 3 is a schematic structural diagram of an image processing apparatus based on anchor point cutting provided by an embodiment of the present application, including an acquisition unit 31, a determination unit 32, an interception unit 33 and a training unit 34;
  • the acquisition unit 31 is configured to acquire anchor point information of an image based on annotation information of the image, the anchor point information including center point information and size information of an anchor point identification frame;
  • the determination unit 32 is configured to determine sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
  • the interception unit 33 is configured to crop, from the image, a cut image containing the target object based on the sliding window size information and the center point information;
  • the training unit 34 is configured to train the initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
  • optionally, a segmentation unit, a scaling unit and an analysis unit are also included;
  • the segmentation unit is configured to segment a new image according to multiple set sizes when the new image is acquired;
  • the scaling unit is configured to scale each slice image obtained by segmentation according to the size information required by the depth detection network model;
  • the analysis unit is configured to analyze the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
  • optionally, a deletion unit is also included;
  • the deletion unit is configured to delete redundant detection frames from the detection frame set using the soft-nms algorithm.
  • optionally, the determination unit is configured to calculate the sliding window size information S according to the following formula:
  • S = max(mn, min(mx, λw))
  • where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
  • optionally, a calculation unit and an application unit are also included;
  • the calculation unit is configured to calculate the IoU of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information;
  • the application unit is configured to use the cut images whose IoU is greater than or equal to a preset threshold as training samples, and to train the initial depth detection network model with the training samples.
  • optionally, an erasing unit is also included;
  • the erasing unit is configured to erase the cut images whose IoU is smaller than the preset threshold.
  • optionally, the position of the target object includes the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
  • it can be seen from the above technical solution that anchor point information of an image is acquired based on annotation information of the image; the anchor point information may include center point information and size information of an anchor point identification frame. The annotation information can be used to indicate the position of the target object in the image, but that position is not precise, which is why the anchor point information of the image needs to be acquired.
  • according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information for cropping the target object can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing the target object can be cropped from the image, and the initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image.
  • in this technical solution, the sliding window size information for cropping the target object is derived from the position of the target object in the image, so a cut image containing the target object can be cropped from the image.
  • for a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.
  • FIG. 4 is a structural diagram of an image processing device based on anchor point cutting provided by another embodiment of the present application.
  • the image processing device based on anchor point cutting includes: a memory 20 for storing a computer program;
  • a processor 21 configured to implement the steps of the image processing method based on anchor point cutting of the above embodiments when executing the computer program.
  • the image processing device based on anchor point cutting may include, but is not limited to, a smartphone, a tablet computer, a notebook computer or a desktop computer.
  • the processor 21 may include one or more processing cores, such as a 4-core or an 8-core processor. The processor 21 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array).
  • the processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the wake state, while the coprocessor is a low-power processor that processes data in the standby state.
  • in some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen.
  • in some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
  • the memory 20 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices.
  • the memory 20 is at least used to store the following computer program 201, which, after being loaded and executed by the processor 21, can implement the relevant steps of the image processing method based on anchor point cutting disclosed in any of the above embodiments.
  • the resources stored in the memory 20 may also include an operating system 202 and data 203, stored either temporarily or permanently.
  • the operating system 202 may include Windows, Unix, Linux, and so on.
  • the data 203 may include, but is not limited to, annotation information of images, the sliding window maximum value, the sliding window minimum value, the size multiple, and the like.
  • the image processing device based on anchor point cutting may further include a display screen 22, an input/output interface 23, a communication interface 24, a power supply 25 and a communication bus 26.
  • the structure shown in FIG. 4 does not constitute a limitation on the image processing device based on anchor point cutting, which may include more or fewer components than shown.
  • if the image processing method based on anchor point cutting of the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods of the various embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk or an optical disk.
  • on this basis, an embodiment of the present invention further provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the steps of any of the above image processing methods based on anchor point cutting can be implemented.
  • the functions of the functional modules of the computer-readable storage medium in the embodiment of the present invention can be implemented according to the methods in the above method embodiments; for the specific implementation process, reference may be made to the relevant descriptions of the above method embodiments, which are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose an image processing method, apparatus, device and medium based on anchor point cutting. Anchor point information of an image is acquired based on annotation information of the image; the anchor point information may include center point information and size information of an anchor point identification frame. According to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing a target object can be cropped from the image, and an initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image. For a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.

Description

Image processing method, apparatus, device and medium based on anchor point cutting
This application claims priority to a Chinese patent application filed with the China National Intellectual Property Administration on September 10, 2021, with application number 202111063785.3 and entitled "Image processing method, apparatus, device and medium based on anchor point cutting", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of computer vision, and in particular to an image processing method, apparatus, device and computer-readable storage medium based on anchor point cutting.
Background
Object detection is one of the most important research directions in the field of computer vision. Owing to its strong practicality and deployment prospects, a large number of researchers have joined the community working on detection algorithm optimization. Object detection algorithms are gradually maturing, and various algorithms and models keep emerging. At this stage, thanks to the popularity of deep learning and the general increase in computing power, cutting-edge object detection algorithms such as Yolo v5 and the CenterNet model have achieved good performance in both efficiency and accuracy. Among them, Yolo v5 is a fast and compact open-source object detection model; compared with other networks it performs better at the same size and is very stable, and it is an end-to-end neural network that predicts object categories and bounding boxes.
Model training depends on datasets. The COCO database is a large image dataset designed for object detection, segmentation, human keypoint detection, semantic segmentation and caption generation. However, in real scenes, the images collected by cameras are much larger than the standard sizes of current public datasets: most COCO images are within 1,000 pixels per side, whereas images collected in real scenes by cameras, mobile phones and other capture devices are mostly above 3,000 pixels per side, and there are even ultra-large-pixel images such as the PANDA dataset, with 15,000 to 32,000 pixels per side.
Existing detection algorithms therefore cannot solve the detection problem for ultra-large-pixel images. On the one hand, the computing power available to existing algorithms cannot support feeding an entire such image into the model for inference; on the other hand, simply scaling the image down loses a large amount of detail and leads to severe missed detections of small targets. How to design, on top of existing detection models, a detection pipeline suitable for ultra-large-pixel images or videos of natural scenes is the key to making existing algorithms deployable, and is also where existing detection algorithms hit their bottleneck on ultra-large-pixel images.
It can be seen that how to detect and analyze ultra-large-pixel images is a problem that needs to be solved by those skilled in the art.
Summary
The purpose of the embodiments of the present application is to provide an image processing method, apparatus, device and computer-readable storage medium based on anchor point cutting, which can realize the detection and analysis of very large images.
To solve the above technical problem, an embodiment of the present application provides an image processing method based on anchor point cutting, including:
acquiring anchor point information of an image based on annotation information of the image, wherein the anchor point information includes center point information and size information of an anchor point identification frame;
determining sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
cropping, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
training an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
Optionally, after the initial depth detection network model is trained with the cut image to obtain the depth detection network model for detecting the position of the target object in an image, the method further includes:
when a new image is acquired, segmenting the new image according to multiple set sizes;
scaling each slice image obtained by segmentation according to the size information required by the depth detection network model;
analyzing the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
Optionally, the method further includes:
deleting redundant detection frames from the detection frame set using the soft-nms algorithm.
Optionally, determining the sliding window size information according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame includes:
calculating the sliding window size information S according to the following formula:
S = max(mn, min(mx, λw))
where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
Optionally, after cropping the cut image containing the target object from the image based on the sliding window size information and the center point information, the method further includes:
calculating the intersection-over-union (IoU) of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information;
using the cut images whose IoU is greater than or equal to a preset threshold as training samples, and training the initial depth detection network model with the training samples.
Optionally, the method further includes:
erasing the cut images whose IoU is smaller than the preset threshold.
Optionally, the position of the target object includes the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
An embodiment of the present application further provides an image processing apparatus based on anchor point cutting, including an acquisition unit, a determination unit, an interception unit and a training unit;
the acquisition unit is configured to acquire anchor point information of an image based on annotation information of the image, wherein the anchor point information includes center point information and size information of an anchor point identification frame;
the determination unit is configured to determine sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
the interception unit is configured to crop, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
the training unit is configured to train an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
Optionally, a segmentation unit, a scaling unit and an analysis unit are also included;
the segmentation unit is configured to segment a new image according to multiple set sizes when the new image is acquired;
the scaling unit is configured to scale each slice image obtained by segmentation according to the size information required by the depth detection network model;
the analysis unit is configured to analyze the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
Optionally, a deletion unit is also included;
the deletion unit is configured to delete redundant detection frames from the detection frame set using the soft-nms algorithm.
Optionally, the determination unit is configured to calculate the sliding window size information S according to the following formula:
S = max(mn, min(mx, λw))
where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
Optionally, a calculation unit and an application unit are further included after the cut image containing the target object is cropped from the image based on the sliding window size information and the center point information;
the calculation unit is configured to calculate the IoU of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information;
the application unit is configured to use the cut images whose IoU is greater than or equal to a preset threshold as training samples, and to train the initial depth detection network model with the training samples.
Optionally, an erasing unit is also included;
the erasing unit is configured to erase the cut images whose IoU is smaller than the preset threshold.
Optionally, the position of the target object includes the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
An embodiment of the present application further provides an image processing device based on anchor point cutting, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the image processing method based on anchor point cutting described in any one of the above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image processing method based on anchor point cutting described in any one of the above.
It can be seen from the above technical solution that anchor point information of an image is acquired based on annotation information of the image, where the anchor point information may include center point information and size information of an anchor point identification frame. The annotation information can be used to indicate the position of the target object in the image, but that position is not precise, which is why the anchor point information of the image needs to be acquired. According to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information for cropping the target object can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing the target object can be cropped from the image, and the initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image. In this technical solution, the sliding window size information for cropping the target object is derived from the position of the target object in the image, so a cut image containing the target object can be cropped from the image. For a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.
Brief Description of the Drawings
To explain the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an image processing method based on anchor point cutting provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of cropping a cut image from an image provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an image processing apparatus based on anchor point cutting provided by an embodiment of the present application;
FIG. 4 is a structural diagram of an image processing device based on anchor point cutting provided by another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "including" and "having" and any variations thereof in the description and claims of the present application and the above drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
To enable those skilled in the art to better understand the solution of the present application, the present application is further described in detail below with reference to the drawings and specific embodiments.
Next, an image processing method based on anchor point cutting provided by an embodiment of the present application is introduced in detail. FIG. 1 is a flow chart of an image processing method based on anchor point cutting provided by an embodiment of the present application, and the method includes:
S101: Acquire anchor point information of an image based on annotation information of the image.
The anchor point information may include center point information and size information of an anchor point identification frame.
The annotation information of the image may be the position of the region where a target object is located in the image. According to the position of that region, the region where the target object is located can be roughly determined, and the center point information of the region and the size information of the region's border can be used as the anchor point information of the image. The size information of the anchor point identification frame may include the width value and the height value of the anchor point identification frame.
The target object may be a pedestrian, a building, an animal, etc.; the specific form of the target object is not limited.
S102: Determine sliding window size information according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame.
An image may contain multiple target objects, and each target object is processed in the same way; the subsequent description therefore takes the processing of one target object in the image as an example.
In the embodiment of the present application, in order to crop an ultra-large-pixel image reasonably and ensure that each cut image contains a complete target object as far as possible, suitable sliding window size information corresponding to the region where the target object is located needs to be determined.
The sliding window size information may include the width value and the height value of the sliding window.
In practical applications, the sliding window maximum value, the sliding window minimum value and the size multiple can be set according to the type of target object to be detected.
In a specific implementation, the sliding window size information S can be calculated according to the following formula:
S = max(mn, min(mx, λw))
where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
When w is the width value of the anchor point identification frame, the calculated S is the width value of the sliding window; when w is the height value of the anchor point identification frame, the calculated S is the height value of the sliding window.
S103: Crop, from the image, a cut image containing the target object based on the sliding window size information and the center point information.
Based on the operations of S101 and S102, the sliding window size information corresponding to each target object in the image can be determined. The center point information can be used to identify the center point of the region where the target object is located; once the center point corresponding to the target object and the sliding window size information have been determined, the cut image containing the target object can be cropped from the image.
FIG. 2 is a schematic diagram of cropping a cut image from an image provided by an embodiment of the present application; in FIG. 2, the black ellipse represents the target object, and the dashed box represents the border of the cut image to be cropped.
In the embodiment of the present application, the corresponding sliding window size information is calculated for each target object, which ensures that each cropped cut image contains the complete target object as far as possible.
S104: Train an initial depth detection network model with the cut images to obtain a depth detection network model for detecting the position of the target object in an image.
Cropping cut images from the image turns an ultra-large-pixel image into multiple small-pixel images, which the processing capacity of the initial depth detection network model can analyze and process.
Moreover, each cut image basically contains a complete target object, so training the initial depth detection network model with the cut images gives the trained depth detection network model a high detection accuracy.
In the embodiment of the present application, the position of the target object is the output information of the depth detection network model; in order to learn the target object's information in detail, the types of output information of the model can be set in advance. The position of the target object may include the horizontal and vertical coordinates of the center point of the target object and the width and height values of the target sliding frame corresponding to the target object. In order to recognize different types of target objects, the position of the target object may also include the probability that the target object contained in the target sliding frame belongs to a set item and the category to which the target object contained in the target sliding frame belongs.
In a specific implementation, the obtained cut images can be input into the network's input stage (Input) as training samples for normalization and conversion into a matrix; the processed matrix is input into the initial depth network model to extract high-dimensional features, and the features are then mapped into spatial features by convolution, where each region of the spatial features represents the feature vector of the corresponding region of the original image. During training, a number of frames are produced on the spatial features according to rules, and each sliding window can correspond to one output result containing six parameters: the horizontal and vertical coordinates of the center point of the target object, the width and height values of the corresponding target sliding frame, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs. By performing maximum-likelihood fitting of these six parameters against the input annotation information, the gradient is calculated and backpropagated, and the weight parameters in the network model are optimized so that the gradient descends and the network converges. The above model training process follows an existing, relatively mature training pipeline and is not described further here.
In practical applications, the model may be able to recognize multiple types of target objects. In the model output, the probability that the target object contained in the target sliding frame belongs to a set item can be the probability that the target object belongs to each of the target types to be recognized; the target type with the highest probability is the category to which the target object belongs.
For example, a set item may be a target object to be recognized, such as a pedestrian, an animal or a building. Assuming that the target object contained in a target sliding frame in the image has a 90% probability of being a pedestrian, a 10% probability of being an animal and a 0% probability of being a building, the category of the target object contained in that target sliding frame is pedestrian.
It can be seen from the above technical solution that anchor point information of an image is acquired based on annotation information of the image; the anchor point information may include center point information and size information of an anchor point identification frame. The annotation information can be used to indicate the position of the target object in the image, but that position is not precise, which is why the anchor point information of the image needs to be acquired. According to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information for cropping the target object can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing the target object can be cropped from the image, and the initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image. In this technical solution, the sliding window size information for cropping the target object is derived from the position of the target object in the image, so a cut image containing the target object can be cropped from the image. For a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.
In the embodiment of the present application, after the training of the depth detection network model is completed, new images can be processed with the model. Before a new image is input into the depth detection network model, it needs to be segmented and scaled so that the scaled images meet the model's size requirements for input images.
In a specific implementation, when a new image is acquired, the new image may be segmented according to multiple set sizes.
Since the system cannot know in advance the annotation information of the target objects contained in the new image, the new image can be segmented according to several different sizes to ensure that some of the segmented images contain complete target objects.
For example, in practical applications, the new image can be segmented according to the three grids 1*1, 2*2 and 3*5.
The depth detection network model generally has size requirements for input images, so after the new image has been segmented according to the multiple set sizes, each slice image obtained by segmentation can be scaled according to the size information required by the model. The scaled slice images are analyzed with the depth detection network model to obtain a set of detection frames containing the target objects.
Considering that in practice the same target object may appear in several slice images because the new image was segmented at different sizes, the resulting detection frame set may contain redundant detection frames. Therefore, in the embodiment of the present application, the soft-nms algorithm can be used to delete redundant detection frames from the detection frame set.
The soft-nms algorithm can be used to remove duplicate frames and reduce false detections; its implementation principle can be found in the prior art and is not repeated here.
In the embodiment of the present application, the new image is segmented at multiple sizes, which ensures that some of the segmented images contain complete target objects, so that the detection network model can fairly accurately obtain the detection frame set containing the target objects. Redundancy is then removed from the detection frame set, making the final detection frame set more concise.
The quality of the training samples is an important factor affecting the training result of the depth detection network model. Therefore, in the embodiment of the present application, in order to improve the detection accuracy of the model, after the cut image containing the target object is cropped from the image based on the sliding window size information and the center point information, the IoU of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information can be calculated; the cut images whose IoU is greater than or equal to a preset threshold are used as training samples, and the initial depth detection network model is trained with them.
The training samples include positive samples and negative samples: positive samples may be images containing the target object, and negative samples may be images not containing the target object. In general, the cut images whose IoU is greater than or equal to the preset threshold are used as positive samples, and the cut images whose IoU is smaller than the preset threshold are used as negative samples. However, the cut images whose IoU is smaller than the preset threshold may contain incomplete target objects and are not suitable as negative samples; to prevent them from affecting the model training, they can be erased. In practical applications, the effect of erasing a cut image can be achieved by filling it with gray.
In practical applications, the cut images whose IoU is greater than or equal to the preset threshold can be used as positive samples, and the gray-filled cut images, or preset images that do not contain target objects, can be used as negative samples.
By calculating the IoU with the target frame, positive samples suitable for model training can be effectively selected. Gray-filling reduces the influence that cut images whose IoU is below the preset threshold would have on model training, ensuring the accuracy of training.
FIG. 3 is a schematic structural diagram of an image processing apparatus based on anchor point cutting provided by an embodiment of the present application, including an acquisition unit 31, a determination unit 32, an interception unit 33 and a training unit 34;
the acquisition unit 31 is configured to acquire anchor point information of an image based on annotation information of the image, wherein the anchor point information includes center point information and size information of an anchor point identification frame;
the determination unit 32 is configured to determine sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
the interception unit 33 is configured to crop, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
the training unit 34 is configured to train an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
Optionally, a segmentation unit, a scaling unit and an analysis unit are also included;
the segmentation unit is configured to segment a new image according to multiple set sizes when the new image is acquired;
the scaling unit is configured to scale each slice image obtained by segmentation according to the size information required by the depth detection network model;
the analysis unit is configured to analyze the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
Optionally, a deletion unit is also included;
the deletion unit is configured to delete redundant detection frames from the detection frame set using the soft-nms algorithm.
Optionally, the determination unit is configured to calculate the sliding window size information S according to the following formula:
S = max(mn, min(mx, λw))
where mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
Optionally, a calculation unit and an application unit are further included after the cut image containing the target object is cropped from the image based on the sliding window size information and the center point information;
the calculation unit is configured to calculate the IoU of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information;
the application unit is configured to use the cut images whose IoU is greater than or equal to a preset threshold as training samples, and to train the initial depth detection network model with the training samples.
Optionally, an erasing unit is also included;
the erasing unit is configured to erase the cut images whose IoU is smaller than the preset threshold.
Optionally, the position of the target object includes the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
For the description of the features in the embodiment corresponding to FIG. 3, reference may be made to the relevant description of the embodiment corresponding to FIG. 1, which is not repeated here.
It can be seen from the above technical solution that anchor point information of an image is acquired based on annotation information of the image; the anchor point information may include center point information and size information of an anchor point identification frame. The annotation information can be used to indicate the position of the target object in the image, but that position is not precise, which is why the anchor point information of the image needs to be acquired. According to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame, the sliding window size information for cropping the target object can be determined fairly accurately; based on the sliding window size information and the center point information, a cut image containing the target object can be cropped from the image, and the initial depth detection network model is trained with the cut images to obtain a depth detection network model for detecting the position of the target object in an image. In this technical solution, the sliding window size information for cropping the target object is derived from the position of the target object in the image, so a cut image containing the target object can be cropped from the image. For a very large image, cropping cuts the image into cut images with far fewer pixels, so that the initial depth detection network model can detect and analyze the cut images; by changing the form of the image, the purpose of detecting and analyzing very large images is achieved.
FIG. 4 is a structural diagram of an image processing device based on anchor point cutting provided by another embodiment of the present application. As shown in FIG. 4, the image processing device based on anchor point cutting includes: a memory 20 for storing a computer program;
a processor 21 configured to implement the steps of the image processing method based on anchor point cutting of the above embodiments when executing the computer program.
The image processing device based on anchor point cutting provided in this embodiment may include, but is not limited to, a smartphone, a tablet computer, a notebook computer or a desktop computer.
The processor 21 may include one or more processing cores, such as a 4-core or an 8-core processor. The processor 21 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), is a processor for processing data in the wake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 20 is at least used to store the following computer program 201, which, after being loaded and executed by the processor 21, can implement the relevant steps of the image processing method based on anchor point cutting disclosed in any of the above embodiments. The resources stored in the memory 20 may also include an operating system 202 and data 203, stored either temporarily or permanently. The operating system 202 may include Windows, Unix, Linux, and so on. The data 203 may include, but is not limited to, annotation information of images, the sliding window maximum value, the sliding window minimum value, the size multiple, and the like.
In some embodiments, the image processing device based on anchor point cutting may further include a display screen 22, an input/output interface 23, a communication interface 24, a power supply 25 and a communication bus 26.
Those skilled in the art will understand that the structure shown in FIG. 4 does not constitute a limitation on the image processing device based on anchor point cutting, which may include more or fewer components than shown.
It can be understood that, if the image processing method based on anchor point cutting of the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk or an optical disk.
On this basis, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above image processing methods based on anchor point cutting.
The functions of the functional modules of the computer-readable storage medium in the embodiment of the present invention can be implemented according to the methods in the above method embodiments; for the specific implementation process, reference may be made to the relevant descriptions of the above method embodiments, which are not repeated here.
The image processing method, apparatus, device and computer-readable storage medium based on anchor point cutting provided by the embodiments of the present application have been introduced in detail above. The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods for each particular application to implement the described functions, but such implementations should not be considered beyond the scope of the present application.
The image processing method, apparatus, device and computer-readable storage medium based on anchor point cutting provided by the present application have been introduced in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be noted that, for a person of ordinary skill in the art, several improvements and modifications can be made to the present application without departing from the principles of the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (10)

  1. An image processing method based on anchor point cutting, characterized by comprising:
     acquiring anchor point information of an image based on annotation information of the image, wherein the anchor point information comprises center point information and size information of an anchor point identification frame;
     determining sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
     cropping, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
     training an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
  2. The image processing method based on anchor point cutting according to claim 1, characterized in that, after training the initial depth detection network model with the cut image to obtain the depth detection network model for detecting the position of the target object in an image, the method further comprises:
     when a new image is acquired, segmenting the new image according to multiple set sizes;
     scaling each slice image obtained by segmentation according to the size information required by the depth detection network model;
     analyzing the scaled slice images with the depth detection network model to obtain a set of detection frames containing the target object.
  3. The image processing method based on anchor point cutting according to claim 2, characterized by further comprising:
     deleting redundant detection frames from the detection frame set using the soft-nms algorithm.
  4. The image processing method based on anchor point cutting according to claim 1, characterized in that determining the sliding window size information according to the set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame comprises:
     calculating the sliding window size information S according to the following formula:
     S = max(mn, min(mx, λw))
     wherein mx represents the sliding window maximum value, mn represents the sliding window minimum value, λ represents the size multiple, and w represents the size information of the anchor point identification frame.
  5. The image processing method based on anchor point cutting according to claim 1, characterized in that, after cropping the cut image containing the target object from the image based on the sliding window size information and the center point information, the method further comprises:
     calculating the intersection-over-union of the sliding window corresponding to the sliding window size information and the target frame corresponding to the anchor point information;
     using the cut images whose intersection-over-union is greater than or equal to a preset threshold as training samples, and training the initial depth detection network model with the training samples.
  6. The image processing method based on anchor point cutting according to claim 5, characterized by further comprising:
     erasing the cut images whose intersection-over-union is smaller than the preset threshold.
  7. The image processing method based on anchor point cutting according to any one of claims 1 to 6, characterized in that the position of the target object comprises the horizontal and vertical coordinates of the center point of the target object, the width and height values of the target sliding frame corresponding to the target object, the probability that the target object contained in the target sliding frame belongs to a set item, and the category to which the target object contained in the target sliding frame belongs.
  8. An image processing apparatus based on anchor point cutting, characterized by comprising an acquisition unit, a determination unit, an interception unit and a training unit;
     the acquisition unit is configured to acquire anchor point information of an image based on annotation information of the image, wherein the anchor point information comprises center point information and size information of an anchor point identification frame;
     the determination unit is configured to determine sliding window size information according to a set sliding window maximum value, sliding window minimum value, size multiple and the size information of the anchor point identification frame;
     the interception unit is configured to crop, from the image, a cut image containing a target object based on the sliding window size information and the center point information;
     the training unit is configured to train an initial depth detection network model with the cut image to obtain a depth detection network model for detecting the position of the target object in an image.
  9. An image processing device based on anchor point cutting, characterized by comprising:
     a memory for storing a computer program;
     a processor for executing the computer program to implement the steps of the image processing method based on anchor point cutting according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the image processing method based on anchor point cutting according to any one of claims 1 to 7 are implemented.
PCT/CN2022/078357 2021-09-10 2022-02-28 Image processing method, apparatus, device and medium based on anchor point cutting WO2023035558A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111063785.3 2021-09-10
CN202111063785.3A CN113870196B (zh) 2021-09-10 2021-09-10 Image processing method, apparatus, device and medium based on anchor point cutting

Publications (1)

Publication Number Publication Date
WO2023035558A1 true WO2023035558A1 (zh) 2023-03-16

Family

ID=78995334

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078357 WO2023035558A1 (zh) 2021-09-10 2022-02-28 Image processing method, apparatus, device and medium based on anchor point cutting

Country Status (2)

Country Link
CN (1) CN113870196B (zh)
WO (1) WO2023035558A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870196B (zh) 2021-09-10 2024-06-14 苏州浪潮智能科技有限公司 Image processing method, apparatus, device and medium based on anchor point cutting
CN115108117B (zh) 2022-05-26 2023-06-27 盈合(深圳)机器人与自动化科技有限公司 Cutting method, system, terminal and computer storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977943B (zh) * 2019-02-14 2024-05-07 平安科技(深圳)有限公司 YOLO-based image target recognition method, system and storage medium
CN111626170B (zh) * 2020-05-20 2023-05-23 中铁二院工程集团有限责任公司 Image recognition method for detecting rockfall intrusion on railway slopes
CN111994377B (zh) * 2020-07-21 2022-04-08 浙江大华技术股份有限公司 Method, apparatus and computer device for packing box process inspection
CN113221768A (zh) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Recognition model training method, recognition method, apparatus, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781839A (zh) * 2019-10-29 2020-02-11 北京环境特性研究所 Sliding-window-based method for recognizing small targets in large-size images
CN112001912A (zh) * 2020-08-27 2020-11-27 北京百度网讯科技有限公司 Object detection method and apparatus, computer system and readable storage medium
CN112927247A (zh) * 2021-03-08 2021-06-08 常州微亿智造科技有限公司 Image cropping method and apparatus based on object detection, and storage medium
CN113870196A (zh) * 2021-09-10 2021-12-31 苏州浪潮智能科技有限公司 Image processing method, apparatus, device and medium based on anchor point cutting

Also Published As

Publication number Publication date
CN113870196A (zh) 2021-12-31
CN113870196B (zh) 2024-06-14

Similar Documents

Publication Publication Date Title
Du et al. Overview of two-stage object detection algorithms
WO2021004402A1 (zh) 图像识别方法及装置、存储介质和处理器
CN107808143B (zh) 基于计算机视觉的动态手势识别方法
Zhang et al. Research on face detection technology based on MTCNN
CN109697416B (zh) 一种视频数据处理方法和相关装置
WO2023035558A1 (zh) 一种基于锚点切图的图像处理方法、装置、设备和介质
CN104050471B (zh) 一种自然场景文字检测方法及***
WO2019114036A1 (zh) 人脸检测方法及装置、计算机装置和计算机可读存储介质
CN110580699A (zh) 基于改进Faster RCNN算法的病理图像细胞核检测方法
Wu et al. Real-time traffic sign detection and classification towards real traffic scene
US20150170005A1 (en) Semantic object selection
WO2019071976A1 (zh) 基于区域增长和眼动模型的全景图像显著性检测方法
CN111553406A (zh) 基于改进yolo-v3的目标检测***、方法及终端
CN110163239A (zh) 一种基于超像素和条件随机场的弱监督图像语义分割方法
CN109033972A (zh) 一种目标检测方法、装置、设备及存储介质
CN111353491B (zh) 一种文字方向确定方法、装置、设备及存储介质
WO2021077947A1 (zh) 图像处理方法、装置、设备及存储介质
CN111931953A (zh) 一种废旧手机多尺度特征深度森林识别方法
CN113487610B (zh) 疱疹图像识别方法、装置、计算机设备和存储介质
CN111353544A (zh) 一种基于改进的Mixed Pooling-YOLOV3目标检测方法
WO2023246921A1 (zh) 目标属性识别方法、模型训练方法和装置
CN111145222A (zh) 一种结合烟雾运动趋势和纹理特征的火灾检测方法
CN106844785A (zh) 一种基于显著性分割的基于内容的图像检索方法
Cheng et al. A direct regression scene text detector with position-sensitive segmentation
CN114187595A (zh) 基于视觉特征和语义特征融合的文档布局识别方法及***

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866038

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE