CN116758273A - Object detection method, apparatus, computer device, storage medium, and program product - Google Patents


Info

Publication number
CN116758273A
Authority
CN
China
Prior art keywords
target
image
information
position information
anchor frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310468599.0A
Other languages
Chinese (zh)
Inventor
高圣溥
饶竹一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Power Supply Co ltd
Original Assignee
Shenzhen Power Supply Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Power Supply Co ltd
Priority to CN202310468599.0A
Publication of CN116758273A
Legal status: Pending (current)

Classifications

    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods (neural networks)
    • G06V10/764: Image or video recognition using classification, e.g. of video objects
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Image or video recognition using neural networks
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a target detection method, apparatus, computer device, storage medium, and computer program product. The method comprises: acquiring a sample image labelled with the position information of an intrusion target; performing image processing on the sample image based on the position information to obtain anchor frame information; and inputting the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image. The method improves target detection speed while preserving detection accuracy.

Description

Object detection method, apparatus, computer device, storage medium, and program product
Technical Field
The present application relates to the field of computer vision, and in particular, to a target detection method, apparatus, computer device, storage medium, and program product.
Background
Object detection means finding all objects of interest in an image and determining their category and location; it is one of the core problems in computer vision. Because objects vary widely in appearance, shape, and pose, and imaging is further disturbed by factors such as illumination and occlusion, target detection has long been among the most challenging problems in the field. In power engineering, target detection is currently applied mainly to routine inspection of transmission lines and to monitoring equipment anomalies inside substations.
Target detection techniques for power systems fall into conventional methods and deep learning methods. Conventional methods include edge detection, template matching, and classical machine learning; deep learning methods include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks. Deep learning methods are more automated than conventional ones and improve both detection speed and accuracy, so they are now widely used.
However, deep learning methods face a trade-off: against the complex background of a substation, pursuing higher detection accuracy inflates the computation and slows detection, while pursuing higher speed sacrifices some accuracy. Balancing detection speed against detection accuracy therefore remains a key difficulty in target detection research.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target detection method, apparatus, computer device, computer-readable storage medium, and computer program product that improve detection speed while ensuring detection accuracy.
In a first aspect, the present application provides a target detection method. The method comprises:
acquiring a sample image labelled with the position information of an intrusion target;
performing image processing on the sample image based on the position information to obtain anchor frame information;
and inputting the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image.
In one embodiment, acquiring a sample image labelled with intrusion target position information includes:
acquiring an original image containing an intrusion target;
labelling position information on the original image frame by frame to obtain the sample image;
where the position information comprises the center-point coordinates of the region where the intrusion target is located, together with the width and height of that region.
In one embodiment, performing image processing on the sample image based on the position information to obtain anchor frame information includes:
extracting feature points from the sample image based on the position information and tracking them with an optical flow method, obtaining the optical flow trajectory of each feature point;
screening the optical flow trajectories with a clustering algorithm to obtain target trajectories;
and obtaining the anchor frame information from the feature points corresponding to the target trajectories.
In one embodiment, extracting feature points from the sample image and tracking them with an optical flow method based on the position information, to obtain the optical flow trajectory of each feature point, includes:
extracting feature points from the sample image according to the position information to obtain a plurality of feature points;
fitting the feature points across two different frames of the sample image, obtaining each feature point's motion vector by solving a least-squares overdetermined system, and deriving the optical flow trajectory from the motion vector.
In one embodiment, screening the optical flow trajectories with a clustering algorithm to obtain target trajectories includes:
clustering the optical flow trajectories to obtain several clusters of trajectories;
and selecting the target trajectories according to the clusters.
In one embodiment, the target detection model includes a region proposal network and a region convolutional neural network;
and inputting the anchor frame information as a hyperparameter into the target detection model that performs target detection on the image to be detected, so as to obtain a detection result for the intrusion target in the image to be detected, includes:
extracting target regions from the image to be detected, based on the anchor frame information, through the region proposal network;
and classifying and regressing the target regions through the region convolutional neural network to obtain the detection result for the intrusion target in the image to be detected.
In a second aspect, the application further provides a target detection apparatus. The apparatus comprises:
a sample acquisition module, configured to acquire a sample image labelled with intrusion target position information;
an information acquisition module, configured to perform image processing on the sample image based on the position information to obtain anchor frame information;
and a detection module, configured to input the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
acquiring a sample image labelled with intrusion target position information;
performing image processing on the sample image based on the position information to obtain anchor frame information;
and inputting the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image.
In a fourth aspect, the present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the following steps:
acquiring a sample image labelled with intrusion target position information;
performing image processing on the sample image based on the position information to obtain anchor frame information;
and inputting the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the following steps:
acquiring a sample image labelled with intrusion target position information;
performing image processing on the sample image based on the position information to obtain anchor frame information;
and inputting the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image.
With the above target detection method, apparatus, computer device, storage medium, and computer program product, anchor frame information about the intrusion target is obtained by image-processing the sample image labelled with position information, and this anchor frame information is then input into the target detection model as a hyperparameter to produce the detection result for the image to be detected. Calibrating the intrusion target's position information selects a region of interest and concentrates subsequent image processing within it, which removes irrelevant information, reduces computation, and raises processing speed. Feeding the intrusion target's anchor frame information into the model as a hyperparameter lets the model detect with anchor frames that better fit actual requirements, cutting the time spent on manual adjustment or adaptive learning of anchor frames. The more targeted anchor frames thus allow the model to extract the intrusion target within a smaller region and in less time, improving detection speed while maintaining detection accuracy.
Drawings
FIG. 1 is a diagram of an application environment for a target detection method in one embodiment;
FIG. 2 is a flow chart of a method of detecting targets in one embodiment;
FIG. 3 is a schematic diagram of a data processing flow of the object detection model in one embodiment;
FIG. 4 is a flow chart of a method of detecting targets in one embodiment;
FIG. 5 is a block diagram of an object detection device in one embodiment;
FIG. 6 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate the application, not to limit its scope.
The target detection method provided by the embodiments of the application can be applied in the environment shown in FIG. 1, where the terminal 102 communicates with the server 104 over a network. A data storage system may store the data the server 104 needs to process; it may be integrated on the server 104 or hosted on a cloud or other network server. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet, Internet-of-Things device, or portable wearable device; the Internet-of-Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle device, and the like, and the portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a cluster of servers.
Substations, the line connection points of a power network, are facilities that transform voltage, exchange power, and collect and distribute electric energy. In an outdoor substation the main transformer and the main high-voltage equipment are installed in the open. This layout occupies a large area but lets the electrical apparatus and buildings satisfy the various clearance requirements, such as electrical safety clearance and fire-separation distance, and it eases operation, maintenance, and overhaul. Outdoor substations, however, must guard against foreign-object intrusion: in windy weather, for example, floating debris such as plastic greenhouse sheeting and plastic film around the substation further raises the risk of transmission line trips. Such foreign objects threaten the safety of substation equipment at any moment and pose a serious hazard to stable grid operation. Detecting and identifying foreign objects with target detection technology has therefore been proposed to address this problem.
Conventional target detection comprises two-stage and one-stage models. A two-stage model splits the detection task into localization first and recognition/classification second. The main purpose of localization is to retain as much useful foreground information as possible and filter out background information useless for later stages; the foreground is then recognized separately. Because the two tasks are performed separately, two-stage models can reach high detection accuracy, but real-time detection of moving targets still faces a series of difficulties. In contrast, the chief strength of the one-stage model is that a single network completes localization and classification simultaneously; such networks can be trained end to end and infer faster, and one-stage models have also improved markedly at small-target detection. However, although the one-stage model is faster than the two-stage model, its detection accuracy is somewhat inferior.
To address these problems, the application provides a target detection method that improves the speed of foreign-object detection while ensuring detection accuracy.
In one embodiment, as shown in FIG. 2, a target detection method is provided. Taking its application to the terminal 102 in FIG. 1 as an example, the method includes the following steps:
step 202, a sample image with intrusion target location information is acquired.
The sample image is an image of the substation and should contain an intrusion target, i.e., an object that has entered the substation area in an unconventional way, such as a plastic film or a kite. The position information marks the size, position, and region of the intrusion target in the sample image; it can be labelled manually or automatically by a neural network. The sample images are historical video frames collected by a fixed camera facing the substation.
In one embodiment, after the positions have been labelled in the historical video frames, two preprocessing operations, cropping and mixing, can be applied to the labelled images to obtain more sample images. The order of cropping and mixing is not fixed, which yields four possible augmentation variants: crop only, mix only, crop then mix, and mix then crop. A sketch of these two operations follows.
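Below is a minimal sketch of the two augmentations, assuming centre-format (x, y, w, h) boxes as labelled in step 202; the crop fraction, mixing ratio, and label-handling convention are illustrative assumptions, not values given by this application.

```python
import numpy as np

def random_crop(img, boxes, crop_frac=0.8, rng=None):
    """Crop a random window whose sides are crop_frac of the original,
    shifting the (x, y, w, h) centre-format boxes into the new frame."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    y0 = int(rng.integers(0, h - ch + 1))
    x0 = int(rng.integers(0, w - cw + 1))
    crop = img[y0:y0 + ch, x0:x0 + cw]
    shifted = [(x - x0, y - y0, bw, bh) for (x, y, bw, bh) in boxes]
    return crop, shifted

def mix(img_a, img_b, alpha=0.5):
    """Blend two same-sized frames; one common convention keeps the
    labels of both inputs for the mixed image."""
    return (alpha * img_a.astype(np.float32)
            + (1 - alpha) * img_b.astype(np.float32)).astype(img_a.dtype)
```

Because the two operations compose freely, applying them in either order (or alone) produces the four augmentation variants described above.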
Step 204, image processing is performed on the sample image based on the position information to obtain anchor frame information.
A classical formulation of the target detection task is the following: traverse all possible pixel boxes over the input image, pick the correct ones, and adjust their position and size so that the target object sits exactly inside. These predicted pixel boxes are called anchor frames.
Anchor frame information is obtained within the range bounded by the position information; it includes the number of anchor frames, their sizes, the number of image divisions, and so on. Because it is derived from the position information of past intrusion targets, the resulting anchor frame information suits the intrusion targets typically encountered.
Step 206, the anchor frame information is input as a hyperparameter into a target detection model that performs target detection on the image to be detected, so as to obtain a detection result for the intrusion target in the image to be detected.
Hyperparameters are parameters fixed before the machine learning process rather than learned during training. They generally need to be tuned: choosing a good set of hyperparameters improves the learner's performance and effectiveness.
A common practice in target detection is to take an anchor frame of fixed size, slide it across the image left to right and top to bottom at a set stride, and feed each window into a convolutional neural network for prediction and classification. The shape of a substation intrusion target, however, is highly variable, far less controllable in shape and size than common recognition targets such as cars or human bodies. If the fixed anchor frame is too large, it admits too much interfering information and slows detection; if too small, it cannot capture all the useful information and accuracy drops. Another approach adjusts the anchor frame manually or adaptively, but an unsuitable initial anchor frame cannot deliver good real-time detection and the adjustment itself adds computation.
This embodiment therefore extracts the shapes, sizes, and related properties of typical intrusion targets from the sample images, derives anchor frame information from them, and inputs that information into the target detection model as a hyperparameter, so the model can detect the intrusion target within a smaller region and in less time.
In this target detection method, anchor frame information about the intrusion target is obtained by image-processing the sample image labelled with position information, and the anchor frame information is then input into the target detection model as a hyperparameter to produce the detection result for the image to be detected. Calibrating the intrusion target's position information selects a region of interest and concentrates subsequent image processing within it, which removes irrelevant information, reduces computation, and raises processing speed. Feeding the anchor frame information into the model as a hyperparameter lets the model detect with anchor frames that better fit actual requirements, cutting the time spent on manual adjustment or adaptive learning of anchor frames. The more targeted anchor frames thus allow the model to extract the intrusion target within a smaller region and in less time, improving detection speed while maintaining detection accuracy.
In one embodiment, as shown in FIG. 3, step 202 includes: acquiring an original image containing an intrusion target; labelling position information on the original image frame by frame to obtain the sample image; where the position information comprises the center-point coordinates of the region where the intrusion target is located, together with the width and height of that region.
The original images are the historical video frames described above, captured by a fixed camera facing the substation, and position information is labelled on every frame.
The position information comprises the center-point coordinates, width, and height of the region where the intrusion target is located. In this embodiment the coordinate system is fixed by the original image: the image is generally rectangular, so one of its four corners or its center is taken as the origin, and the directions of the long and short sides define the x- and y-axes. Once the coordinate system is fixed, the intrusion target's position is labelled as (x, y, w, h), where (x, y) is the center point of the region containing the intrusion target, w its width, and h its height.
The region where the intrusion target is located is the minimum rectangle that fully covers the target; the position information specifies the extent of this rectangle. A small helper illustrating the convention follows.
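As a quick illustration of the (x, y, w, h) convention, the hypothetical helper below converts a centre-format label into the corner coordinates of the minimum covering rectangle; the sample values are made up.

```python
def center_to_corners(x, y, w, h):
    """Convert a centre-format (x, y, w, h) label to (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

# e.g. an intrusion target labelled (x=320, y=240, w=60, h=40):
print(center_to_corners(320, 240, 60, 40))  # (290.0, 220.0, 350.0, 260.0)
```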
Calibrating the intrusion target's position information in this way selects a region of interest and concentrates subsequent image processing within it, which removes irrelevant information, reduces computation, and raises processing speed.
In one embodiment, the original image is labelled with category information in addition to the position information; the category information is recorded as L and denotes an intrusion target.
In one embodiment, step 204 includes: extracting feature points from the sample image based on the position information and tracking them with an optical flow method, obtaining the optical flow trajectory of each feature point; screening the optical flow trajectories with a clustering algorithm to obtain target trajectories; and obtaining the anchor frame information from the feature points corresponding to the target trajectories.
The optical flow method tracks visual motion: it can follow a moving object through an image sequence and compute its velocity. It rests on image continuity, i.e., when an object moves, its position changes from one frame to the next.
Accordingly, feature points are extracted within the range specified by the position information in each sample frame, and the positional changes of these feature points across different frames are used to track them and obtain the optical flow trajectory of each feature point.
Because trajectories produced by the optical flow method inevitably include erroneous ones that are hard to remove completely, a clustering algorithm is used to separate the intrusion target from background interference, screening the trajectories to obtain the target trajectories: the intrusion target's optical flow trajectories with the erroneous ones eliminated.
After the target trajectories are obtained, regions with the same motion trend are merged; they can be regarded as the combined motion trajectories of different feature points on one intrusion target. The feature points corresponding to the target trajectories, sharing the same motion trend, thus outline the extent of a single intrusion target, and the anchor frame information is obtained from that extent.
In one embodiment, extracting feature points from the sample image and tracking them with an optical flow method based on the position information, to obtain the optical flow trajectory of each feature point, includes: extracting feature points from the sample image according to the position information to obtain a plurality of feature points; fitting the feature points across two different frames of the sample image, obtaining each feature point's motion vector by solving a least-squares overdetermined system, and deriving the optical flow trajectory from the motion vector.
Specifically, because processing a color image costs considerable computation time and memory, the sample image is first converted to grayscale, and feature extraction and tracking are performed on the grayscale image. Feature points can be extracted from contour curvature (points of maximum curvature along an object's edge curve) or from changes in image gray level (points where the gray value changes abruptly).
After the feature points are obtained, the optical flow is solved by setting up optical flow constraint equations in a neighbourhood of each feature point. Written over several pixels, the system has more equations than unknowns, i.e., it is overdetermined, so the fitted motion vector of the feature point is obtained as its least-squares solution. The process can be expressed as:
u = (I(x+Δx, y+Δy) − I(x, y)) / Δt;
where u is the motion vector of a pixel, I(x, y) is the intensity at pixel (x, y), x and y are the pixel's horizontal and vertical coordinates, Δx and Δy are its horizontal and vertical displacements, and Δt is the time interval between the two frames.
From the motion vector and the starting position of the feature point, the optical flow trajectory is obtained, as sketched below.
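The snippet below is a hedged sketch of this step using OpenCV: Shi-Tomasi corners (a gray-level-change extractor) plus pyramidal Lucas-Kanade flow, which solves the least-squares overdetermined system described above. The roi_mask is assumed to be an 8-bit mask built from the labelled (x, y, w, h) region, and the parameter values (corner count, window size, pyramid levels) are assumptions, not values specified by this application.

```python
import cv2
import numpy as np

def track_features(prev_gray, next_gray, roi_mask=None):
    """Extract feature points inside the labelled region of the previous
    grayscale frame, track them into the next frame, and return the start
    points together with their (dx, dy) motion vectors."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=7, mask=roi_mask)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                                 winSize=(21, 21), maxLevel=3)
    ok = status.flatten() == 1          # keep only successfully tracked points
    start = pts[ok].reshape(-1, 2)
    motion = nxt[ok].reshape(-1, 2) - start
    return start, motion
```

Accumulating these per-frame motion vectors from each feature point's starting position yields its optical flow trajectory.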
In one embodiment, screening the optical flow trajectories with a clustering algorithm to obtain target trajectories includes: clustering the optical flow trajectories to obtain several clusters of trajectories; and selecting the target trajectories according to the clusters.
This embodiment screens the feature points with K-means clustering, a distance-based cluster analysis algorithm whose basic principle is to partition the data points into K clusters so that distances within each cluster are minimized and distances between clusters are maximized. The basic flow of the K-means algorithm is as follows:
Cluster centers are first chosen at random from the feature points. All feature points are then assigned to classes with the chosen distance metric according to the current centers, and the mean of each class's feature points becomes the center for the next iteration. Iteration ends when the sum of squared distances from each feature point to its cluster center reaches a minimum; otherwise the next iteration proceeds.
The sum of squared distances between the feature points and their corresponding cluster centers is denoted d:
d = Σ_i (x_i − μ_j)²;
where x_i is the i-th feature point, μ_j is the center of the cluster to which x_i is assigned, and Σ_i denotes summation over all feature points.
Once the cluster centers are determined, the feature points assigned to each center form a cluster. The cluster whose feature points have optical flow trajectories that best match the motion pattern of a moving object is selected; the trajectories of its feature points are taken as the target trajectories, and the remaining invalid feature points are discarded. A sketch of this screening step follows.
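The following is a minimal sketch of the screening step, assuming the start points and motion vectors come from the tracking snippet above. The selection rule (keep the cluster with the largest mean displacement) is an assumption; the application only requires choosing the cluster that best matches moving-object behaviour. The last helper also shows one way the retained points could bound an anchor size (step 412).

```python
import numpy as np
from sklearn.cluster import KMeans

def select_target_tracks(starts, vectors, k=3):
    """Cluster the (dx, dy) motion vectors with K-means and keep one cluster."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    # Assumed selection rule: keep the cluster moving the most on average.
    best = max(range(k),
               key=lambda j: np.linalg.norm(vectors[labels == j], axis=1).mean())
    keep = labels == best
    return starts[keep], vectors[keep]

def anchor_size_from_points(points):
    """Width and height of the retained feature points' bounding box."""
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return float(x_max - x_min), float(y_max - y_min)
```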
In one embodiment, as shown in FIG. 3, the target detection model includes a region proposal network and a region convolutional neural network. Step 206 includes: extracting target regions from the image to be detected, based on the anchor frame information, through the region proposal network; and classifying and regressing the target regions through the region convolutional neural network to obtain the detection result for the intrusion target in the image to be detected.
The anchor frame information obtained in step 204 is input into the target detection model as a hyperparameter to optimize the model. In this embodiment the Faster R-CNN model is chosen: it detects complex targets very quickly and can detect more targets. The model uses a region proposal network (RPN) to extract candidate target regions from the preset anchor frame information, and then a region convolutional neural network (R-CNN) classifies and regresses each proposed region to determine the final detection result.
In this embodiment, the accuracy of the target detection model's results is measured by a loss function L:
L = Lcls + λLreg;
where Lcls is the classification loss, measuring how accurately the model predicts categories; Lreg is the regression loss, measuring how accurate the anchor frames used by the model are; and λ is a weight coefficient balancing the relative importance of the two losses.
The weight coefficient λ in the Faster R-CNN loss function usually has to be set manually; its size is typically determined by the characteristics of the training dataset and the requirements of the detection task. In practice λ is usually tuned between 0.1 and 10.
Generally, when the ratio of positive to negative samples in the training set is severely imbalanced, the classification loss matters more, and a larger λ can be set to rebalance the weights of the classification and regression losses. Conversely, when the ratio is relatively balanced, the two losses are of similar importance and λ can be adjusted moderately for a better training result.
Note that the setting of λ in Faster R-CNN is not unique; the specific value must be adjusted to the situation. Methods such as cross-validation can also determine the optimal λ for a better detection result.
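As a hedged sketch of how the derived anchor sizes might enter the model as hyperparameters, the snippet below builds a Faster R-CNN with torchvision's AnchorGenerator. The backbone choice, anchor sizes, and aspect ratios are placeholder assumptions standing in for the values produced by the optical flow and clustering steps.

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

def build_detector(anchor_sizes=((48, 96, 192),),
                   aspect_ratios=((0.5, 1.0, 2.0),)):
    # Placeholder backbone; the application does not prescribe one.
    backbone = torchvision.models.mobilenet_v2(weights=None).features
    backbone.out_channels = 1280
    # The anchor sizes and ratios act as the hyperparameters from step 204.
    anchor_gen = AnchorGenerator(sizes=anchor_sizes, aspect_ratios=aspect_ratios)
    roi_pooler = MultiScaleRoIAlign(featmap_names=["0"], output_size=7,
                                    sampling_ratio=2)
    # num_classes = 2: background plus the intrusion-target category L.
    return FasterRCNN(backbone, num_classes=2,
                      rpn_anchor_generator=anchor_gen,
                      box_roi_pool=roi_pooler)
```

During training, torchvision's model returns a dictionary of loss terms (loss_classifier, loss_box_reg, loss_objectness, loss_rpn_box_reg), so a weighting along the lines of L = Lcls + λLreg can be applied when these terms are summed.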
In one embodiment, as shown in FIG. 4, the target detection method includes:
step 402, an original image with an intrusion target is acquired.
Step 404, position information is labelled frame by frame on the original image to obtain a sample image; the position information comprises the center-point coordinates of the region where the intrusion target is located, together with the width and height of that region.
Step 406, the sample image is augmented.
Step 408, feature points are extracted from the sample image and tracked using an optical flow method.
Step 410, the feature points are screened using a clustering algorithm.
Step 412, anchor frame information is obtained from the screened feature points.
Step 414, the anchor frame information is input into the Faster R-CNN model as a hyperparameter.
Step 416, a live image of the substation to be detected is acquired.
Step 418, target recognition is performed on the image to be detected using the Faster R-CNN model.
Step 420, it is determined whether an intrusion target is detected: if yes, the detection result is output; if not, the flow returns to step 416 to acquire the next live image of the substation. A sketch of this loop follows.
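The following is a minimal sketch of the detection loop (steps 416 to 420), assuming a torchvision-style detector such as the one built above; the video source and score threshold are assumptions.

```python
import cv2
import torch

def watch(model, source=0, score_thresh=0.7, device="cpu"):
    model.eval().to(device)
    cap = cv2.VideoCapture(source)   # fixed camera facing the substation
    while True:
        ok, frame = cap.read()       # step 416: acquire the next live image
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).to(device)
        with torch.no_grad():
            pred = model([tensor])[0]    # step 418: Faster R-CNN inference
        hits = pred["scores"] > score_thresh
        if hits.any():                   # step 420: report detections
            print("intrusion target at:", pred["boxes"][hits].tolist())
    cap.release()
```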
The application thus optimizes the prior-art way of generating anchor frames: the optical flow method and a clustering algorithm are combined to screen and optimize anchor frame sizes, and training with position labels adds saliency-map-style supervision so the model learns more features of small targets and separates them from the background. Intrusion targets in the substation can thereby be located accurately and rapidly and fed back to the control department as actionable information, effectively preventing and resolving potential safety hazards in the substation.
It should be understood that although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order; unless explicitly stated herein, the order is not strictly limited and the steps may be executed otherwise. Moreover, at least some of the steps may comprise sub-steps or stages that are not necessarily performed at the same moment but may be executed at different times; their order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the application also provides a target detection apparatus for implementing the above target detection method. Its implementation is similar to that of the method, so for the specific limitations of the target detection apparatus embodiments below, see the limitations of the target detection method above; they are not repeated here.
In one embodiment, as shown in FIG. 5, a target detection apparatus is provided, including a sample acquisition module 502, an information acquisition module 504, and a detection module 506, wherein:
the sample acquisition module 502 is configured to acquire a sample image labelled with intrusion target position information;
the information acquisition module 504 is configured to perform image processing on the sample image based on the position information to obtain anchor frame information;
and the detection module 506 is configured to input the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in that image.
The sample acquisition module 502 is further configured to acquire an original image containing an intrusion target, and to label position information on the original image frame by frame to obtain the sample image, the position information comprising the center-point coordinates of the region where the intrusion target is located together with the width and height of that region.
The information acquisition module 504 is further configured to extract feature points from the sample image based on the position information and track them with an optical flow method to obtain the optical flow trajectory of each feature point; to screen the optical flow trajectories with a clustering algorithm to obtain target trajectories; and to obtain the anchor frame information from the feature points corresponding to the target trajectories.
The information acquisition module 504 is further configured to extract feature points from the sample image according to the position information to obtain a plurality of feature points, and to fit the feature points across two different frames, obtaining each feature point's motion vector by solving a least-squares overdetermined system to derive the optical flow trajectory.
The information acquisition module 504 is further configured to cluster the optical flow trajectories into several clusters and to select the target trajectories according to the clusters.
The target detection model includes a region proposal network and a region convolutional neural network. The detection module 506 is further configured to extract target regions from the image to be detected, based on the anchor frame information, through the region proposal network, and to classify and regress the target regions through the region convolutional neural network to obtain the detection result for the intrusion target in the image to be detected.
Each module of the above target detection apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided; it may be a server whose internal structure is shown in FIG. 6. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, memory, and I/O interface are connected by a system bus, and the communication interface is connected to the system bus through the I/O interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium, which stores an operating system, a computer program, and a database, and internal memory, which provides an environment for running them. The database stores the device's data. The I/O interface exchanges information between the processor and external devices, and the communication interface communicates with external terminals over a network. The computer program, when executed by the processor, implements a target detection method.
In one embodiment, a computer device is provided; it may be a terminal whose internal structure is shown in FIG. 6. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, memory, and input/output interface are connected by a system bus, and the communication interface, display unit, and input means are connected to the system bus through the input/output interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium storing an operating system and a computer program, and internal memory providing an environment for running them. The input/output interface exchanges information between the processor and external devices. The communication interface communicates with external terminals by wire or wirelessly; the wireless mode can be realized through WiFi, a mobile cellular network, NFC (near-field communication), or other technologies. The computer program, when executed by the processor, implements a target detection method.
Those skilled in the art will appreciate that the structure shown in FIG. 6 is merely a block diagram of part of the structure related to the solution of the application and does not limit the computer device to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, comprising a memory and a processor; the memory stores a computer program, and the processor, when executing it, implements the method embodiments described above.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, it implements the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method embodiments described above.
It should be noted that the user information (including but not limited to user device information and personal information) and data (including but not limited to data for analysis, stored data, and displayed data) involved in the present application are all authorized by the users or fully authorized by all parties, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the flows of the above method embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored on a non-volatile computer-readable storage medium and, when executed, may include the flows of the above method embodiments. Any reference to memory, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive RAM (ReRAM), magnetoresistive RAM (MRAM), ferroelectric RAM (FRAM), phase-change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory, for example in forms such as static RAM (SRAM) or dynamic RAM (DRAM). The databases involved may include at least one of relational and non-relational databases; non-relational databases may include, without limitation, blockchain-based distributed databases. The processors involved may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data processing logic based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described, but any combination that contains no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the application; their description is specific and detailed but must not be construed as limiting the scope of the application. For a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the application, and these all fall within its protection scope. The protection scope of the application shall therefore be defined by the appended claims.

Claims (10)

1. A target detection method, the method comprising:
acquiring a sample image labelled with the position information of an intrusion target;
performing image processing on the sample image based on the position information to obtain anchor frame information;
and inputting the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in the image to be detected.
2. The method of claim 1, wherein acquiring a sample image labelled with intrusion target position information comprises:
acquiring an original image containing the intrusion target;
labelling position information on the original image frame by frame to obtain the sample image;
wherein the position information comprises the center-point coordinates of the region where the intrusion target is located, together with the width and height of that region.
3. The method of claim 1, wherein performing image processing on the sample image based on the position information to obtain anchor frame information comprises:
extracting feature points from the sample image based on the position information and tracking them with an optical flow method, obtaining the optical flow trajectory of each feature point;
screening the optical flow trajectories with a clustering algorithm to obtain target trajectories;
and obtaining the anchor frame information from the feature points corresponding to the target trajectories.
4. The method of claim 3, wherein extracting feature points from the sample image and tracking them with an optical flow method based on the position information, to obtain the optical flow trajectory of each feature point, comprises:
extracting feature points from the sample image according to the position information to obtain a plurality of feature points;
fitting the feature points across two different frames of the sample image, obtaining each feature point's motion vector by solving a least-squares overdetermined system, and deriving the optical flow trajectory from the motion vector.
5. The method of claim 3, wherein screening the optical flow trajectories with a clustering algorithm to obtain target trajectories comprises:
clustering the optical flow trajectories to obtain several clusters of trajectories;
and selecting the target trajectories according to the clusters.
6. The method of claim 1, wherein the target detection model comprises a region proposal network and a region convolutional neural network;
and inputting the anchor frame information as a hyperparameter into the target detection model that performs target detection on the image to be detected, so as to obtain a detection result for the intrusion target in the image to be detected, comprises:
extracting target regions from the image to be detected, based on the anchor frame information, through the region proposal network;
and classifying and regressing the target regions through the region convolutional neural network to obtain the detection result for the intrusion target in the image to be detected.
7. A target detection apparatus, the apparatus comprising:
a sample acquisition module, configured to acquire a sample image labelled with intrusion target position information;
an information acquisition module, configured to perform image processing on the sample image based on the position information to obtain anchor frame information;
and a detection module, configured to input the anchor frame information as a hyperparameter into a target detection model that performs target detection on an image to be detected, so as to obtain a detection result for the intrusion target in the image to be detected.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
Application CN202310468599.0A, filed 2023-04-23 (priority 2023-04-23): Object detection method, apparatus, computer device, storage medium, and program product. Status: Pending. Published as CN116758273A (en).

Priority Applications (1)

Application CN202310468599.0A (published as CN116758273A, en); priority date 2023-04-23; filing date 2023-04-23; title: Object detection method, apparatus, computer device, storage medium, and program product.

Applications Claiming Priority (1)

Application CN202310468599.0A (published as CN116758273A, en); priority date 2023-04-23; filing date 2023-04-23; title: Object detection method, apparatus, computer device, storage medium, and program product.

Publications (1)

Publication: CN116758273A (en), 2023-09-15.

Family

Family ID: 87950311

Family Applications (1)

Application CN202310468599.0A (published as CN116758273A, en); priority date 2023-04-23; filing date 2023-04-23; title: Object detection method, apparatus, computer device, storage medium, and program product.

Country Status (1)

Country: CN; publication: CN116758273A (en).


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination