CN114332456A - Target detection and identification method and device for large-resolution image - Google Patents

Target detection and identification method and device for large-resolution image

Info

Publication number
CN114332456A
CN114332456A
Authority
CN
China
Prior art keywords
image
sub
category
information
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210255384.6A
Other languages
Chinese (zh)
Inventor
张凯
马乐乐
崔超然
逯天斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Liju Robot Technology Co ltd
Original Assignee
Shandong Liju Robot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Liju Robot Technology Co ltd filed Critical Shandong Liju Robot Technology Co ltd
Priority to CN202210255384.6A priority Critical patent/CN114332456A/en
Publication of CN114332456A publication Critical patent/CN114332456A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition, in particular to a method and a device for target detection and recognition of a large-resolution image. The method comprises the following steps: acquiring a large-resolution image set and performing data enhancement to obtain an enhanced image set; segmenting each original image in the enhanced image set to obtain corresponding sub-images and their position information; encoding and fusing the sub-images and their position information to obtain corresponding data tensors; performing feature representation learning on the data tensors layer by layer based on a Faster R-CNN model, fusing low-level, middle-level and high-level information of the Faster R-CNN model by means of an attention mechanism to determine the feature representation corresponding to each sub-image, then determining candidate target positions and performing regression and classification to determine the final target position of each sub-image and its category; and determining the final target position and category of the original image according to the final target position and category of each sub-image. This scheme improves the final model performance.

Description

Target detection and identification method and device for large-resolution image
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for target detection and recognition of a large-resolution image.
Background
With the rapid development of information technology, the convenience, efficiency, safety and reliability brought by information processing have made industrial informatization a development trend across industries. Images are the most ubiquitous medium in daily life and play a key role in the transfer of information. Making efficient and reliable use of image information is therefore an important research topic in computer vision, one that has attracted a great number of researchers.
Early on, traditional machine learning algorithms could not fully understand image information because the semantic information of images is complex, so research remained relatively simple. In recent years, the advent of deep learning, improvements in computing performance, and the arrival of the big-data era have made it possible to exploit image information far more fully. Many hot research topics have grown up around this, such as image classification, image segmentation, object detection, face recognition, and re-identification, and great success has been achieved in these directions.
It is worth noting, however, that although image research has succeeded in many directions as learning algorithms are continuously updated, many research directions still face significant challenges in certain special contexts, including the problem of target detection in large-resolution images. Such images, as distinguished from ordinary life pictures, are generally taken by professional equipment and have fixed purposes: satellite images or other aerial images, for example, are used for observing terrain, vegetation, or water conservancy, for military reconnaissance, or for meteorological monitoring. Taking a terrestrial satellite image used to observe terrain, vegetation, or water conservancy as an example, detection must be performed on the image, and if the target is small, common image target detection methods fail because the resolution of the image is too large. First, the input of common methods is generally around 10²×10² to 10³×10³ pixels, while large-resolution images are usually far larger; simply scaling the original data in equal proportion loses a large amount of information, and when the detected target is small the target may even be lost entirely. Second, the larger the large-resolution image and the richer the information it carries, the smaller the proportion of the target region to the background region. Third, owing to the special application background, the number of such images is small and a large amount of experimental data cannot be obtained, which is unfavorable for model training. Due to these factors, existing methods cannot achieve good precision in target detection on large-resolution images and thus cannot meet normal performance requirements.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a method and a device for target detection and recognition of a large-resolution image.
According to a first aspect of the embodiments of the present invention, there is provided a method for target detection and recognition of a large-resolution image, the method including:
acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
dividing each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
coding and fusing the subimages and the position information thereof to obtain corresponding data tensors;
performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
In one embodiment, preferably, segmenting each image in the enhanced image set to obtain a corresponding sub-image and position information thereof includes:
dividing each original image by adopting a fixed window overlapping type dividing mode to obtain corresponding sub-images, and arranging the sub-images according to the sequence;
and performing data preprocessing on each sub-image, and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
In one embodiment, preferably, the fusing the low-level information, the middle-level information and the high-level information of the Faster R-CNN model by using an attention mechanism to determine the corresponding feature representation of the sub-image includes:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and calculating by adopting the following formula to determine the characteristic representation corresponding to the sub-image;
Z = softmax(QKᵀ/√d)·V

wherein Z denotes the feature representation corresponding to the sub-image, Q denotes the query, K denotes the key, V denotes the value, and d denotes a hyper-parameter.
In one embodiment, preferably, determining the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs includes:
and, according to the position information of the sub-images in the original image, merging the final target positions whose mutual distances are smaller than a preset threshold value to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target locations are preferably determined using the following first calculation:
l₁ = Σᵢ L(tᵢ, tᵢ*) + γ·R(W)    (1)

wherein l₁ denotes the loss of the candidate target positions, W denotes the parameters to be learned, xᵢ denotes the vector representation of the i-th candidate target position, tᵢ denotes the predicted offset (computed from xᵢ and W), tᵢ* denotes the offset from the i-th candidate target position to the true target position, R(W) denotes the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l₂ = L_cls + λ·L_loc    (2)

wherein l₂ denotes the sum of the regression and classification losses, L_cls denotes the classification loss, L_loc denotes the localization loss, and λ is a hyper-parameter that balances the two loss terms.
According to a second aspect of embodiments of the present invention, there is provided an object detection and recognition apparatus for a large-resolution image, the apparatus including:
the enhancement module is used for acquiring a large-resolution image set and enhancing data of the large-resolution image set to obtain an enhanced image set;
the segmentation module is used for segmenting each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
the processing module is used for coding and fusing the sub-images and the position information thereof to obtain corresponding data tensors;
the fusion module is used for performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism so as to determine feature representation corresponding to the subimages;
the first determining module is used for determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and the second determining module is used for determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
In one embodiment, preferably, the segmentation module includes:
the segmentation unit is used for segmenting each original image by adopting a fixed window overlapping segmentation mode to obtain corresponding sub-images and arranging the sub-images in sequence;
and the preprocessing unit is used for preprocessing data of each sub-image and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
In one embodiment, preferably, the fusion module is configured to:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and calculating by adopting the following formula to determine the characteristic representation corresponding to the sub-image;
Z = softmax(QKᵀ/√d)·V

wherein Z denotes the feature representation corresponding to the sub-image, Q denotes the query, K denotes the key, V denotes the value, and d denotes a hyper-parameter.
In one embodiment, preferably, the second determining module is configured to:
and, according to the position information of the sub-images in the original image, merging the final target positions whose mutual distances are smaller than a preset threshold value to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target locations are preferably determined using the following first calculation:
l₁ = Σᵢ L(tᵢ, tᵢ*) + γ·R(W)    (1)

wherein l₁ denotes the loss of the candidate target positions, W denotes the parameters to be learned, xᵢ denotes the vector representation of the i-th candidate target position, tᵢ denotes the predicted offset (computed from xᵢ and W), tᵢ* denotes the offset from the i-th candidate target position to the true target position, R(W) denotes the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l₂ = L_cls + λ·L_loc    (2)

wherein l₂ denotes the sum of the regression and classification losses, L_cls denotes the classification loss, L_loc denotes the localization loss, and λ is a hyper-parameter that balances the two loss terms.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspect.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
the invention realizes target detection on a high-resolution image based on high-low layer information fusion, and compared with the previous method. Since the large-resolution image size is much larger than the normal image, the large-resolution image needs to be preprocessed in order to make the model feasible. If the mode of equal scaling is adopted, target information is likely to be lost, and in order to solve the problem, the invention adopts the window overlapping type to segment the image to obtain the sub-images of the original image, and the sub-images are sequentially used as the input of the model, so that the integrity of the information is effectively ensured; meanwhile, in order to avoid losing the position information of the sub-image in the original image, the invention additionally adds the position information on the characteristic representation to enhance the integrity of the spatial information of the sub-image; in addition, during feature learning, the method utilizes an attention mechanism to perform weighted fusion on high-layer information and low-layer information to obtain corresponding convolutional layer output for downstream tasks, and feature information is greatly enriched through the method, so that the final model performance is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method for target detection and recognition of a large-resolution image according to an exemplary embodiment.
FIG. 2 is a flowchart illustrating step S102 of a target detection and recognition method for a large-resolution image according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating a method for target detection and recognition of a large-resolution image according to an exemplary embodiment.
FIG. 4 is a block diagram illustrating an apparatus for target detection and recognition of a large-resolution image according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating an apparatus for target detection and recognition of a large-resolution image according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a method for target detection and recognition of a large-resolution image according to an exemplary embodiment. As shown in FIG. 1, the method comprises:
step S101, acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
Because of the special background of large-resolution images, the available data volume is far smaller than in common image tasks, which easily leads to insufficient model training; data enhancement is therefore performed first to enlarge the training set.
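The patent does not name the specific enhancement operations. As a hedged sketch only, geometry-preserving transforms such as flips and 90° rotations (chosen here as assumptions because they leave small targets pixel-exact) could look like:

```python
import numpy as np

def enhance(image: np.ndarray) -> list[np.ndarray]:
    """Illustrative data enhancement for a large-resolution image set.
    Flips and 90-degree rotations are assumptions, not operations named
    in the patent; they enlarge the set without destroying small targets."""
    return [
        image,
        np.fliplr(image),       # horizontal flip
        np.flipud(image),       # vertical flip
        np.rot90(image, k=1),   # rotate 90 degrees counter-clockwise
    ]
```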
Step S102, segmenting each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
Because the original image is too large, the whole image cannot be input into the model at once, and traditional downsampling easily loses information; each original image is therefore segmented into sub-images.
Step S103, encoding and fusing the sub-images and the position information thereof to obtain corresponding data tensors;
step S104, based on a Faster R-CNN model, performing feature representation learning layer by layer on the data tensor, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
step S105, determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and step S106, determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
The method is based on the fusion of high-level and low-level information, using an attention mechanism to weight and fuse multiple layers into the final feature representation. Because the size of a large-resolution image is much larger than that of a normal image, the image must be preprocessed for the model to be feasible, and proportional scaling is likely to lose target information. To solve this problem, the invention segments the image with overlapping windows to obtain sub-images of the original image, which are used in sequence as model input, effectively guaranteeing the integrity of the information. Meanwhile, to avoid losing the position of each sub-image within the original image, the invention adds this position information to the feature representation, strengthening the integrity of the sub-image's spatial information. In addition, during feature learning the method uses an attention mechanism to perform weighted fusion of high-level and low-level information and obtain the corresponding convolutional-layer output for downstream tasks. This greatly enriches the feature information and thereby improves the final model performance. A minimal end-to-end sketch of this flow is given below.
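As an illustration only, the following Python sketch strings the steps of FIG. 1 together under stated assumptions; `model` (the trained Faster R-CNN based detector) and the helpers `split_with_positions` and `merge_detections` (sketched in the step-by-step discussion below) are hypothetical names introduced here, not names from the patent:

```python
import numpy as np

def detect_large_image(image: np.ndarray, model, win: int = 1024, stride: int = 768):
    """Hedged end-to-end sketch of the FIG. 1 flow: split the original image
    into overlapping sub-images with their position vectors, run the detector
    on each sub-image, then merge per-sub-image detections back in
    original-image coordinates. Window/stride values are assumptions."""
    detections = []
    for sub, pos in split_with_positions(image, win, stride):
        for box, probs in model(sub, pos):   # per-sub-image (box, class probs)
            detections.append((box, probs, pos))
    return merge_detections(detections)      # final boxes and categories
```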
FIG. 2 is a flowchart illustrating step S102 of a target detection and recognition method for a large-resolution image according to an exemplary embodiment.
As shown in fig. 2, in one embodiment, preferably, the step S102 includes:
step S201, segmenting each original image by adopting a fixed window overlapping segmentation mode to obtain corresponding sub-images, and arranging the sub-images in sequence;
step S202, performing data preprocessing on each sub-image, and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
In order to make the data information more accurate, the position information of each obtained sub-image within the original image is added. This position information is represented by a four-dimensional vector (x, y, w, h), wherein x and y are the coordinates of the center point of the sub-image in the original image, and w and h are the width and height of the sub-image, respectively.
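A minimal sketch of the fixed-window overlapping split together with the (x, y, w, h) position vector described above; the window size, stride, and boundary handling are assumptions, since the patent fixes none of them:

```python
import numpy as np

def split_with_positions(img: np.ndarray, win: int = 1024, stride: int = 768):
    """Fixed-window overlapping segmentation: stride < win produces the
    overlap between neighbouring sub-images. Each sub-image is returned with
    its position vector (x, y, w, h), where (x, y) is the sub-image centre in
    original-image coordinates. Remainder tiles at the right/bottom borders
    are elided for brevity."""
    h, w = img.shape[:2]
    tiles = []
    for top in range(0, max(h - win, 0) + 1, stride):
        for left in range(0, max(w - win, 0) + 1, stride):
            sub = img[top:top + win, left:left + win]
            pos = np.array([left + win / 2, top + win / 2, win, win],
                           dtype=np.float32)   # (x, y, w, h)
            tiles.append((sub, pos))
    return tiles
```

Per step S103, each (sub-image, position) pair would then be encoded and fused into one data tensor before entering the model.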
In one embodiment, preferably, the fusing the low-level information, the middle-level information and the high-level information of the Faster R-CNN model by using an attention mechanism to determine the corresponding feature representation of the sub-image includes:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and calculating by adopting the following formula to determine the characteristic representation corresponding to the sub-image;
Z = softmax(QKᵀ/√d)·V

wherein Z denotes the feature representation corresponding to the sub-image, Q denotes the query, K denotes the key, V denotes the value, and d denotes a hyper-parameter.
In one embodiment, preferably, determining the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs includes:
and, according to the position information of the sub-images in the original image, merging the final target positions whose mutual distances are smaller than a preset threshold value to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
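As a hedged sketch of this merging step: sub-image detections are first mapped back to original-image coordinates using the (x, y, w, h) sub-image position, then detections closer than the preset threshold are merged. The distance measure (centre distance), the merge rule (averaging), and all names below are illustrative assumptions:

```python
import numpy as np

def merge_detections(dets, thresh: float = 32.0):
    """dets: list of (box, probs, tile_pos), where box = (cx, cy, w, h) in
    sub-image coordinates, probs is the per-category probability vector, and
    tile_pos = (tx, ty, tw, th) locates the sub-image in the original image
    (tx, ty = sub-image centre). Boxes are mapped to original coordinates,
    boxes whose centre distance is below `thresh` are merged by averaging,
    and the category with the maximum probability is kept."""
    mapped = []
    for (cx, cy, w, h), probs, (tx, ty, tw, th) in dets:
        ox = tx - tw / 2 + cx      # sub-image top-left + offset inside it
        oy = ty - th / 2 + cy
        mapped.append((np.array([ox, oy, w, h]), np.asarray(probs)))

    merged = []
    while mapped:
        box, probs = mapped.pop()
        group = [(box, probs)]
        rest = []
        for b, p in mapped:
            (group if np.hypot(*(b[:2] - box[:2])) < thresh else rest).append((b, p))
        mapped = rest
        boxes, ps = zip(*group)
        merged.append((np.mean(boxes, axis=0),                 # merged position
                       int(np.argmax(np.mean(ps, axis=0)))))   # max-probability category
    return merged
```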
In one embodiment, the candidate target locations are preferably determined using the following first calculation:
l₁ = Σᵢ L(tᵢ, tᵢ*) + γ·R(W)    (1)

wherein l₁ denotes the loss of the candidate target positions, W denotes the parameters to be learned, xᵢ denotes the vector representation of the i-th candidate target position, tᵢ denotes the predicted offset (computed from xᵢ and W), tᵢ* denotes the offset from the i-th candidate target position to the true target position, R(W) denotes the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l₂ = L_cls + λ·L_loc    (2)

wherein l₂ denotes the sum of the regression and classification losses, L_cls denotes the classification loss, L_loc denotes the localization loss, and λ is a hyper-parameter that balances the two loss terms.
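The patent states only the aggregate forms of formulas (1) and (2). The following is a hedged sketch assuming the losses conventional in the Faster R-CNN family (smooth-L1 for offsets, cross-entropy for classification, an L2 norm for R(W)); none of these concrete choices is fixed by the patent:

```python
import torch.nn.functional as F

def candidate_loss(pred_offsets, true_offsets, model, gamma: float = 1e-4):
    """Formula (1): per-candidate offset loss plus gamma * R(W).
    Smooth-L1 for the offset loss and R(W) = ||W||^2 are assumptions;
    the patent only names the terms."""
    loss = F.smooth_l1_loss(pred_offsets, true_offsets, reduction="sum")
    reg = sum(p.pow(2).sum() for p in model.parameters())  # R(W)
    return loss + gamma * reg

def detection_loss(cls_logits, labels, pred_offsets, true_offsets, lam: float = 1.0):
    """Formula (2): l2 = L_cls + lambda * L_loc, with cross-entropy assumed
    for L_cls and smooth-L1 assumed for L_loc."""
    return F.cross_entropy(cls_logits, labels) + \
        lam * F.smooth_l1_loss(pred_offsets, true_offsets)
```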
The above technical solution of the present invention is described in detail with a specific embodiment. As shown in FIG. 3, the attention mechanism module performs an effective weighted fusion of low-level and high-level information: three levels of representation (low, middle and high) are extracted and used as Q, K and V in the attention mechanism. This simplifies the usual weighted-fusion process, since there is no need to compute a separate weight for each of the low, middle and high representations and then fuse the weighted information into one representation. In this way the original operation is greatly simplified, high- and low-level information is fully utilized, and the feature representation is enriched, providing a better representation for downstream tasks. A hedged sketch of this fusion is given below.
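A minimal sketch of the fusion, assuming the three feature maps have already been projected to a common (batch, channels, height, width) shape; the flattening scheme and the value of d are implementation assumptions:

```python
import torch

def fuse_levels(low: torch.Tensor, mid: torch.Tensor, high: torch.Tensor,
                d: float = 64.0) -> torch.Tensor:
    """Scaled dot-product attention Z = softmax(QK^T / sqrt(d)) V with
    Q = low-level, K = middle-level, V = high-level representations,
    as described for the attention module above."""
    q = low.flatten(2).transpose(1, 2)    # (B, HW, C)
    k = mid.flatten(2).transpose(1, 2)    # (B, HW, C)
    v = high.flatten(2).transpose(1, 2)   # (B, HW, C)
    attn = torch.softmax(q @ k.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ v                       # (B, HW, C) fused representation Z
```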
FIG. 4 is a block diagram illustrating an apparatus for object detection and recognition of a large resolution image according to an exemplary embodiment.
As shown in fig. 4, an object detecting and recognizing apparatus for a large resolution image, the apparatus comprising:
the enhancing module 41 is configured to obtain a large-resolution image set, and perform data enhancement on the large-resolution image set to obtain an enhanced image set;
a segmentation module 42, configured to segment each original image in the enhanced image set to obtain a corresponding sub-image and position information thereof;
a processing module 43, configured to perform encoding and fusion processing on the sub-image and the position information thereof to obtain a corresponding data tensor;
a fusion module 44, configured to perform feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fuse low-layer information, middle-layer information, and high-layer information of the Faster R-CNN model by using an attention mechanism to determine feature representations corresponding to the subimages;
a first determining module 45, configured to determine a candidate target position according to the feature representation corresponding to each sub-image, and perform regression and classification to determine a final target position of each sub-image and a category to which the final target position belongs;
and a second determining module 46, configured to determine the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs.
FIG. 5 is a block diagram illustrating an apparatus for object detection and recognition of a large resolution image according to an exemplary embodiment.
As shown in fig. 5, in one embodiment, the segmentation module 42 preferably includes:
a dividing unit 51, configured to divide each original image by using a fixed-window overlapping type dividing manner to obtain corresponding sub-images, and arrange the sub-images in sequence;
the preprocessing unit 52 is configured to perform data preprocessing on each sub-image, and determine position information of the sub-image in the original image, where the position information includes coordinates of a center point of the sub-image in the original image and a width and a height of the sub-image.
In one embodiment, preferably, the fusion module 44 is configured to:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and calculating by adopting the following formula to determine the characteristic representation corresponding to the sub-image;
Z = softmax(QKᵀ/√d)·V

wherein Z denotes the feature representation corresponding to the sub-image, Q denotes the query, K denotes the key, V denotes the value, and d denotes a hyper-parameter.
In one embodiment, preferably, the second determining module 46 is configured to:
and, according to the position information of the sub-images in the original image, merging the final target positions whose mutual distances are smaller than a preset threshold value to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target locations are preferably determined using the following first calculation:
l₁ = Σᵢ L(tᵢ, tᵢ*) + γ·R(W)    (1)

wherein l₁ denotes the loss of the candidate target positions, W denotes the parameters to be learned, xᵢ denotes the vector representation of the i-th candidate target position, tᵢ denotes the predicted offset (computed from xᵢ and W), tᵢ* denotes the offset from the i-th candidate target position to the true target position, R(W) denotes the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l₂ = L_cls + λ·L_loc    (2)

wherein l₂ denotes the sum of the regression and classification losses, L_cls denotes the classification loss, L_loc denotes the localization loss, and λ is a hyper-parameter that balances the two loss terms.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspect.
According to a fourth aspect of the embodiments of the present invention, there is provided a target detection and recognition system based on a large-resolution image, the system including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
dividing each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
coding and fusing the subimages and the position information thereof to obtain corresponding data tensors;
performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
It is further understood that the term "plurality" means two or more, and other terms are analogous. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "first," "second," and the like are used to describe various information and that such information should not be limited by these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the terms "first," "second," and the like are fully interchangeable. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention.
It is further to be understood that while operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method for target detection and recognition of a large-resolution image, the method comprising:
acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
dividing each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
coding and fusing the subimages and the position information thereof to obtain corresponding data tensors;
performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
2. The method of claim 1, wherein segmenting each image in the enhanced image set to obtain a corresponding sub-image and its position information comprises:
dividing each original image by adopting a fixed window overlapping type dividing mode to obtain corresponding sub-images, and arranging the sub-images according to the sequence;
and performing data preprocessing on each sub-image, and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
3. The method as claimed in claim 1, wherein fusing the low-level information, the middle-level information and the high-level information of the Faster R-CNN model using an attention mechanism to determine the corresponding feature representation of the sub-image comprises:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and calculating by adopting the following formula to determine the characteristic representation corresponding to the sub-image;
Z = softmax(QKᵀ/√d)·V

wherein Z denotes the feature representation corresponding to the sub-image, Q denotes the query, K denotes the key, V denotes the value, and d denotes a hyper-parameter.
4. The method of claim 1, wherein determining the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs comprises:
and, according to the position information of the sub-images in the original image, merging the final target positions whose mutual distances are smaller than a preset threshold value to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
5. The method of claim 1, wherein the candidate target locations are determined using a first calculation:
l₁ = Σᵢ L(tᵢ, tᵢ*) + γ·R(W)    (1)

wherein l₁ denotes the loss of the candidate target positions, W denotes the parameters to be learned, xᵢ denotes the vector representation of the i-th candidate target position, tᵢ denotes the predicted offset (computed from xᵢ and W), tᵢ* denotes the offset from the i-th candidate target position to the true target position, R(W) denotes the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l₂ = L_cls + λ·L_loc    (2)

wherein l₂ denotes the sum of the regression and classification losses, L_cls denotes the classification loss, L_loc denotes the localization loss, and λ is a hyper-parameter that balances the two loss terms.
6. An apparatus for target detection and recognition of a large-resolution image, the apparatus comprising:
the enhancement module is used for acquiring a large-resolution image set and enhancing data of the large-resolution image set to obtain an enhanced image set;
the segmentation module is used for segmenting each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
the processing module is used for coding and fusing the sub-images and the position information thereof to obtain corresponding data tensors;
the fusion module is used for performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism so as to determine feature representation corresponding to the subimages;
the first determining module is used for determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and the second determining module is used for determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
7. The apparatus of claim 6, wherein the segmentation module comprises:
the segmentation unit is used for segmenting each original image by adopting a fixed window overlapping segmentation mode to obtain corresponding sub-images and arranging the sub-images in sequence;
and the preprocessing unit is used for preprocessing data of each sub-image and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
8. The apparatus of claim 6, wherein the fusion module is configured to:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and calculating by adopting the following formula to determine the characteristic representation corresponding to the sub-image;
Z = softmax(QKᵀ/√d)·V

wherein Z denotes the feature representation corresponding to the sub-image, Q denotes the query, K denotes the key, V denotes the value, and d denotes a hyper-parameter.
9. The apparatus of claim 6, wherein the second determining module is configured to:
and, according to the position information of the sub-images in the original image, merging the final target positions whose mutual distances are smaller than a preset threshold value to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
10. The apparatus of claim 6, wherein the candidate target locations are determined using a first calculation formula:
l₁ = Σᵢ L(tᵢ, tᵢ*) + γ·R(W)    (1)

wherein l₁ denotes the loss of the candidate target positions, W denotes the parameters to be learned, xᵢ denotes the vector representation of the i-th candidate target position, tᵢ denotes the predicted offset (computed from xᵢ and W), tᵢ* denotes the offset from the i-th candidate target position to the true target position, R(W) denotes the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l₂ = L_cls + λ·L_loc    (2)

wherein l₂ denotes the sum of the regression and classification losses, L_cls denotes the classification loss, L_loc denotes the localization loss, and λ is a hyper-parameter that balances the two loss terms.
CN202210255384.6A 2022-03-16 2022-03-16 Target detection and identification method and device for large-resolution image Pending CN114332456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210255384.6A CN114332456A (en) 2022-03-16 2022-03-16 Target detection and identification method and device for large-resolution image

Publications (1)

Publication Number Publication Date
CN114332456A true CN114332456A (en) 2022-04-12

Family

ID=81033942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210255384.6A Pending CN114332456A (en) 2022-03-16 2022-03-16 Target detection and identification method and device for large-resolution image

Country Status (1)

Country Link
CN (1) CN114332456A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180158189A1 (en) * 2016-12-07 2018-06-07 Samsung Electronics Co., Ltd. System and method for a deep learning machine for object detection
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning
CN109886269A (en) * 2019-02-27 2019-06-14 南京中设航空科技发展有限公司 A kind of transit advertising board recognition methods based on attention mechanism
CN111191730A (en) * 2020-01-02 2020-05-22 中国航空工业集团公司西安航空计算技术研究所 Method and system for detecting oversized image target facing embedded deep learning
CN111507958A (en) * 2020-04-15 2020-08-07 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN112861982A (en) * 2021-02-24 2021-05-28 佛山市南海区广工大数控装备协同创新研究院 Long-tail target detection method based on gradient average
CN113538331A (en) * 2021-05-13 2021-10-22 中国地质大学(武汉) Metal surface damage target detection and identification method, device, equipment and storage medium
CN113989744A (en) * 2021-10-29 2022-01-28 西安电子科技大学 Pedestrian target detection method and system based on oversized high-resolution image

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
YUHUA C. et al.: "Domain Adaptive Faster R-CNN for Object Detection in the Wild", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
吴建鑫 (Wu Jianxin): "Pattern Recognition" (《模式识别》), Beijing: China Machine Press, 31 March 2020 *
唐子惠 (Tang Zihui), ed.: "Introduction to Medical Artificial Intelligence" (《医学人工智能导论》), Shanghai: Shanghai Scientific & Technical Publishers, 30 April 2020 *
林刚 (Lin Gang) et al.: "Multi-target detection and localization in transmission line inspection images based on improved Faster-RCNN", Electric Power Automation Equipment (《电力自动化设备》) *
赵杰 (Zhao Jie) et al.: "Intelligent Robot Technology: Research and Practice of Security, Patrol, and Disposal Police Robots" (《智能机器人技术:安保、巡逻、处置类警用机器人研究实践》), Beijing: China Machine Press, 31 January 2021 *

Similar Documents

Publication Publication Date Title
CN109859190B (en) Target area detection method based on deep learning
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN114119638A (en) Medical image segmentation method integrating multi-scale features and attention mechanism
CN110414344B (en) Character classification method based on video, intelligent terminal and storage medium
CN110532920A (en) Smallest number data set face identification method based on FaceNet method
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN111274994A (en) Cartoon face detection method and device, electronic equipment and computer readable medium
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN116229192B (en) ODConvBS-YOLOv s-based flame smoke detection method
CN113379771A (en) Hierarchical human body analytic semantic segmentation method with edge constraint
Zhu et al. YOLOv7-CSAW for maritime target detection
Li et al. Recurrent prediction with spatio-temporal attention for crowd attribute recognition
CN113743521B (en) Target detection method based on multi-scale context awareness
CN113221731B (en) Multi-scale remote sensing image target detection method and system
CN114662605A (en) Flame detection method based on improved YOLOv5 model
CN115471901B (en) Multi-pose face frontization method and system based on generation of confrontation network
Ouyang et al. An anchor-free detector with channel-based prior and bottom-enhancement for underwater object detection
Liu et al. Tracking with mutual attention network
CN114332456A (en) Target detection and identification method and device for large-resolution image
CN113824989B (en) Video processing method, device and computer readable storage medium
CN115240163A (en) Traffic sign detection method and system based on one-stage detection network
CN115410089A (en) Self-adaptive local context embedded optical remote sensing small-scale target detection method
CN116958615A (en) Picture identification method, device, equipment and medium
CN115424027B (en) Image similarity comparison method, device and equipment for image foreground person

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220412