CN111461217A - Aerial image small target detection method based on feature fusion and up-sampling - Google Patents

Aerial image small target detection method based on feature fusion and up-sampling

Info

Publication number
CN111461217A
Authority
CN
China
Prior art keywords
feature
sampling
output
layer
feature map
Prior art date
Legal status
Granted
Application number
CN202010247656.9A
Other languages
Chinese (zh)
Other versions
CN111461217B (en)
Inventor
林沪
刘琼
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010247656.9A
Publication of CN111461217A
Application granted
Publication of CN111461217B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an aerial image small target detection method based on feature fusion and up-sampling. The method comprises the following steps: extracting a feature set of an input image using a backbone network; constructing a channel standardization module and standardizing the channel dimension of the features; constructing a learning-based up-sampling layer and up-sampling the features to obtain a feature set with uniform resolution; performing group normalization on the features grouped by channel; concatenating the feature set to generate fusion features; down-sampling the fusion features multiple times to construct a feature pyramid for detection; and classifying and locating targets with a head detection network. The feature fusion and feature up-sampling method of the invention, used in both the training and testing stages of target detection, significantly improves the detection precision of small targets in aerial images while adding only a slight computational overhead.

Description

Aerial image small target detection method based on feature fusion and up-sampling
Technical Field
The invention relates to the field of aerial image target detection, in particular to an aerial image small target detection method based on feature fusion and up-sampling.
Background
Compared with a surveillance camera with a fixed position and field of view, a camera mounted on an unmanned aerial vehicle has natural advantages: convenient deployment, strong maneuverability, and a wide field of view. These advantages make it promising for many applications such as security monitoring, search and rescue, and crowd monitoring. In many drone applications, target detection in aerial images is a key component that is critical to building fully autonomous systems, and is therefore an urgent need in industry.
Although convolutional neural networks have achieved remarkable results in general target detection, their performance in unmanned aerial vehicle aerial scenes is not satisfactory. The main reason is that, compared with images of ordinary scenes, targets in aerial images have a smaller relative scale and a lower absolute resolution. As a result, the corresponding response regions in the extracted convolutional feature maps are small, which leads to a higher miss rate. More specifically, the length and width of the feature map extracted by a convolutional neural network are usually only 1/4 or 1/8 of those of the input image, which further weakens the feature map's ability to characterize small-scale targets. Therefore, how to strengthen the feature expression of small-scale targets becomes a key point of system design.
Most existing convolutional neural network methods adopt an FPN feature fusion network to improve the feature expression of small-scale targets. The specific process is as follows: a backbone network extracts a feature set from the input image; the high-level, low-resolution feature maps are up-sampled by bilinear interpolation and fused with the adjacent lower-level feature maps in sequence; detection is then performed on the fused feature set. However, the existing FPN feature fusion network cannot sufficiently fuse the information of feature maps with different resolutions, and bilinear interpolation is not an efficient up-sampling method. These two drawbacks limit the effectiveness of FPN in detecting small-sized targets.
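For concreteness, the following is a minimal PyTorch sketch of the conventional FPN fusion step described above: the higher-level, lower-resolution feature map is up-sampled by bilinear interpolation and fused with the adjacent lower-level map by element-wise addition. The function name and tensor shapes are illustrative only and not taken from any specific FPN implementation.

```python
import torch
import torch.nn.functional as F

def fpn_topdown_fuse(high_level_feat, low_level_feat):
    """Conventional FPN fusion step: bilinearly up-sample the higher-level,
    lower-resolution feature map to the spatial size of the adjacent
    lower-level feature map, then fuse by element-wise addition."""
    upsampled = F.interpolate(
        high_level_feat,
        size=low_level_feat.shape[-2:],   # match H, W of the lower-level map
        mode="bilinear",
        align_corners=False,
    )
    return low_level_feat + upsampled     # fused map used for detection

# toy example: C5 (low resolution) fused into C4 (higher resolution)
c5 = torch.randn(1, 256, 16, 16)
c4 = torch.randn(1, 256, 32, 32)
p4 = fpn_topdown_fuse(c5, c4)             # -> shape (1, 256, 32, 32)
```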
In summary, the key to improving small target detection from the aerial viewpoint is to improve the feature fusion strategy and the up-sampling method. The invention provides an aerial image small target detection method based on feature fusion and up-sampling, which comprises the following steps: extracting a feature set of the input image using a backbone network; constructing a channel standardization module and standardizing the channel dimension of the features; constructing a learning-based up-sampling layer and up-sampling the features to obtain a feature set with uniform resolution; performing group normalization on the features grouped by channel; concatenating the feature set to generate fusion features; down-sampling the fusion features multiple times to construct a feature pyramid for detection; and classifying and locating targets with a head detection network, finally outputting the detection results.
The present invention relates to the following prior art documents:
prior art document 1: he Kaim, et al, "Deep residual learning for imaging recognition," Proceedings of the IEEE conference on computer vision and dpattern recognition.2016.
Prior document 2: wu Y, He K.group nomenclature [ C ]// Proceedings of the European Conference on Computer Vision (ECCV).2018:3-19.
Prior document 3: L in T Y, Goyal P, Girshick R, et al. focal local for dense object detection [ C ]// Proceedings of the IEEE international conference on computer vision.2017: 2980-.
Prior art document 1 provides a feature extraction network composed mainly of residual modules based on residual connections, which reduces the training difficulty of deep networks and learns deeper features with stronger representation capability. Prior art document 2 provides a feature normalization method (group normalization) that alleviates the poor performance and convergence difficulty that batch normalization exhibits when the batch size is small. Prior art document 3 trains a high-performance one-stage dense object detector based on an FPN network and the Focal Loss loss function.
Disclosure of Invention
The invention aims to improve the detection precision of small targets in aerial images, so as to better accomplish tasks such as security monitoring, search and rescue, and crowd monitoring based on unmanned aerial vehicle target detection. To achieve this purpose, the invention provides an aerial image small target detection method based on feature fusion and up-sampling, in which a channel standardization module and an up-sampling layer are constructed to perform channel standardization and up-sampling on the features; the features are then group-normalized and concatenated into fusion features; the fusion features are down-sampled multiple times to generate a feature pyramid; and a head network classifies and locates the targets and outputs the detection results.
The purpose of the invention is realized by at least one of the following technical solutions.
An aerial image small target detection method based on feature fusion and up-sampling comprises the following steps:
s1, extracting a feature set of the input image by using a backbone network;
s2, constructing a channel standardization module, and standardizing the channel dimension of the features extracted in the step S1;
s3, constructing an up-sampling layer based on learning, and performing resolution up-sampling on the normalized features to obtain a feature set with uniform resolution;
s4, carrying out group normalization of grouping the characteristics with uniform resolution according to channels;
s5, splicing the feature sets after group normalization to generate fusion features;
s6, downsampling the fusion features for multiple times, and constructing a feature pyramid for detection;
and S7, classifying and locating the targets using the head detection network, and finally outputting the detection results.
Further, in step S1, the backbone network is a residual convolution network comprising five stages, each stage formed by connecting several similar residual modules in series, the output feature maps of which have the same resolution; 2-fold down-sampling is applied between adjacent stages, so the length and width of the feature map are each halved after down-sampling; the finally extracted feature set is the set consisting of the last feature map of each of stages two through five of the backbone network.
Further, in step S2, the channel normalization module is implemented by a convolutional layer; the input of the channel standardization module is a feature map in a feature set output by the backbone network, and the output of the channel standardization module is a feature map with standardized channel dimensions; the resolution of the feature map output by the channel normalization module is the same as that of the input feature map; the channel dimension number of the output feature map of the channel normalization module is a fixed value.
Further, in step S3, the learning-based upsampling layer is formed by cascading several upsampling modules; for the feature maps with different resolutions input by the learning-based upsampling layer, the number of cascaded upsampling modules is different, and the resolution of the finally output feature maps is the same; the up-sampling module is formed by connecting a layer of channel expansion layer and a layer of pixel rearrangement layer in series; the resolution of the up-sampling feature map output by the up-sampling module is 2 times of that of the input feature map; the channel dimension number of the feature diagram output by the channel expansion layer is 4 times of the channel dimension number of the input feature diagram; the number of channels of the feature map output by the pixel rearrangement layer is 1/4 of the number of channels of the input feature map, and the resolution of the output feature map is 2 times of the resolution of the input feature map.
Further, the formula of the pixel rearrangement layer is as follows:
PS(L)[x, y, c] = L[⌊x/r⌋, ⌊y/r⌋, C · r · mod(y, r) + C · mod(x, r) + c]

wherein PS denotes the pixel rearrangement operation, L denotes the input feature map of the pixel rearrangement layer, x and y denote the abscissa and ordinate of the output feature map, c denotes the channel coordinate of the output feature map, C denotes the number of channels of the output feature map, r denotes the up-sampling magnification, ⌊·⌋ denotes rounding down, and mod denotes the remainder operation.
Further, in step S4, the group normalization by channel includes the steps of:
S4.1, let I = (i_N, i_C, i_H, i_W) denote a feature map of the resolution-unified set output in step S3, expressed as a 4D tensor indexed in (N, C, H, W) order; the mean μ and standard deviation σ of the pixels in each channel group of the feature map I are calculated according to the following formulas:

μ = (1/m) · Σ_{k∈S} I_k

σ = sqrt( (1/m) · Σ_{k∈S} (I_k − μ)² + ε )

wherein ε denotes a small constant on the order of the gap between adjacent floating-point numbers in a computer; S denotes a pixel set formed by grouping the feature map I by channel, k denotes one pixel in the pixel set S, and m denotes the size of the pixel set S; the pixel set S is defined as:

S = { k | k_N = i_N, ⌊k_C / (C/G)⌋ = ⌊i_C / (C/G)⌋ }

wherein G denotes the number of groups, a predefined hyper-parameter whose value is an integral multiple of 16; C/G denotes the number of channels in each group; ⌊·⌋ denotes rounding down; i_N and i_C denote the coordinates of the feature map I on the N and C axes, respectively; k_N and k_C denote the coordinates of the pixel k on the N and C axes, respectively;

S4.2, the feature map I is normalized according to the following formula:

Î = (I − μ) / σ

wherein Î denotes the normalized feature map, and μ and σ are the mean and standard deviation calculated in step S4.1;

S4.3, a linear transformation is fitted after normalization to compensate for the possible loss of feature expression capacity; the transformation formula is as follows:

O = γ · Î + β

wherein O denotes the feature map output by the group normalization grouped by channel, and γ and β denote the fitted scale and offset parameters, respectively, with γ initialized to 1 and β initialized to 0.
Further, in step S5, concatenating the feature set to generate the fusion features refers to a tensor concatenation operation; the feature maps are concatenated along the channel dimension to obtain the fusion feature tensor.
Further, in step S6, down-sampling the fusion features multiple times to construct the feature pyramid means that the fusion feature map passes through several down-sampling layers connected in series, generating a series of low-resolution feature maps; the feature pyramid is the set formed by the low-resolution feature maps output by the down-sampling layers; the resolution of each output low-resolution feature map is 1/2 of the resolution of the feature map input to the down-sampling layer.
Further, in step S7, the feature maps of the feature pyramid output in step S6 are sequentially input into the head detection network, which outputs the category and position coordinates of the targets; the head detection network comprises a classification full convolution network and a regression full convolution network;
further, the calculation steps of the target classification full convolution network are as follows:
S7.1.1, inputting the feature maps of the feature pyramid output in step S6 into several buffer convolution layers connected in series; the resolution and channel dimension of the feature map output by each buffer convolution layer are the same as those of the input feature map;
S7.1.2, inputting the feature map output by the buffer convolution layers into the classification prediction layer; the classification prediction layer consists of one convolution layer; let x = (x_N, x_C, x_H, x_W) be a 4D tensor indexed in (N, C, H, W) order, representing the classification result output by the classification prediction layer; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the batch, length and width (N, H, W) of x are the same as those of the input feature map; the number of channels of x is Cls × A, where Cls is the number of target categories and A is the number of preset anchors;
further, the target regression full convolution network is calculated by the following steps:
S7.2.1, inputting the feature maps of the feature pyramid output in step S6 into several buffer convolution layers connected in series; the resolution and channel dimension of the feature map output by each buffer convolution layer are the same as those of the input feature map;
S7.2.2, inputting the feature map output by the buffer convolution layers into the regression prediction layer; the regression prediction layer consists of one convolution layer; let y = (y_N, y_C, y_H, y_W) be a 4D tensor indexed in (N, C, H, W) order, representing the regression result output by the regression prediction layer; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the batch, length and width (N, H, W) of y are the same as those of the input feature map; the number of channels C of y is 4 × A, where A is the number of preset anchors.
S7.3, the results x and y output by the classification full convolution network and the regression full convolution network are combined to obtain a 4D tensor z = (z_N, z_C, z_H, z_W) indexed in (N, C, H, W) order; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the size of C is (4+Cls) × A, where Cls is the number of target categories and A is the number of preset anchors; the 4D tensor z is the target detection result output by the network and includes the category and position coordinates of the targets.
Compared with the prior art, the invention has the beneficial effects that:
the invention improves the characteristic fusion process and the characteristic up-sampling method, can obviously improve the representation capability of the characteristic diagram, improves the small target detection precision, and only slightly increases the calculation overhead.
Drawings
FIG. 1 is a flow chart of a method for detecting small targets in aerial images based on feature fusion and upsampling;
FIG. 2 is a schematic diagram of a feature fusion network according to an embodiment of the present invention;
FIG. 3 is a diagram of a pixel rearrangement layer according to an embodiment of the present invention.
Detailed Description
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. It includes various specific details to aid understanding, but these are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to the written meaning, but are used only by the inventors to enable a clear and consistent understanding of the disclosure. Accordingly, it should be apparent to those skilled in the art that the following description of the various embodiments of the present disclosure is provided for illustration only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
Embodiment:
a method for detecting small targets in aerial images based on feature fusion and up-sampling is disclosed, as shown in FIG. 1, and comprises the following steps:
s1, extracting a feature set of the input image by using a backbone network;
The backbone network is a residual convolution network comprising five stages, each stage formed by connecting several similar residual modules in series, the output feature maps of which have the same resolution; 2-fold down-sampling is applied between adjacent stages, so the length and width of the feature map are each halved after down-sampling; the finally extracted feature set is the set consisting of the last feature map of each of stages two through five of the backbone network.
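As an illustration of step S1, the sketch below uses torchvision's ResNet-50 as a stand-in for the residual backbone and collects the last feature map of stages two through five; the class name and the choice of ResNet-50 are assumptions for illustration only.

```python
import torch
import torchvision

class ResNetBackbone(torch.nn.Module):
    """Residual backbone whose stage-2 to stage-5 outputs (layer1-layer4 in
    torchvision naming) form the extracted feature set."""
    def __init__(self):
        super().__init__()
        net = torchvision.models.resnet50()
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stage2, self.stage3 = net.layer1, net.layer2
        self.stage4, self.stage5 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        c2 = self.stage2(x)   # stride 4 w.r.t. the input image
        c3 = self.stage3(c2)  # stride 8
        c4 = self.stage4(c3)  # stride 16
        c5 = self.stage5(c4)  # stride 32
        return [c2, c3, c4, c5]   # feature set used by the later steps

features = ResNetBackbone()(torch.randn(1, 3, 512, 512))
```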
S2, constructing a channel standardization module, and standardizing the channel dimension of the features extracted in the step S1;
the channel standardization module is realized by a convolution layer; the input of the channel standardization module is a feature map in a feature set output by the backbone network, and the output of the channel standardization module is a feature map with standardized channel dimensions; the resolution of the feature map output by the channel normalization module is the same as that of the input feature map; the channel dimension number of the output feature map of the channel normalization module is a fixed value.
In this embodiment, the convolution layer in the channel standardization module has a kernel size of 1, a padding of 1, and a stride of 1; the channel dimension of the feature map output by the channel standardization module is fixed at 256.
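A minimal sketch of the channel standardization module follows, assuming ResNet-50 stage widths for the input channel counts and dummy feature maps; note that padding is set to 0 in this sketch so that the 1×1 convolution preserves the spatial resolution stated above.

```python
import torch
import torch.nn as nn

# Channel standardization: one 1x1 convolution per backbone feature map,
# projecting every input to a fixed channel dimension (256 here).
# Padding is 0 so that the 1x1 kernel preserves the spatial resolution.
channel_norms = nn.ModuleList(
    nn.Conv2d(c_in, 256, kernel_size=1, stride=1, padding=0)
    for c_in in (256, 512, 1024, 2048)       # typical ResNet-50 stage widths
)

# dummy feature set from stages 2-5 (strides 4, 8, 16, 32 of a 512x512 input)
feats = [torch.randn(1, c, s, s) for c, s in zip((256, 512, 1024, 2048), (128, 64, 32, 16))]
standardized = [m(f) for m, f in zip(channel_norms, feats)]   # all now 256 channels
```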
S3, constructing an up-sampling layer based on learning, and performing resolution up-sampling on the normalized features to obtain a feature set with uniform resolution;
the learning-based up-sampling layer is formed by cascading a plurality of up-sampling modules; for the feature maps with different resolutions input by the learning-based upsampling layer, the number of cascaded upsampling modules is different, and the resolution of the finally output feature maps is the same; the up-sampling module is formed by connecting a layer of channel expansion layer and a layer of pixel rearrangement layer in series; the resolution of the up-sampling feature map output by the up-sampling module is 2 times of that of the input feature map; the channel dimension number of the feature diagram output by the channel expansion layer is 4 times of the channel dimension number of the input feature diagram; the number of channels of the feature map output by the pixel rearrangement layer is 1/4 of the number of channels of the input feature map, and the resolution of the output feature map is 2 times of the resolution of the input feature map.
In this embodiment, the channel expansion layer is implemented by a convolution layer with a kernel size of 1, a padding of 1, and a stride of 1, and the channel dimension of its output feature map is 1024; the channel dimension of the feature map output by the pixel rearrangement layer is 256;
As shown in FIG. 3, the formula of the pixel rearrangement layer is as follows:

PS(L)[x, y, c] = L[⌊x/r⌋, ⌊y/r⌋, C · r · mod(y, r) + C · mod(x, r) + c]

wherein PS denotes the pixel rearrangement operation, L denotes the input feature map of the pixel rearrangement layer, x and y denote the abscissa and ordinate of the output feature map, c denotes the channel coordinate of the output feature map, C denotes the number of channels of the output feature map, r denotes the up-sampling magnification, ⌊·⌋ denotes rounding down, and mod denotes the remainder operation.
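The up-sampling module of step S3 can be sketched as follows, assuming 256-channel inputs as in this embodiment; PyTorch's nn.PixelShuffle is used as the pixel rearrangement layer, and the module and function names are illustrative.

```python
import torch
import torch.nn as nn

class UpsampleModule(nn.Module):
    """Channel expansion (x4 channels) followed by pixel rearrangement
    (x2 resolution), as one up-sampling module."""
    def __init__(self, channels=256):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * 4, kernel_size=1)  # 256 -> 1024 channels
        self.shuffle = nn.PixelShuffle(upscale_factor=2)                # 1024 -> 256 ch, HxW -> 2Hx2W

    def forward(self, x):
        return self.shuffle(self.expand(x))

def build_upsample_layer(num_modules, channels=256):
    # Feature maps from deeper stages need more cascaded modules to reach
    # the common (highest) resolution of the feature set.
    return nn.Sequential(*[UpsampleModule(channels) for _ in range(num_modules)])

x = torch.randn(1, 256, 16, 16)
print(build_upsample_layer(3)(x).shape)   # torch.Size([1, 256, 128, 128])
```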
S4, carrying out group normalization of grouping the characteristics with uniform resolution according to channels;
the group normalization grouped by channel includes the steps of:
S4.1, let I = (i_N, i_C, i_H, i_W) denote a feature map of the resolution-unified set output in step S3, expressed as a 4D tensor indexed in (N, C, H, W) order; where N is the batch axis, C is the channel axis, and H and W are the feature map length and width axes, respectively; the mean μ and standard deviation σ of the pixels in each channel group of the feature map I are calculated according to the following formulas:

μ = (1/m) · Σ_{k∈S} I_k

σ = sqrt( (1/m) · Σ_{k∈S} (I_k − μ)² + ε )

wherein ε denotes a small constant on the order of the gap between adjacent floating-point numbers in a computer (in the Python language its size is 2.220446049250313e-16); S denotes a pixel set formed by grouping the feature map I by channel, k denotes one pixel in the pixel set S, and m denotes the size of the pixel set S; the pixel set S is defined as:

S = { k | k_N = i_N, ⌊k_C / (C/G)⌋ = ⌊i_C / (C/G)⌋ }

wherein G denotes the number of groups, a predefined hyper-parameter whose value is an integral multiple of 16 and defaults to 32; C/G denotes the number of channels in each group; ⌊·⌋ denotes rounding down; i_N and i_C denote the coordinates of the feature map I on the N and C axes, respectively; k_N and k_C denote the coordinates of the pixel k on the N and C axes, respectively;

S4.2, the feature map I is normalized according to the following formula:

Î = (I − μ) / σ

wherein Î denotes the normalized feature map, and μ and σ are the mean and standard deviation calculated in step S4.1;

S4.3, a linear transformation is fitted after normalization to compensate for the possible loss of feature expression capacity; the transformation formula is as follows:

O = γ · Î + β

wherein O denotes the feature map output by the group normalization grouped by channel, and γ and β denote the fitted scale and offset parameters, respectively, with γ initialized to 1 and β initialized to 0.
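A minimal sketch of the group normalization of step S4 follows; PyTorch's nn.GroupNorm performs the same per-group mean/std normalization and learnable affine transform (γ initialized to 1, β initialized to 0), and the manual computation below mirrors steps S4.1 to S4.3 for comparison. The tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

# Group normalization over channel groups (G = 32), followed by the learnable
# affine transform O = gamma * I_hat + beta (gamma init 1, beta init 0).
gn = nn.GroupNorm(num_groups=32, num_channels=256, eps=2.220446049250313e-16)

I = torch.randn(2, 256, 128, 128)   # a resolution-unified feature map
O = gn(I)

# Equivalent manual computation, mirroring steps S4.1-S4.3:
x = I.reshape(2, 32, 256 // 32, 128, 128)            # split channels into G groups
mu = x.mean(dim=(2, 3, 4), keepdim=True)              # per-group mean (S4.1)
sigma = torch.sqrt(x.var(dim=(2, 3, 4), unbiased=False, keepdim=True) + gn.eps)
I_hat = ((x - mu) / sigma).reshape_as(I)               # normalization (S4.2)
O_manual = gn.weight.view(1, -1, 1, 1) * I_hat + gn.bias.view(1, -1, 1, 1)  # affine (S4.3)
```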
S5, as shown in FIG. 2, splicing the feature set after group normalization to generate fusion features;
the splicing of the feature sets to generate the fused features refers to the splicing operation of tensors; and splicing the characteristic graphs along the dimension direction by the splicing operation of the tensor to obtain a fusion characteristic tensor.
S6, downsampling the fusion features for multiple times, and constructing a feature pyramid for detection;
the step of carrying out multiple downsampling on the fusion features to construct a feature pyramid refers to that a feature graph is subjected to a plurality of downsampling layers connected in series to generate a series of low-resolution feature graphs; the feature map pyramid is a set formed by low-resolution feature maps output by a down-sampling layer; the resolution of the output low resolution feature map is 1/2 of the resolution of the feature map of the downsampled layer input;
in this embodiment, the downsampling layer is implemented by a convolution layer; the convolution kernel size of the downsampling layer is 3, the padding is 1, and the step length is 2.
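A minimal sketch of step S6 follows, assuming the fusion feature has 1024 channels (four concatenated 256-channel maps); each serial stride-2 convolution halves the resolution, and the set of outputs forms the feature pyramid. The class name, channel count, and number of levels are assumptions.

```python
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Serial stride-2, 3x3 convolutions; each layer halves the resolution,
    and the set of their outputs forms the feature pyramid for detection."""
    def __init__(self, channels=1024, num_levels=4):
        super().__init__()
        self.downs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
            for _ in range(num_levels)
        )

    def forward(self, fused):
        pyramid, x = [], fused
        for down in self.downs:
            x = down(x)             # 1/2 the resolution of its input
            pyramid.append(x)
        return pyramid

levels = FeaturePyramid()(torch.randn(1, 1024, 128, 128))
print([tuple(p.shape) for p in levels])   # spatial size halves at every level
```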
S7, classifying and positioning the target by using the head detection network, and finally outputting the detection result;
the head detection network sequentially inputs the feature map of the feature pyramid output in the step S6, and outputs the category and the position coordinates of the target; the head detection network comprises a classification full convolution network and a regression full convolution network;
the calculation steps of the target classification full convolution network are as follows:
S7.1.1, in this embodiment, the feature maps of the feature pyramid output in step S6 are input into 4 buffer convolution layers connected in series; the resolution and channel dimension of the feature map output by each buffer convolution layer are the same as those of the input feature map; the buffer convolution layers have a kernel size of 3, a padding of 1, a stride of 1, and 256 output channels;
S7.1.2, inputting the feature map output by the buffer convolution layers into the classification prediction layer; the classification prediction layer consists of one convolution layer; let x = (x_N, x_C, x_H, x_W) be a 4D tensor indexed in (N, C, H, W) order, representing the classification result output by the classification prediction layer; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the batch, length and width (N, H, W) of x are the same as those of the input feature map; in this embodiment, the classification prediction layer has a kernel size of 3, a padding of 1, and a stride of 1, and its number of output channels is Cls × A, where Cls is the number of target categories and A is the number of preset anchors;
the calculation steps of the target regression full convolution network are as follows:
S7.2.1, in this embodiment, the feature maps of the feature pyramid output in step S6 are input into 4 buffer convolution layers connected in series; the resolution and channel dimension of the feature map output by each buffer convolution layer are the same as those of the input feature map; the buffer convolution layers have a kernel size of 3, a padding of 1, a stride of 1, and 256 output channels;
S7.2.2, inputting the feature map output by the buffer convolution layers into the regression prediction layer; the regression prediction layer consists of one convolution layer; let y = (y_N, y_C, y_H, y_W) be a 4D tensor indexed in (N, C, H, W) order, representing the regression result output by the regression prediction layer; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the batch, length and width (N, H, W) of y are the same as those of the input feature map; in this embodiment, the regression prediction layer has a kernel size of 3, a padding of 1, and a stride of 1, and its number of output channels is 4 × A, where A is the number of preset anchors.
S7.3, the results x and y output by the classification full convolution network and the regression full convolution network are combined to obtain a 4D tensor z = (z_N, z_C, z_H, z_W) indexed in (N, C, H, W) order; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the size of C is (4+Cls) × A, where Cls is the number of target categories and A is the number of preset anchors; the 4D tensor z is the target detection result output by the network and includes the category and position coordinates of the targets.
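A minimal sketch of the head detection network of step S7 follows, assuming each pyramid level carries 256 channels and using illustrative values for Cls and A; each branch stacks 4 buffer convolutions (3×3, 256 channels) before a 3×3 prediction convolution, and the classification and regression outputs are combined into a single (4+Cls)×A-channel tensor z. The ReLU activations between the buffer convolutions are an assumption not specified in the text.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Classification and regression full-convolution branches applied to each
    pyramid level: 4 buffer convs (3x3, 256 ch) then a 3x3 prediction conv."""
    def __init__(self, in_channels=256, num_classes=10, num_anchors=9):
        super().__init__()
        def buffer_stack(c_in):
            layers, c = [], c_in
            for _ in range(4):
                layers += [nn.Conv2d(c, 256, kernel_size=3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
                c = 256
            return nn.Sequential(*layers)
        self.cls_tower = buffer_stack(in_channels)
        self.reg_tower = buffer_stack(in_channels)
        self.cls_pred = nn.Conv2d(256, num_classes * num_anchors, kernel_size=3, padding=1)
        self.reg_pred = nn.Conv2d(256, 4 * num_anchors, kernel_size=3, padding=1)

    def forward(self, feat):
        x = self.cls_pred(self.cls_tower(feat))   # classification result: (N, Cls*A, H, W)
        y = self.reg_pred(self.reg_tower(feat))   # regression result:     (N, 4*A, H, W)
        return torch.cat([y, x], dim=1)           # combined z: (N, (4+Cls)*A, H, W)

z = DetectionHead()(torch.randn(1, 256, 64, 64))
print(z.shape)   # torch.Size([1, 126, 64, 64]) for Cls=10, A=9
```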
The above embodiment is merely an example given to clearly illustrate the present invention and is not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (9)

1. An aerial image small target detection method based on feature fusion and up-sampling is characterized by comprising the following steps:
s1, extracting a feature set of the input image by using a backbone network;
s2, constructing a channel standardization module, and standardizing the channel dimension of the features extracted in the step S1;
s3, constructing an up-sampling layer based on learning, and performing resolution up-sampling on the normalized features to obtain a feature set with uniform resolution;
s4, carrying out group normalization of grouping the characteristics with uniform resolution according to channels;
s5, splicing the feature sets after group normalization to generate fusion features;
s6, downsampling the fusion features for multiple times, and constructing a feature pyramid for detection;
and S7, classifying and locating the targets using the head detection network, and finally outputting the detection results.
2. The method for detecting small targets in aerial images based on feature fusion and up-sampling according to claim 1, wherein in step S1, the backbone network is a residual convolution network comprising five stages, each stage formed by connecting several similar residual modules in series, the output feature maps of which have the same resolution; 2-fold down-sampling is applied between adjacent stages, so the length and width of the feature map are each halved after down-sampling; the finally extracted feature set is the set consisting of the last feature map of each of stages two through five of the backbone network.
3. The method for detecting the small target in the aerial image based on the feature fusion and the up-sampling as claimed in claim 1, wherein in step S2, the channel normalization module is implemented by a convolutional layer; the input of the channel standardization module is a feature map in a feature set output by the backbone network, and the output of the channel standardization module is a feature map with standardized channel dimensions; the resolution of the feature map output by the channel normalization module is the same as that of the input feature map; the channel dimension number of the output feature map of the channel normalization module is a fixed value.
4. The method for detecting the small target in the aerial image based on the feature fusion and the up-sampling as claimed in claim 1, wherein in step S3, the learning-based up-sampling layer is formed by cascading a plurality of up-sampling modules; for the feature maps with different resolutions input by the learning-based upsampling layer, the number of cascaded upsampling modules is different, and the resolution of the finally output feature maps is the same; the up-sampling module is formed by connecting a layer of channel expansion layer and a layer of pixel rearrangement layer in series; the resolution of the up-sampling feature map output by the up-sampling module is 2 times of that of the input feature map; the channel dimension number of the feature diagram output by the channel expansion layer is 4 times of the channel dimension number of the input feature diagram; the number of channels of the feature map output by the pixel rearrangement layer is 1/4 of the number of channels of the input feature map, and the resolution of the output feature map is 2 times of the resolution of the input feature map.
5. The method for detecting the small target of the aerial image based on the feature fusion and the up-sampling according to claim 4, wherein the formula of the pixel rearrangement layer is as follows:
PS(L)[x, y, c] = L[⌊x/r⌋, ⌊y/r⌋, C · r · mod(y, r) + C · mod(x, r) + c]

wherein PS denotes the pixel rearrangement operation, L denotes the input feature map of the pixel rearrangement layer, x and y denote the abscissa and ordinate of the output feature map, c denotes the channel coordinate of the output feature map, C denotes the number of channels of the output feature map, r denotes the up-sampling magnification, ⌊·⌋ denotes rounding down, and mod denotes the remainder operation.
6. The method for detecting small targets in aerial images based on feature fusion and upsampling as claimed in claim 1, wherein in step S4, the group normalization by channel comprises the following steps:
S4.1, let I = (i_N, i_C, i_H, i_W) denote a feature map of the resolution-unified set output in step S3, expressed as a 4D tensor indexed in (N, C, H, W) order; where N is the batch axis, C is the channel axis, and H and W are the feature map length and width axes, respectively; the mean μ and standard deviation σ of the pixels in each channel group of the feature map I are calculated according to the following formulas:

μ = (1/m) · Σ_{k∈S} I_k

σ = sqrt( (1/m) · Σ_{k∈S} (I_k − μ)² + ε )

wherein ε denotes a small constant on the order of the gap between adjacent floating-point numbers in a computer; S denotes a pixel set formed by grouping the feature map I by channel, k denotes one pixel in the pixel set S, and m denotes the size of the pixel set S; the pixel set S is defined as:

S = { k | k_N = i_N, ⌊k_C / (C/G)⌋ = ⌊i_C / (C/G)⌋ }

wherein G denotes the number of groups, a predefined hyper-parameter whose value is an integral multiple of 16; C/G denotes the number of channels in each group; ⌊·⌋ denotes rounding down; i_N and i_C denote the coordinates of the feature map I on the N and C axes, respectively; k_N and k_C denote the coordinates of the pixel k on the N and C axes, respectively;

S4.2, the feature map I is normalized according to the following formula:

Î = (I − μ) / σ

wherein Î denotes the normalized feature map, and μ and σ are the mean and standard deviation calculated in step S4.1;

S4.3, a linear transformation is fitted after normalization to compensate for the possible loss of feature expression capacity; the transformation formula is as follows:

O = γ · Î + β

wherein O denotes the feature map output by the group normalization grouped by channel, and γ and β denote the fitted scale and offset parameters, respectively, with γ initialized to 1 and β initialized to 0.
7. The method for detecting small targets in aerial images based on feature fusion and up-sampling according to claim 1, wherein in step S5, concatenating the feature set to generate the fusion features refers to a tensor concatenation operation; the feature maps are concatenated along the channel dimension to obtain the fusion feature tensor.
8. The method for detecting small targets in aerial images based on feature fusion and up-sampling according to claim 1, wherein in step S6, down-sampling the fusion features multiple times to construct the feature pyramid means that the fusion feature map passes through several down-sampling layers connected in series, generating a series of low-resolution feature maps; the feature pyramid is the set formed by the low-resolution feature maps output by the down-sampling layers; the resolution of each output low-resolution feature map is 1/2 of the resolution of the feature map input to the down-sampling layer.
9. The method for detecting small targets in aerial images based on feature fusion and up-sampling according to claim 1, wherein in step S7, the feature maps of the feature pyramid output in step S6 are sequentially input into the head detection network, which outputs the category and position coordinates of the targets; the head detection network comprises a classification full convolution network and a regression full convolution network;
the calculation steps of the target classification full convolution network are as follows:
S7.1.1, inputting the feature maps of the feature pyramid output in step S6 into several buffer convolution layers connected in series; the resolution and channel dimension of the feature map output by each buffer convolution layer are the same as those of the input feature map;
S7.1.2, inputting the feature map output by the buffer convolution layers into the classification prediction layer; the classification prediction layer consists of one convolution layer; let x = (x_N, x_C, x_H, x_W) be a 4D tensor indexed in (N, C, H, W) order, representing the classification result output by the classification prediction layer; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the batch, length and width (N, H, W) of x are the same as those of the input feature map; the number of channels of x is Cls × A, where Cls is the number of target categories and A is the number of preset anchors;
the calculation steps of the target regression full convolution network are as follows:
S7.2.1, inputting the feature maps of the feature pyramid output in step S6 into several buffer convolution layers connected in series; the resolution and channel dimension of the feature map output by each buffer convolution layer are the same as those of the input feature map;
S7.2.2, inputting the feature map output by the buffer convolution layers into the regression prediction layer; the regression prediction layer consists of one convolution layer; let y = (y_N, y_C, y_H, y_W) be a 4D tensor indexed in (N, C, H, W) order, representing the regression result output by the regression prediction layer; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the batch, length and width (N, H, W) of y are the same as those of the input feature map; the number of channels C of y is 4 × A, where A is the number of preset anchors;
S7.3, the results x and y output by the classification full convolution network and the regression full convolution network are combined to obtain a 4D tensor z = (z_N, z_C, z_H, z_W) indexed in (N, C, H, W) order; where N is the batch axis, C is the channel axis, and H and W are the length and width axes, respectively; the size of C is (4+Cls) × A, where Cls is the number of target categories and A is the number of preset anchors; the 4D tensor z is the target detection result output by the network and includes the category and position coordinates of the targets.
CN202010247656.9A 2020-03-31 2020-03-31 Aerial image small target detection method based on feature fusion and up-sampling Active CN111461217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010247656.9A CN111461217B (en) 2020-03-31 2020-03-31 Aerial image small target detection method based on feature fusion and up-sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010247656.9A CN111461217B (en) 2020-03-31 2020-03-31 Aerial image small target detection method based on feature fusion and up-sampling

Publications (2)

Publication Number Publication Date
CN111461217A (en) 2020-07-28
CN111461217B CN111461217B (en) 2023-05-23

Family

ID=71682431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010247656.9A Active CN111461217B (en) 2020-03-31 2020-03-31 Aerial image small target detection method based on feature fusion and up-sampling

Country Status (1)

Country Link
CN (1) CN111461217B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070658A (en) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style migration method based on deep learning
CN112580721A (en) * 2020-12-19 2021-03-30 北京联合大学 Target key point detection method based on multi-resolution feature fusion
CN112633156A (en) * 2020-12-22 2021-04-09 浙江大华技术股份有限公司 Vehicle detection method, image processing apparatus, and computer-readable storage medium
CN112990317A (en) * 2021-03-18 2021-06-18 中国科学院长春光学精密机械与物理研究所 Weak and small target detection method
CN113111877A (en) * 2021-04-28 2021-07-13 奇瑞汽车股份有限公司 Characteristic pyramid and characteristic image extraction method thereof
CN113312995A (en) * 2021-05-18 2021-08-27 华南理工大学 Anchor-free vehicle-mounted pedestrian detection method based on central axis
CN114120077A (en) * 2022-01-27 2022-03-01 山东融瓴科技集团有限公司 Prevention and control risk early warning method based on big data of unmanned aerial vehicle aerial photography
CN111967538B (en) * 2020-09-25 2024-03-15 北京康夫子健康技术有限公司 Feature fusion method, device and equipment applied to small target detection and storage medium
CN117893990A (en) * 2024-03-18 2024-04-16 中国第一汽车股份有限公司 Road sign detection method, device and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A kind of image small target detecting method combined based on hierarchical detection
CN110097129A (en) * 2019-05-05 2019-08-06 西安电子科技大学 Remote sensing target detection method based on profile wave grouping feature pyramid convolution
CN110929649A (en) * 2019-11-24 2020-03-27 华南理工大学 Network and difficult sample mining method for small target detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A kind of image small target detecting method combined based on hierarchical detection
CN110097129A (en) * 2019-05-05 2019-08-06 西安电子科技大学 Remote sensing target detection method based on profile wave grouping feature pyramid convolution
CN110929649A (en) * 2019-11-24 2020-03-27 华南理工大学 Network and difficult sample mining method for small target detection

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070658A (en) * 2020-08-25 2020-12-11 西安理工大学 Chinese character font style migration method based on deep learning
CN112070658B (en) * 2020-08-25 2024-04-16 西安理工大学 Deep learning-based Chinese character font style migration method
CN111967538B (en) * 2020-09-25 2024-03-15 北京康夫子健康技术有限公司 Feature fusion method, device and equipment applied to small target detection and storage medium
CN112580721B (en) * 2020-12-19 2023-10-24 北京联合大学 Target key point detection method based on multi-resolution feature fusion
CN112580721A (en) * 2020-12-19 2021-03-30 北京联合大学 Target key point detection method based on multi-resolution feature fusion
CN112633156A (en) * 2020-12-22 2021-04-09 浙江大华技术股份有限公司 Vehicle detection method, image processing apparatus, and computer-readable storage medium
CN112633156B (en) * 2020-12-22 2024-05-31 浙江大华技术股份有限公司 Vehicle detection method, image processing device, and computer-readable storage medium
CN112990317A (en) * 2021-03-18 2021-06-18 中国科学院长春光学精密机械与物理研究所 Weak and small target detection method
CN113111877A (en) * 2021-04-28 2021-07-13 奇瑞汽车股份有限公司 Characteristic pyramid and characteristic image extraction method thereof
CN113312995B (en) * 2021-05-18 2023-02-14 华南理工大学 Anchor-free vehicle-mounted pedestrian detection method based on central axis
CN113312995A (en) * 2021-05-18 2021-08-27 华南理工大学 Anchor-free vehicle-mounted pedestrian detection method based on central axis
CN114120077A (en) * 2022-01-27 2022-03-01 山东融瓴科技集团有限公司 Prevention and control risk early warning method based on big data of unmanned aerial vehicle aerial photography
CN117893990A (en) * 2024-03-18 2024-04-16 中国第一汽车股份有限公司 Road sign detection method, device and computer equipment

Also Published As

Publication number Publication date
CN111461217B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN111461217A (en) Aerial image small target detection method based on feature fusion and up-sampling
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN110909642A (en) Remote sensing image target detection method based on multi-scale semantic feature fusion
CN108960261B (en) Salient object detection method based on attention mechanism
CN111524135A (en) Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line
CN108416292B (en) Unmanned aerial vehicle aerial image road extraction method based on deep learning
CN113723377B (en) Traffic sign detection method based on LD-SSD network
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN111860683B (en) Target detection method based on feature fusion
CN112183578B (en) Target detection method, medium and system
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
CN111898608B (en) Natural scene multi-language character detection method based on boundary prediction
CN111767919B (en) Multilayer bidirectional feature extraction and fusion target detection method
CN113313118A (en) Self-adaptive variable-proportion target detection method based on multi-scale feature fusion
CN116563553B (en) Unmanned aerial vehicle image segmentation method and system based on deep learning
KR102239133B1 (en) Apparatus and method of defect classification using image transformation based on machine-learning
CN112418229A (en) Unmanned ship marine scene image real-time segmentation method based on deep learning
CN112464982A (en) Target detection model, method and application based on improved SSD algorithm
CN115100409B (en) Video portrait segmentation algorithm based on twin network
WO2020093210A1 (en) Scene segmentation method and system based on contenxtual information guidance
CN115909081A (en) Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion
CN115860139A (en) Deep learning-based multi-scale ship target detection method
CN111898671B (en) Target identification method and system based on fusion of laser imager and color camera codes
CN114565764A (en) Port panorama sensing system based on ship instance segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant