WO2022152050A1 - Object detection method and apparatus, computer device, and storage medium - Google Patents

Object detection method and apparatus, computer device, and storage medium

Info

Publication number
WO2022152050A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image
feature
neural network
target image
Prior art date
Application number
PCT/CN2022/070696
Other languages
English (en)
French (fr)
Inventor
周云松
何园
王诚
李弘扬
蒋沁宏
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110063318.4A external-priority patent/CN112733773B/zh
Application filed by 上海商汤智能科技有限公司
Publication of WO2022152050A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular, to an object detection method, apparatus, computer device and storage medium.
  • The monocular 3D (three-dimensional) target detection technology currently used in the field of autonomous driving has very reliable detection accuracy in a fixed camera coordinate system. However, due to the influence of road flatness and slope, the posture of the monocular camera may change while capturing road images during driving, which in turn causes the relationship between the camera coordinate system and the world coordinate system to change.
  • Embodiments of the present disclosure provide at least an object detection method, apparatus, computer device, and storage medium.
  • An embodiment of the present disclosure provides an object detection method, including: acquiring a target image; determining, based on the target image, posture change information of a camera that captures the target image at the time of capture; correcting the initial image feature of the target image based on the posture change information to obtain the target image feature of the target image; and determining the information of the object in the target image based on the target image feature.
  • In this way, the attitude change information of the camera is obtained from the captured target image, and the initial image features of the target image are corrected based on the attitude change information, so as to avoid the influence of the camera's pose change on the image features. The corrected target image features all correspond to a camera with the same pose, so the target image features are less affected by the actual pose of the camera, and the accuracy and reliability of object detection can be improved when these target image features are used for detection.
  • In some embodiments, determining, based on the target image, the posture change information of the camera when capturing the target image includes: determining horizon information in the target image based on an initial image feature of the target image; and determining the attitude change information of the camera when capturing the target image based on the horizon information.
  • Using the horizon information, the attitude change information of the camera can be determined more accurately, and using the attitude change information can improve the accuracy of object detection.
  • In some embodiments, the horizon information includes position information of the horizon, and the attitude change information includes first rotation angle information of the camera on the horizontal plane; determining the posture change information of the camera when capturing the target image includes: determining the first rotation angle information of the camera based on the position information of the horizon.
  • In this way, the angle change of the camera on the horizontal plane can be determined more accurately.
  • In some embodiments, the posture change information includes second rotation angle information of the camera on the vertical plane; determining, based on the target image, the posture change information of the camera when capturing the target image further includes: determining vanishing point information in the target image based on the initial image feature of the target image; and determining, based on the vanishing point information, the second rotation angle information of the camera when capturing the target image.
  • In this way, the angle change of the camera on the vertical plane can be determined more accurately.
  • In some embodiments, determining the information of the object in the target image based on the target image feature includes: determining, based on the target image feature, the information of the object in the target image under the calibration coordinate system; and determining the information of the object in the world coordinate system based on the conversion relationship between the calibration coordinate system and the world coordinate system and the information of the object under the calibration coordinate system.
  • In this way, the information of the object in the target image under the calibration coordinate system can be converted more accurately into the world coordinate system, obtaining the information of the object in the world coordinate system.
  • the posture change information is determined by using a first neural network.
  • In some embodiments, the first neural network is obtained by training with the following steps: acquiring a first training sample, where the first training sample includes the sample initial features of the first sample image, the marked horizon information in the first sample image, and the marked vanishing point information in the first sample image; inputting the first sample image into the first neural network to be trained to obtain predicted horizon information and predicted vanishing point information; determining a first loss based on the marked horizon information, the predicted horizon information, the marked vanishing point information, and the predicted vanishing point information; and training the first neural network to be trained using the first loss to obtain the trained first neural network.
  • In this way, the first neural network is trained using the first loss determined from the marked horizon information, the predicted horizon information, the marked vanishing point information, and the predicted vanishing point information, which ensures that the trained first neural network can determine more accurate horizon and vanishing point information, so that more accurate attitude change information can be obtained.
  • the target image feature is determined using a second neural network.
  • In some embodiments, the second neural network is obtained by training with the following steps: acquiring a second training sample, where the second training sample includes an original image, a calibration image, and reference posture change information of the camera that captured the original image, and the posture of the camera corresponding to the calibration image is the standard posture; extracting the image features in the original image to obtain original image features, where the original image features include a first content feature and a first style feature, the first content feature includes the object outlines and edge positions in the original image, and the first style feature includes the texture and material information of the original image; determining, based on the calibration image, calibration image features, where the calibration image features include a second content feature and a second style feature, the second content feature includes the object outlines and edge positions in the calibration image, and the second style feature includes the texture and material information of the calibration image; and training the second neural network based on the original image features, the calibration image features, and the reference posture change information.
  • In this way, training the second neural network with the second content features in the calibration image captured by the camera in the standard posture and the reference posture change information of the camera can not only ensure that the trained second neural network performs its correction accurately, but can also reduce the amount of data used for training and improve training efficiency.
  • In some embodiments, training the second neural network based on the original image feature, the calibration image feature, and the reference pose change information includes: inputting the original image feature and the reference attitude change information into the second neural network to be trained to obtain the corrected predicted image feature; determining a second loss using the predicted image feature and the second content feature; and training the second neural network to be trained using the second loss to obtain the trained second neural network.
  • In this way, the second loss is determined using the predicted image features and the second content features corresponding to the calibration image, and training the second neural network with the second loss improves the ability of the second neural network to correct the content features in the image features and to obtain content features consistent with the standard pose, thereby improving the detection accuracy.
  • In some embodiments, the method further includes: determining a first style feature of the original image based on the original image; and the step of training the second neural network further includes: training the second neural network based on the original image feature, the first style feature, and the reference posture change information.
  • In this way, the second neural network can be further trained on the style features, thereby improving its prediction accuracy for style features.
  • In some embodiments, training the second neural network based on the original image feature, the first style feature, and the reference pose change information includes: determining a third loss based on the predicted image feature and the first style feature; and training the second neural network to be trained using the third loss to obtain the trained second neural network.
  • In the second neural network, the style features included in the corrected predicted image features should be close to the first style features of the original image. Determining the third loss from the predicted image features and the first style features, and using the third loss to train the second neural network, ensures that the second neural network does not make large adjustments to the style features in the image features, preserving the correction accuracy of the style features and thereby improving the detection accuracy.
  • In some embodiments, the method further includes: determining a second style feature of the calibration image based on the calibration image; and the step of training the second neural network further includes: training the second neural network based on the original image feature, the second style feature, and the reference posture change information.
  • In this way, the second neural network can be trained on the style features, thereby improving its prediction accuracy for style features.
  • In some embodiments, training the second neural network based on the original image feature, the second style feature, and the reference pose change information includes: determining a fourth loss using the predicted image feature and the second style feature; and training the second neural network to be trained using the fourth loss to obtain the trained second neural network.
  • In this way, the fourth loss is determined using the predicted image features and the second style features, and training the second neural network with the fourth loss ensures that the second neural network does not adjust the style features in the image features to a large extent, preserving the correction accuracy of the style features and thereby improving the detection accuracy.
  • In some embodiments, the method further includes: controlling a traveling device to drive, or issuing prompt information, based on the information of the object, where the traveling device is equipped with the camera device.
  • An embodiment of the present disclosure further provides an object detection device, including: an acquisition module, configured to acquire a target image; a first determination module, configured to determine, based on the target image, the posture change information of the camera that captures the target image when capturing the target image; an adjustment module, configured to correct the initial image feature of the target image based on the posture change information to obtain the target image feature of the target image; and a second determination module, configured to determine the information of the object in the target image based on the target image feature.
  • An optional implementation of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the steps of the method in the above first aspect, or in any possible implementation of the first aspect, are executed.
  • An optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run, the steps of the method in the first aspect, or in any possible implementation of the first aspect, are executed.
  • FIG. 1 shows a flowchart of an object detection method provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of detection when the posture of a camera device provided by an embodiment of the present disclosure changes
  • FIG. 3 shows a schematic diagram of a system for object detection with four neural networks provided by an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of a method for training a first neural network provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of a method for training a second neural network provided by an embodiment of the present disclosure
  • FIG. 6 shows a schematic flowchart of training a second neural network to be trained according to an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of an object detection apparatus provided by an embodiment of the present disclosure.
  • FIG. 8 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • References herein to "a plurality" or "several" mean two or more.
  • "And/or" describes the association relationship of the associated objects and means that three kinds of relationships may exist; for example, "A and/or B" can mean: A exists alone, A and B exist at the same time, or B exists alone.
  • the character "/" generally indicates that the associated objects are an "or" relationship.
  • The monocular 3D target detection technology currently used in the field of automatic driving has very reliable detection accuracy under a fixed camera coordinate system. However, it is affected by the flatness and slope of the road surface: the pose of the monocular camera may change while capturing road images during driving, which in turn causes the relationship between the camera coordinate system and the world coordinate system to change. When detecting objects, this leads to a decrease in the accuracy of the detection result during coordinate system conversion, which further reduces the reliability and accuracy of monocular 3D target detection.
  • In view of this, the present disclosure provides an object detection method, device, computer equipment, and storage medium, which acquire the posture change information of the camera from the captured target image and correct the initial image features of the target image based on the posture change information, thereby avoiding the influence of the camera's pose change on the image features. That is to say, each corrected target image feature corresponds to a camera with the same pose, reducing the influence of the camera's pose on the target image; when these target image features are then used for object detection, the accuracy and reliability of detection can be improved.
  • CNN (Convolutional Neural Network): a type of feedforward neural network that includes convolutional computation and has a deep structure; it is one of the representative deep learning algorithms.
  • The extrinsic parameters (Extrinsic Parameters) of the camera represent the conversion relationship between the position of an object in the world coordinate system (also called the ground coordinate system) and its position in the camera coordinate system, for example, the position and/or attitude change parameters required to transform a point from the world coordinate system to the camera coordinate system. The extrinsic parameters of the camera are calculated when the camera is calibrated.
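  • To make the role of the extrinsic parameters concrete, the following is a minimal sketch, assuming a rotation matrix R and translation vector t are already known from calibration; the specific values and function names are illustrative and not taken from the patent.

```python
import numpy as np

# Illustrative extrinsic parameters (assumed, not from the patent):
# R rotates world axes into camera axes; t translates into the camera frame.
R = np.array([[1.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])   # camera looking along the ground plane
t = np.array([0.0, 1.5, 0.0])      # e.g. camera mounted 1.5 m above the ground

def world_to_camera(p_world: np.ndarray) -> np.ndarray:
    """Map a 3D point from the world (ground) frame to the camera frame."""
    return R @ p_world + t

def camera_to_world(p_cam: np.ndarray) -> np.ndarray:
    """Inverse mapping: camera frame back to the world frame."""
    return R.T @ (p_cam - t)

p = np.array([2.0, 0.0, 10.0])     # a point on the ground, 10 m ahead
assert np.allclose(camera_to_world(world_to_camera(p)), p)
```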
  • Vanishing points can be applied to road recognition methods and represent the meeting point of parallel road boundaries in an image. By identifying the location of the vanishing point in the image, the system can recover the two road boundaries.
  • The horizon and vanishing points in an image are often used in deep visual odometry tasks to help determine the vehicle's ego-pose information relative to the ground plane.
  • the vanishing point can represent the intersection of the extension lines of lane lines, building boundary lines, etc. in the image, and the vanishing point is located on the horizon.
  • the tilt of the horizon can indicate changes in the camera roll angle, while the vertical movement of the vanishing point can indicate changes in the camera pitch angle.
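  • The geometric intuition above can be sketched as follows, assuming a pinhole camera with focal length f in pixels and a two-point representation of the detected horizon; the function names and example numbers are illustrative assumptions, not the patent's formulation.

```python
import numpy as np

def roll_from_horizon(x1: float, y1: float, x2: float, y2: float) -> float:
    """Roll change (rad) implied by the tilt of the detected horizon line,
    given two points (x1, y1) and (x2, y2) on it in pixel coordinates."""
    return np.arctan2(y2 - y1, x2 - x1)

def pitch_from_vanishing_point(v_y: float, v_y_ref: float, f: float) -> float:
    """Pitch change (rad) implied by the vertical displacement of the
    vanishing point from its reference position v_y_ref under the standard
    attitude, for a pinhole camera with focal length f in pixels."""
    return np.arctan2(v_y - v_y_ref, f)

# Horizon tilted by about 2 degrees; vanishing point shifted 30 px downward.
roll = roll_from_horizon(0, 240, 640, 262)
pitch = pitch_from_vanishing_point(270, 240, 1000)
print(np.degrees(roll), np.degrees(pitch))   # ~1.97 and ~1.72 degrees
```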
  • The object detection method provided by the embodiments of the present disclosure can be executed by equipment such as terminal equipment, servers, automatic driving equipment, assisted driving equipment, and other processing equipment, where the terminal equipment may include personal digital assistants (PDAs), handheld devices, computing devices, in-vehicle devices, wearable devices, personal computers, notebook computers, and the like.
  • the object detection method may be implemented by a processor in a computer device calling computer-readable instructions stored in a memory.
  • the object detection method provided by the embodiment of the present disclosure will be described below by taking the execution subject as a computer device as an example.
  • As shown in FIG. 1, a flowchart of an object detection method may include the following steps S101 to S104.
  • S101: Acquire a target image.
  • S102: Determine, based on the target image, the posture change information of the camera that captures the target image when capturing the target image.
  • S103: Correct the initial image feature of the target image based on the posture change information to obtain the target image feature of the target image.
  • S104: Determine the information of the object in the target image based on the target image feature.
  • the objects in the target image may include vehicles, trees, human bodies, and other objects during the driving of the vehicle.
  • the target image may be a real-time scene image of the road surface captured by a camera installed on the vehicle during the driving of the vehicle.
  • the real-time scene image may be a frame of image in the video, or may be an image taken separately. If the target image is a frame of image in the video, the determined attitude change information of the camera is the attitude change information of the camera at the shooting time corresponding to the frame image.
  • the execution body corresponding to the object detection method may include four neural networks, namely a backbone neural network, a first neural network, a second neural network and a monocular 3D target detection network.
  • the backbone neural network is used to extract the initial image features of the target image
  • the first neural network is used to determine the posture change information of the camera that shoots the target image based on the initial image features
  • the second neural network is used to correct the initial image features according to the posture change information, to obtain the target image features of the target image
  • the monocular 3D target detection network is used to determine the information of the objects in the target image based on the corrected target image features.
  • the application scenarios of the object detection methods provided by the embodiments of the present disclosure are first introduced.
  • During the driving of the vehicle, bumps may occur due to uneven road surfaces, causing the posture of the camera installed on the vehicle to change compared with the standard posture when capturing the target image; or a change in the slope of the road surface may cause the attitude of the camera to change compared with the standard attitude when shooting the target image, which in turn causes the camera coordinate system and the ground coordinate system to deviate during shooting.
  • The standard posture is the posture of the camera during calibration, and the coordinate system of the camera in the standard posture is the camera coordinate system in the calibrated state, hereinafter referred to as the calibration coordinate system.
  • Both the height and the depth of field of a photographed object affect the position of the object in the image. If the vehicle carrying the camera is subject to extrinsic-parameter disturbances due to road unevenness, looseness of the camera mount, etc., then object pose detection on the target image based on the calibration coordinate system may attribute the position shift of a key point in the feature map to a change in depth of field rather than to the height of the object, which reduces the accuracy of the detection results and may even lead to serious driving accidents.
  • As shown in FIG. 2, the i coordinate system represents the ground coordinate system and the j coordinate system represents the camera coordinate system. The left image is a side-view illustration of shooting in a driving scene: the hexagon represents the autonomous vehicle, the trapezoid represents the object in the target image captured by the camera on the autonomous vehicle, and the dot represents the target detection point of the object. When the ground coordinate system i and the camera coordinate system j deviate, the position of the object's target detection point in the camera coordinate system also deviates. The image on the right is the heat map output by the monocular 3D target detection network: (Ui, Vi) are the coordinates of the target detection point in the heat map when the camera is in the standard attitude, and (Uj, Vj) are the coordinates of the target detection point in the heat map when the posture of the camera has changed.
  • To address this, the embodiments of the present disclosure provide an object detection method that corrects the features of the target image based on the posture change information of the camera, obtains target image features conforming to the camera's standard posture, and then determines the detection result based on those target image features. In this way, the position offset caused by the extrinsic-parameter disturbance can be corrected, eliminating the influence of extrinsic-parameter changes in a targeted manner, improving the accuracy and reliability of the detection result, and in turn improving the application safety of automatic driving technology.
  • The target image is captured by the camera, which may be a monocular camera. The target image includes the objects to be detected, and the number of objects may be one or more.
  • The information may include object coordinates, object dimensions, object depth, and object orientation angle. In some examples, this information may be represented by a 2-dimensional bounding box and/or a 3-dimensional bounding box.
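  • As an illustration only, the object information listed above could be carried in a structure like the following; the field names and conventions are assumptions, not the patent's schema.

```python
from dataclasses import dataclass

@dataclass
class Detection3D:
    """Illustrative container for the object information listed above;
    field names and conventions are assumptions, not the patent's schema."""
    center: tuple[float, float, float]  # object coordinates (x, y, z)
    size: tuple[float, float, float]    # object dimensions (l, w, h), metres
    depth: float                        # object depth from the camera
    yaw: float                          # object orientation angle, radians
```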
  • In specific implementation, the horizon information in the target image is determined based on the initial image features of the target image; then, based on the horizon information, the posture change information of the camera when shooting the target image is determined.
  • The target image can be input into the backbone neural network, which extracts the initial image features of the target image. The initial image features can include content features and style features of the target image, where the content features reflect low-dimensional features of the image and the style features reflect high-dimensional features of the image.
  • The content features can be the contours of the objects included in the target image, the positions of edges, and the like; they are closely related to the shooting posture of the monocular camera and change as the shooting posture changes. The style features can be the texture and material information of the target image, and the like; they are less affected by the shooting posture and remain basically unchanged.
  • The acquired initial image features can be input into the first neural network, which has been trained and has a certain prediction accuracy; the first neural network processes the initial image features to determine the horizon information in the target image, and then the attitude change information of the camera when shooting the target image can be determined according to the determined horizon information. The process of determining the attitude change information from the horizon information may be handled by the first neural network, or may be determined by the computer device based on a preset conversion function, which is not limited here.
  • The horizon information may include the position information of the horizon, and the attitude change information may include the first rotation angle information of the camera on the horizontal plane; based on the position information of the horizon, the first rotation angle information of the camera can be determined.
  • The position information of the horizon can be given by the image coordinates of each point of the horizon in the target image. Based on this position information, the first rotation angle information between the horizon in the target image and the horizon under the standard attitude can be determined, where the horizon under the standard attitude can be calculated by the first neural network. The first rotation angle information can be the flip angle information of the horizon of the target image on the horizontal plane, which accurately reflects the angle change, compared with the standard attitude, of the monocular camera's attitude on the horizontal plane when shooting the target image, that is, the attitude change information on the horizontal plane. Therefore, the first rotation angle information reflects the flip angle information of the camera on the horizontal plane, and the posture change information of the camera when capturing the target image includes the first rotation angle information.
  • the method may further include determining vanishing point information in the target image based on the initial image feature of the target image.
  • The vanishing point information includes the position information of the vanishing point, and the attitude change information includes the second rotation angle information of the camera on the vertical plane; the second rotation angle information between the vanishing point in the target image and the vanishing point under the standard attitude can then be determined.
  • In specific implementation, the position of the vanishing point on the horizon can be determined, the coordinate information of that position in the image can be taken as the position information of the vanishing point, and the second rotation angle information of the camera on the vertical plane can then be determined based on the position information of the vanishing point, where the second rotation angle information can reflect the pitch angle of the camera's attitude on the vertical plane compared with the standard attitude; thus, the attitude change information of the camera also includes the second rotation angle information. Further, the second rotation angle information and the first rotation angle information may together be used as the posture change information of the camera. In this way, the attitude change information of the camera is determined based on the angle change information of the camera on both the horizontal plane and the vertical plane, which improves the accuracy of the determined attitude change information.
  • When determining the attitude change information, both the position information of the horizon and the position information of the vanishing point in the target image may be used, or only one of the two may be used, which is not limited here.
  • the parameters of the first neural network are obtained after supervised training and optimization based on an image dataset, so the accuracy of the detection of the horizon and vanishing points in the image is high.
  • the first neural network can output the position information of the horizon and/or the position information of the vanishing point.
  • The attitude change information can be input into the second neural network, which corrects the initial image features of the target image based on the attitude change information to obtain the target image features of the target image, where the target image features are the corrected features; these features are close to the features contained in an image that the camera would capture at the shooting position of the target image with the standard attitude.
  • In some embodiments, the posture change information includes the first rotation angle information and the second rotation angle information. According to the first rotation angle information, the initial image features can be corrected on the horizontal plane; then, according to the second rotation angle information, the horizontally corrected features can be corrected on the vertical plane. On this basis, target image features corrected on both the horizontal and the vertical plane can be obtained. In other examples, the initial image features can be corrected on the horizontal plane and the vertical plane at the same time. In still other examples, corrections may be made only on the horizontal or the vertical plane. The present disclosure does not limit the order of the corrections or the attitude change information included in the corrections.
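  • The patent does not fix the architecture of the correction (transfer) network; the sketch below shows one plausible form, assuming the posture change information is a small vector (e.g., the first and second rotation angles) used to modulate the feature map FiLM-style. All module and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class TransferNetwork(nn.Module):
    """Corrects initial image features H_in conditioned on the pose change
    information, producing corrected (predicted) features H_out.
    A plausible sketch only; the patent does not fix this architecture."""
    def __init__(self, channels: int = 256, pose_dim: int = 2):
        super().__init__()
        # Map the pose change (e.g., roll and pitch angles) to per-channel
        # scale and shift parameters that modulate the feature map.
        self.film = nn.Linear(pose_dim, 2 * channels)
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, h_in: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        scale, shift = self.film(pose).chunk(2, dim=-1)
        h = h_in * (1 + scale[..., None, None]) + shift[..., None, None]
        return h_in + self.refine(h)   # residual correction

h_in = torch.randn(1, 256, 48, 160)   # initial image features
pose = torch.tensor([[0.03, -0.02]])  # first and second rotation angles (rad)
h_out = TransferNetwork()(h_in, pose) # corrected target image features
```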
  • The target image features are input into the monocular 3D target detection network, which detects each object in the target image based on the target image features and determines the coordinates, in the calibration coordinate system, of the key points of each object (such as the center point of the object); then, based on the conversion relationship between the calibration coordinate system and the world coordinate system, the coordinates of each key point are converted, and the depth and size information of each object is determined, so as to obtain the real position information of each object in the world coordinate system, which is used as the information of each object.
  • The information output by the monocular 3D target detection network may include the object coordinates, object size, object orientation angle, and other information, where the object size represents the size of the object in the real world and the object orientation angle represents the orientation of the object in the real world.
  • The calibration coordinate system is the coordinate system under the attitude of the camera during calibration (i.e., the standard attitude).
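  • The coordinate chain described above can be sketched as follows under a pinhole-camera assumption: a detected key point (u, v) with predicted depth is back-projected into the calibration coordinate system, and can then be mapped into the world coordinate system with the calibrated extrinsics (R, t), as in the earlier sketch. The intrinsic matrix values are illustrative.

```python
import numpy as np

K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])   # illustrative camera intrinsics

def backproject(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Lift a detected key point (u, v) with predicted depth into 3D
    coordinates in the calibration (standard-attitude) coordinate system."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

def calibration_to_world(p_calib: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Convert calibration-coordinate-system coordinates into the world
    coordinate system using the calibrated extrinsics (R, t)."""
    return R.T @ (p_calib - t)

p_calib = backproject(352.0, 260.0, 12.0, K)
# p_calib ~ [0.384, 0.24, 12.0]: 12 m deep, slightly right of and below center
```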
  • FIG. 3 is a schematic diagram of a system for object detection with four neural networks according to an embodiment of the present disclosure.
  • the input of the backbone neural network 310 is the target image 301 and the output is the initial image feature 311 .
  • the first neural network 320 may include a regression network, the input is the initial image feature 311 , and the output is the posture change information 321 of the camera that captures the target image 301 .
  • the second neural network 330 may include a transfer network, the input is the initial image feature 311 and the pose change information 321 , and the output is the target image feature 331 .
  • The input of the monocular 3D object detection network 340 (monocular 3D detection network) is the target image feature 331, and the output 3D result 341 is, for example, the 3D bounding box of a detected object.
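  • The data flow of FIG. 3 can be summarized in the following sketch; the four module classes are placeholders whose interfaces mirror the description, not an implementation from the patent.

```python
import torch.nn as nn

class ObjectDetectionSystem(nn.Module):
    """Mirrors FIG. 3: backbone 310 -> regression network 320 (pose change 321)
    -> transfer network 330 (feature correction) -> monocular 3D head 340.
    The four modules are placeholders; only the interfaces follow the text."""
    def __init__(self, backbone, regression_net, transfer_net, mono3d_head):
        super().__init__()
        self.backbone = backbone              # extracts initial image features
        self.regression_net = regression_net  # first neural network
        self.transfer_net = transfer_net      # second neural network, f_t
        self.mono3d_head = mono3d_head        # monocular 3D detection network

    def forward(self, image):
        h_init = self.backbone(image)                      # feature 311
        pose_change = self.regression_net(h_init)          # info 321
        h_target = self.transfer_net(h_init, pose_change)  # feature 331
        return self.mono3d_head(h_target)                  # 3D result 341
```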
  • The embodiment of the present disclosure also provides a training method for part of the neural networks. In specific implementation, the backbone neural network and the monocular 3D target detection network can be existing neural networks, such as convolutional neural networks, recurrent neural networks, multilayer perceptrons, and the like.
  • the monocular 3D object detection network can be an Anchor-Free (no anchor frame required) detection network.
  • The first neural network and the second neural network are neural networks specific to the embodiments of the present disclosure and need to be trained to achieve the expected detection effect; therefore, the training processes of the first neural network and the second neural network are introduced in detail below.
  • As shown in FIG. 4, a flowchart of a method for training a first neural network may include the following steps S401 to S404.
  • S401: Acquire a first training sample.
  • S402 Input the first sample image into the first neural network to be trained to obtain predicted horizon information and predicted vanishing point information.
  • S403 Determine the first loss based on the marked horizon information and the predicted horizon information and the marked vanishing point information and the predicted vanishing point information.
  • S404 Use the first loss to train the first neural network to be trained to obtain the trained first neural network.
  • the first training sample includes sample initial features of the first sample image, labeled horizon information in the first sample image, and labeled vanishing point information in the first sample image.
  • the first sample image may be an image captured by the camera after the posture is changed.
  • the first sample image is processed through the backbone neural network to obtain sample initial features of the first sample image.
  • The predicted horizon information is the horizon information in the first sample image predicted and output by the first neural network based on the sample initial features; the marked horizon information is the horizon information in the standard sample image captured by the camera, using the standard attitude, at the position where the first sample image was captured.
  • The predicted vanishing point information is the vanishing point information in the first sample image predicted and output by the first neural network; the marked vanishing point information is the vanishing point information in the standard sample image captured by the camera, using the standard attitude, at the position where the first sample image was captured.
  • In specific implementation, the backbone neural network needs to be used first to obtain the sample initial features of the first sample image in the first training sample, where the sample initial features may correspond to a sample initial feature map; that is, the backbone neural network can output a sample initial feature map. Then, the sample initial feature map is input into the first neural network to be trained, which, based on the sample initial feature map, can determine the predicted horizon information and the predicted vanishing point information in the first sample image.
  • The predicted horizon information may include the position information of the predicted horizon, and the predicted vanishing point information may include the position information of the predicted vanishing point.
  • the position information of the marked horizon and the position information of the marked vanishing point can be determined.
  • The marked horizon information may be directly input, or the standard sample image may be input into the backbone neural network and the marked horizon information determined based on the standard sample feature map output by the backbone neural network; the method of determining the marked horizon information is not limited here.
  • The first loss can be calculated from the position information of the predicted horizon and the predicted vanishing point and the corresponding position information of the marked horizon and the marked vanishing point, where the first loss can be the value of a constructed first loss function; the first loss is then used to train the first neural network to be trained. After multiple rounds of training with multiple first training samples, the trained first neural network is obtained, which can output relatively accurate posture change information during application.
  • In specific implementation, the predicted horizon information and the predicted vanishing point information can be expressed by Formula 1, and Formula 2 can be used to construct the first loss. Here, $H_j$ represents the sample initial feature map output by the backbone neural network, from which the first neural network predicts the horizon position $\hat{h}$ and the vanishing point position $\hat{v}$ (Formula 1). The first loss (Formula 2) is

$$L_{vo} = \left\lVert A - g(\hat{h}, \hat{v}) \right\rVert_1$$

  • where $L_{vo}$ represents the first loss, $\lVert \cdot \rVert_1$ represents the L1 norm, $A$ represents the labeling matrix composed of the position information of the marked horizon and the marked vanishing point (which reflects the pose change information), and $g$ represents a transformation function that converts the position information of the predicted horizon and the predicted vanishing point into a prediction matrix.
  • In this way, the first loss represents the Manhattan distance between the annotated pose change information and the predicted pose change information.
  • During training, the first neural network to be trained can determine the predicted attitude change information of the camera from the position information of the predicted horizon and the predicted vanishing point and the corresponding position information of the marked horizon and the marked vanishing point. Specifically, the deviation between the position information of the predicted horizon and that of the marked horizon can be used to determine the flip angle information of the horizon on the horizontal plane, that is, the first rotation angle information of the camera on the horizontal plane; the deviation between the position information of the predicted vanishing point and that of the marked vanishing point can be used to determine the pitch angle information on the vertical plane, that is, the second rotation angle information of the camera on the vertical plane. Then, based on the first rotation angle information and the second rotation angle information, the predicted posture change information of the camera can be determined.
  • In this way, the first neural network is trained using the first loss determined from the labeled and predicted horizon information and the labeled and predicted vanishing point information, which ensures that the trained first neural network can determine more accurate horizon and vanishing point information, from which more accurate attitude change information can then be obtained.
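  • A minimal sketch of the first loss as reconstructed above: the L1 (Manhattan) distance between the label matrix A and the prediction matrix produced by g. The surrounding training-step code is indicated only in comments, since the patent does not specify it.

```python
import torch
import torch.nn.functional as F

def first_loss(pred_matrix: torch.Tensor, label_matrix: torch.Tensor) -> torch.Tensor:
    """L_vo: L1 (Manhattan) distance between the prediction matrix built from
    the predicted horizon/vanishing-point positions and the label matrix A."""
    return F.l1_loss(pred_matrix, label_matrix, reduction="sum")

# One training step might look like this (first_net, g, A, optimizer are
# placeholders; the patent does not specify them):
#   pred = g(first_net(sample_feature_map))   # prediction matrix from H_j
#   loss = first_loss(pred, A)                # Formula 2
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```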
  • As shown in FIG. 5, a flowchart of a method for training a second neural network may include the following steps S501 to S504.
  • S501: Acquire a second training sample.
  • S502: Extract the image features in the original image to obtain original image features.
  • S503: Determine calibration image features based on the calibration image.
  • S504: Train the second neural network based on the original image features, the calibration image features, and the reference posture change information.
  • The second training sample includes the original image, the calibration image, and the reference attitude change information of the camera that captured the original image. The original image is an image captured by the camera when its attitude has changed, while the attitude of the camera corresponding to the calibration image is the standard attitude; that is, the calibration image is the image captured by the camera using the standard attitude at the position where the original image was captured. The reference attitude change information is the attitude change information of the camera when capturing the original image, determined by detecting the original image with the backbone neural network and the first neural network.
  • Alternatively, the attitude change can be set manually, and the original image can be captured by the camera after its attitude is adjusted according to that change; the reference attitude change information can then be determined from the set attitude change. The calibration image can also be obtained by modifying the original image according to the determined reference attitude change information.
  • In specific implementation, the original image and the calibration image can be input into the backbone neural network. Using the backbone neural network, the image features in the original image are extracted to obtain the original image features, which can include the first content feature and the first style feature corresponding to the original image: the first content feature includes the object outlines and edge positions in the original image, and the first style feature includes the texture and material information of the original image. The backbone neural network likewise extracts the calibration image features from the calibration image, which include the second content feature and the second style feature corresponding to the calibration image: the second content feature includes the object outlines and edge positions in the calibration image, and the second style feature includes the texture and material information of the calibration image.
  • Thereafter, the second neural network to be trained can be trained according to the following steps: the original image features and the reference attitude change information are input into the second neural network to be trained to obtain the corrected predicted image features; the second loss is determined; and the second neural network is trained with it to obtain the trained second neural network. Here, the second loss is the content loss Lcontent between the predicted image features output by the second neural network to be trained and the second content feature corresponding to the calibration image.
  • The second loss may be the value of a constructed second loss function. The original image features may correspond to an original image feature map, and the second neural network to be trained may be represented as a transformation neural network $f_t$: it processes the original image feature map $H_{in}$ and outputs the corrected predicted image feature $H_{out}$, which has a certain deviation relative to the second content feature $H_{content}$; the predicted image feature $H_{out}$ and the second content feature $H_{content}$ together determine the second loss.
  • The second content feature $H_{content}$ may be determined by the backbone neural network based on the reference pose change information and the original image $X_j$, according to Formula 3, where $f_b$ represents the backbone neural network and $X_j$ represents the original image.
  • The predicted image feature $H_{out}$ and the second content feature $H_{content}$ can be input into a loss computation neural network, which constructs from them the second loss incurred by the transformation neural network $f_t$ in correcting the original image features.
  • Let $\phi_m$ denote the activation of the $m$-th layer of the loss computation neural network, and let the feature maps corresponding to $H_{out}$ and $H_{content}$ at that layer have size $(c_m, h_m, w_m)$. The second loss Lcontent may be determined by the squared Euclidean distance between the two feature maps (Formula 4):

$$L_{content} = \frac{1}{c_m h_m w_m} \left\lVert \phi_m(H_{out}) - \phi_m(H_{content}) \right\rVert_2^2$$

  • The second loss Lcontent can be determined based on Formula 4, and the second neural network to be trained is then trained based on it. After multiple rounds of training with multiple second training samples, the trained second neural network is obtained, and the predicted image features it outputs are close to the calibration image features corresponding to the calibration image.
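  • A sketch of the content loss of Formula 4, assuming a fixed loss computation network whose m-th layer activation has shape (c_m, h_m, w_m); the normalization by the activation size follows the common perceptual-loss formulation and is an assumption.

```python
import torch

def content_loss(phi_out: torch.Tensor, phi_content: torch.Tensor) -> torch.Tensor:
    """Lcontent (Formula 4): squared Euclidean distance between the m-th
    layer activations for H_out and H_content, each of shape (c_m, h_m, w_m),
    normalized by the activation size."""
    c, h, w = phi_out.shape[-3:]
    return (phi_out - phi_content).pow(2).sum() / (c * h * w)
```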
  • In another embodiment, a third loss Lstyle can also be determined and used together with the second loss Lcontent. The third loss Lstyle may be determined according to the following steps, and the second neural network to be trained may be trained based on the third loss Lstyle (together with the second loss) to obtain the trained second neural network.
  • When determining the third loss, the first style feature $H_{style}$ included in the original image feature map $H_{in}$ can be used directly: the first style feature $H_{style}$ is extracted from the original image features, and then $H_{style}$ and the predicted image feature $H_{out}$ are input into the loss computation neural network, which processes them to construct the third loss Lstyle between the first style feature $H_{style}$ and the predicted image feature $H_{out}$.
  • The feature similarity information can be expressed using a Gram matrix of size $(c_m \times c_m)$, computed from the activations of the $m$-th layer of the loss computation neural network. For the predicted image feature $H_{out}$ or the first style feature $H_{style}$, the feature similarity information on the $m$-th layer can be determined according to Formula 5:

$$G_m(H)_{c,c'} = \frac{1}{c_m h_m w_m} \sum_{h=1}^{h_m} \sum_{w=1}^{w_m} \phi_m(H)_{c,h,w} \, \phi_m(H)_{c',h,w}$$

  • where $H$ represents the predicted image feature $H_{out}$ or the first style feature $H_{style}$; $c$ and $c'$ index different channels of the same feature map; $\phi_m(H)_{c,h,w}$ is the activation of channel $c$ at spatial position $(h, w)$ in the $m$-th layer of the loss computation neural network; and $c_m$, $h_m$, $w_m$ are the channel, height, and width dimensions of the feature map at the $m$-th layer.
  • In this way, the feature similarity information of the predicted image feature $H_{out}$ on the $m$-th layer and that of the first style feature $H_{style}$ on the $m$-th layer can both be determined. The third loss Lstyle may then be determined based on the squared Frobenius norm between these two feature similarity matrices (Formula 6):

$$L_{style} = \left\lVert G_m(H_{out}) - G_m(H_{style}) \right\rVert_F^2$$
  • the Frobenius norm is a matrix norm that measures the difference between two matrices.
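  • A sketch of Formulas 5 and 6: the Gram matrix of an m-th-layer activation and the squared-Frobenius style loss between two Gram matrices; the normalization and tensor layout are assumptions consistent with the reconstruction above.

```python
import torch

def gram_matrix(phi: torch.Tensor) -> torch.Tensor:
    """Formula 5: channel-by-channel feature similarity of a (c_m, h_m, w_m)
    activation, normalized by its size; the result has shape (c_m, c_m)."""
    c, h, w = phi.shape[-3:]
    flat = phi.reshape(c, h * w)
    return (flat @ flat.t()) / (c * h * w)

def style_loss(phi_out: torch.Tensor, phi_style: torch.Tensor) -> torch.Tensor:
    """Formula 6: squared Frobenius norm between the two Gram matrices."""
    diff = gram_matrix(phi_out) - gram_matrix(phi_style)
    return (diff ** 2).sum()

# Combined objective, with lambda_1 and lambda_2 as weighting hyperparameters:
#   loss = lam1 * content_loss(...) + lam2 * style_loss(phi_out, phi_style)
```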
  • In specific implementation, the second neural network to be trained can be trained using the second loss Lcontent and the third loss Lstyle together to obtain the trained second neural network; the combined objective may be written as $L = \lambda_1 L_{content} + \lambda_2 L_{style}$, where $\lambda_1$ and $\lambda_2$ are hyperparameters weighting the second and third losses, determined during the training of the second neural network.
  • In the second neural network, the style features included in the corrected predicted image features should be close to the first style features in the original image features. Using the predicted image features and the first style feature of the original image features to determine the third loss, and using the third loss to train the second neural network, ensures that the second neural network does not adjust the style features in the image features to a large extent, preserving the correction accuracy of the style features and thereby improving the detection accuracy of subsequent targets.
  • In another embodiment, the second neural network can also be trained using the second style feature corresponding to the calibration image, in a manner similar to the third loss: a fourth loss between the predicted image feature $H_{out}$ and the second style feature is determined, and then the fourth loss is used together with the second loss to train the second neural network to be trained, obtaining the trained second neural network.
  • In this way, the style features included in the corrected predicted image features will be close to the second style features of the calibration image. Determining the fourth loss from the predicted image features and the second style features, and using the fourth loss to train the second neural network, ensures that the second neural network does not adjust the style features in the image features to a large extent, preserving the correction accuracy of the style features and thereby improving the detection accuracy of subsequent targets.
  • That is, when training the second neural network, it can be trained using the second loss alone, using the second loss together with the third loss, or using the second loss together with the fourth loss, which is not limited here.
  • In addition, the first neural network to be trained and the second neural network to be trained can first be trained separately; after the losses of the two neural networks reach the preset convergence values, the two networks are jointly trained to obtain the trained first neural network and the trained second neural network. Alternatively, the first and second neural networks to be trained may simply be trained separately to obtain the trained first and second neural networks, which is not limited here.
  • FIG. 6 is a schematic flowchart of training a second neural network to be trained provided by an embodiment of the present disclosure, where image A represents the original image, image B represents the calibration image, and backbone represents the backbone neural network.
  • the backbone neural network processes the image A to obtain the original image feature H in , and processes the image B to obtain the second content feature H content corresponding to the calibration image.
  • the original image feature H in and the reference pose change information are input into the transformation neural network, and the transformation neural network can output the modified predicted image feature H out .
  • The loss computation neural network can calculate the second loss Lcontent based on the predicted image feature $H_{out}$ and the second content feature $H_{content}$, and calculate the third loss Lstyle based on the predicted image feature $H_{out}$ and the first style feature $H_{style}$ included in the original image feature $H_{in}$.
  • the traveling device can be controlled to drive or send out prompt information, wherein the traveling device is equipped with a camera device.
  • The writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • The embodiment of the present disclosure also provides an object detection device corresponding to the object detection method; for its implementation, reference may be made to the implementation of the method, and repeated descriptions are omitted.
  • As shown in FIG. 7, a schematic diagram of an object detection apparatus includes: an acquisition module 701, configured to acquire a target image; a first determination module 702, configured to determine, based on the target image, the posture change information of the camera that captures the target image when capturing the target image; an adjustment module 703, configured to correct the initial image feature of the target image based on the posture change information to obtain the target image feature of the target image; and a second determination module 704, configured to determine the information of the object in the target image based on the target image feature.
  • In some embodiments, the first determination module 702 is configured to determine horizon information in the target image based on the initial image feature of the target image, and to determine, based on the horizon information, the posture change information of the camera when capturing the target image.
  • In some embodiments, the horizon information includes the position information of the horizon, and the attitude change information includes the first rotation angle information of the camera on the horizontal plane; the first determination module 702 is configured to determine the first rotation angle information of the camera based on the position information of the horizon.
  • In some embodiments, the posture change information includes the second rotation angle information of the camera on the vertical plane; the first determination module 702 is configured to determine vanishing point information in the target image based on the initial image feature of the target image, and to determine the second rotation angle information of the camera based on the vanishing point information.
  • In some embodiments, the second determination module 704 is configured to determine, based on the target image feature, the information of the object in the target image under the calibration coordinate system, and to determine the information of the object in the world coordinate system based on the conversion relationship between the calibration coordinate system and the world coordinate system and the information of the object under the calibration coordinate system.
  • the posture change information is determined by using a first neural network.
  • the apparatus further includes a first training module 705 configured to: acquire a first training sample, where the first training sample includes the sample initial features of a first sample image, the annotated horizon information in the first sample image, and the annotated vanishing point information in the first sample image; input the first sample image into a first neural network to be trained to obtain predicted horizon information and predicted vanishing point information; determine a first loss based on the annotated horizon information, the predicted horizon information, the annotated vanishing point information, and the predicted vanishing point information; and train the first neural network to be trained using the first loss to obtain the trained first neural network.
  • the target image feature is determined using a second neural network.
  • the apparatus further includes a second training module 706 configured to acquire a second training sample;
  • the second training sample includes an original image, a calibration image, and reference posture change information of the camera device that captured the original image;
  • the posture of the camera device corresponding to the calibration image is a standard posture; image features are extracted from the original image to obtain original image features, which include a first content feature and a first style feature;
  • the first content feature includes object outlines and edge positions in the original image, and the first style feature includes texture and material information of the original image;
  • the calibration image features of the calibration image are determined;
  • the calibration image features include a second content feature and a second style feature; the second content feature includes object outlines and edge positions in the calibration image, and the second style feature includes the texture and material information of the calibration image;
  • based on the original image features, the calibration image features, and the reference posture change information, the second neural network is trained.
  • the second training module 706 is configured to determine a second loss using the predicted image features and the second content feature;
  • the second neural network to be trained is trained using the second loss, obtaining the trained second neural network.
  • the second training module 706 determines, based on the original image, a first style feature of the original image; based on the original image features, the first style feature, and the reference posture change information, the second neural network is trained.
  • the second training module 706 is configured to determine a third loss based on the predicted image features and the first style feature; the second neural network to be trained is trained using the third loss, obtaining the trained second neural network.
  • the second training module 706 is further configured to determine a second style feature of the calibration image based on the calibration image, and to train the second neural network based on the original image features, the second style feature, and the reference posture change information.
  • the second training module 706 is configured to determine a fourth loss using the predicted image features and the second style feature;
  • the second neural network to be trained is trained using the fourth loss, obtaining the trained second neural network.
  • the apparatus further includes a control module 707 configured to, after the second determination module 704 determines the information of the object in the target image, control the traveling device to travel or send out prompt information based on the information of the object; the traveling device is equipped with the camera device.
  • An embodiment of the present disclosure also provides a computer device. As shown in FIG. 8, a schematic structural diagram of a computer device provided by an embodiment of the present disclosure includes a processor 81 and a memory 82.
  • the processor 81 may perform the steps of any object detection method of the embodiments of the present disclosure.
  • the above-mentioned memory 82 includes an internal memory 821 and an external memory 822; the internal memory 821, also called internal storage, is used to temporarily store operation data in the processor 81 and data exchanged with the external memory 822 such as a hard disk;
  • the processor 81 exchanges data with the external memory 822 through the internal memory 821.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the object detection method described in the above method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the computer program product of the object detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the object detection method described in the above method embodiments. For details, reference may be made to the foregoing method embodiments, which are not repeated here.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an object detection method and apparatus, a computer device, and a storage medium. In embodiments of the present disclosure, posture change information of a camera device is obtained from a captured target image, and the initial image features of the target image are then corrected using the posture change information, so that the corrected target image features are substantially consistent with the image features of an image captured by the camera device in a standard posture; performing object detection with these target image features can improve the accuracy and reliability of the detected information.

Description

Object detection method and apparatus, computer device, and storage medium
Cross-Reference to Related Application
The present disclosure claims priority to the Chinese patent application No. 202110063318.4 filed on January 18, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of computer vision, and in particular to an object detection method and apparatus, a computer device, and a storage medium.
Background
Monocular 3D (3-Dimension) object detection technology currently applied in the field of autonomous driving already achieves highly reliable detection accuracy under a fixed camera coordinate system. In practical autonomous-driving applications, however, affected by the flatness and slope of the road surface, the posture of the monocular camera may change while capturing road images during driving, which in turn changes the relationship between the camera coordinate system and the world coordinate system.
Summary
Embodiments of the present disclosure provide at least an object detection method and apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an object detection method, including: acquiring a target image; determining, based on the target image, posture change information of a camera device that captures the target image when capturing the target image; correcting initial image features of the target image based on the posture change information to obtain target image features of the target image; and determining information of an object in the target image based on the target image features.
The posture change information of the camera device is obtained from the captured target image, and the initial image features of the target image are corrected based on the posture change information, thereby avoiding the influence of camera pose changes on the image features. In other words, all corrected target image features correspond to a camera device in the same pose, reducing the influence of the camera pose on the target image; object detection performed with these target image features is therefore more accurate and reliable.
In a possible implementation, determining, based on the target image, the posture change information of the camera device that captures the target image when capturing the target image includes: determining horizon information in the target image based on the initial image features of the target image; and determining, based on the horizon information, the posture change information of the camera device when capturing the target image.
The horizon information allows the posture change information of the camera device to be determined relatively accurately, and using this posture change information improves the accuracy of object detection.
In a possible implementation, the horizon information includes position information of the horizon; the posture change information includes first rotation angle information of the camera device on a horizontal plane; and determining, based on the horizon information, the posture change information of the camera device when capturing the target image includes: determining the first rotation angle information of the camera device based on the position information of the horizon.
In this way, the angle change of the camera device on the horizontal plane can be determined relatively accurately based on the position information of the horizon in the target image.
In a possible implementation, the posture change information includes second rotation angle information of the camera device on a vertical plane; and determining, based on the target image, the posture change information of the camera device that captures the target image when capturing the target image further includes: determining vanishing point information in the target image based on the initial image features of the target image; and determining, based on the vanishing point information, the second rotation angle information of the camera device when capturing the target image.
In this way, the angle change of the camera device on the vertical plane can be determined relatively accurately based on the vanishing point information in the target image.
In a possible implementation, determining the information of the object in the target image based on the target image features includes: determining, based on the target image features, information of the object in the target image under a calibration coordinate system; and determining information of the object in a world coordinate system based on a conversion relationship between the calibration coordinate system and the world coordinate system and the information of the object under the calibration coordinate system.
In this way, based on the conversion relationship, the information of the object in the target image under the calibration coordinate system can be converted into the world coordinate system relatively accurately, yielding the information of the object in the world coordinate system.
In a possible implementation, the posture change information is determined using a first neural network.
In a possible implementation, the first neural network is obtained by training as follows: acquiring a first training sample, where the first training sample includes sample initial features of a first sample image, annotated horizon information in the first sample image, and annotated vanishing point information in the first sample image; inputting the first sample image into a first neural network to be trained to obtain predicted horizon information and predicted vanishing point information; determining a first loss based on the annotated horizon information, the predicted horizon information, the annotated vanishing point information, and the predicted vanishing point information; and training the first neural network to be trained using the first loss to obtain a trained first neural network.
In this way, training the first neural network with a first loss determined from the annotated and predicted horizon information and the annotated and predicted vanishing point information ensures that the trained first neural network can determine relatively accurate horizon and vanishing point information, from which relatively accurate posture change information can be obtained.
In a possible implementation, the target image features are determined using a second neural network.
In a possible implementation, the second neural network is obtained by training as follows: acquiring a second training sample, where the second training sample includes an original image, a calibration image, and reference posture change information of the camera device that captured the original image, and the posture of the camera device corresponding to the calibration image is a standard posture; extracting image features from the original image to obtain original image features, where the original image features include a first content feature and a first style feature, the first content feature includes object outlines and edge positions in the original image, and the first style feature includes texture and material information of the original image; determining, based on the calibration image, calibration image features, where the calibration image features include a second content feature and a second style feature, the second content feature includes object outlines and edge positions in the calibration image, and the second style feature includes texture and material information of the calibration image; and training the second neural network based on the original image features, the calibration image features, and the reference posture change information.
Because the content features among the image features are strongly affected by posture changes of the camera device, training the second neural network with the second content feature of the calibration image captured by the camera device in the standard posture together with the reference posture change information of the camera device not only ensures that the trained second neural network can accurately handle the posture change information of the camera device, but also reduces the amount of data used for training and improves training efficiency.
In a possible implementation, training the second neural network based on the original image features, the calibration image features, and the reference posture change information includes: inputting the original image features and the reference posture change information into a second neural network to be trained to obtain corrected predicted image features; determining a second loss using the predicted image features and the second content feature; and training the second neural network to be trained using the second loss to obtain a trained second neural network.
Because the content features contained in the corrected predicted image features will be close to the second content feature of the calibration image, determining the second loss from the predicted image features and the second content feature corresponding to the calibration image, and training the second neural network with this second loss, improves the second neural network's ability to correct the content features among the image features, yielding content features consistent with the standard posture and thereby improving the detection accuracy of pose information.
In a possible implementation, the method further includes: determining a first style feature of the original image based on the original image; and the step of training the second neural network further includes: training the second neural network based on the original image features, the first style feature, and the reference posture change information.
In this way, the second neural network can be further trained on the style features, which can improve its prediction accuracy on the style features.
In a possible implementation, training the second neural network based on the original image features, the first style feature, and the reference posture change information includes: determining a third loss based on the predicted image features and the first style feature; and training the second neural network to be trained using the third loss to obtain a trained second neural network.
Because the style features among the image features are little affected by posture changes of the camera device, the style features contained in the corrected predicted image features of the second neural network will be close to the first style feature of the original image. Determining the third loss from the predicted image features and the first style feature, and training the second neural network with this third loss, ensures that the second neural network does not make large adjustments to the style features among the image features, guarantees the correction accuracy of the second neural network on the style features, and thereby improves the detection accuracy of pose information.
In a possible implementation, the method further includes: determining a second style feature of the calibration image based on the calibration image; and the step of training the second neural network further includes: training the second neural network based on the original image features, the second style feature, and the reference posture change information.
In this way, the second neural network can be trained on the style features, which can improve its prediction accuracy on the style features.
In a possible implementation, training the second neural network based on the original image features, the second style feature, and the reference posture change information includes: determining a fourth loss using the predicted image features and the second style feature; and training the second neural network to be trained using the fourth loss to obtain a trained second neural network.
Because the style features contained in the corrected predicted image features will be close to the second style feature of the calibration image, determining the fourth loss from the predicted image features and the second style feature, and training the second neural network with this fourth loss, ensures that the second neural network does not make large adjustments to the style features among the image features, guarantees the correction accuracy of the second neural network on the style features, and thereby improves the detection accuracy of pose information.
In a possible implementation, after determining the information of the object in the target image, the method further includes: controlling a traveling device to travel or issuing prompt information based on the information of the object, where the camera device is mounted on the traveling device.
By controlling the traveling device, objects can be accurately avoided or prompt information can be given while the traveling device is traveling, improving the safety of autonomous driving and of the objects; alternatively, issuing prompt information warns the driver of the traveling device.
In a second aspect, an embodiment of the present disclosure further provides an object detection apparatus, including: an acquisition module configured to acquire a target image; a first determination module configured to determine, based on the target image, posture change information of a camera device that captures the target image when capturing the target image; an adjustment module configured to correct initial image features of the target image based on the posture change information to obtain target image features of the target image; and a second determination module configured to determine information of an object in the target image based on the target image features.
In a third aspect, an optional implementation of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run, performs the steps of the first aspect or any possible implementation of the first aspect.
For descriptions of the effects of the above object detection apparatus, computer device, and computer-readable storage medium, reference is made to the description of the above object detection method, which is not repeated here.
To make the above objects, features, and advantages of the present disclosure more apparent and understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only certain embodiments of the present disclosure and should therefore not be regarded as limiting the scope; for a person of ordinary skill in the art, other related drawings can also be obtained from these drawings without inventive effort.
FIG. 1 shows a flowchart of an object detection method provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of detection when the posture of a camera device changes, provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a system in which four neural networks perform object detection, provided by an embodiment of the present disclosure;
FIG. 4 shows a flowchart of a method for training a first neural network provided by an embodiment of the present disclosure;
FIG. 5 shows a flowchart of a method for training a second neural network provided by an embodiment of the present disclosure;
FIG. 6 shows a schematic flowchart of training a second neural network to be trained, provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of an object detection apparatus provided by an embodiment of the present disclosure;
FIG. 8 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments of the present disclosure generally described and illustrated herein can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the claimed disclosure but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without inventive effort fall within the protection scope of the present disclosure.
In addition, the terms "first", "second", and the like in the specification, the claims, and the above drawings of the embodiments of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein.
"Multiple or several" mentioned herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.
Monocular 3D object detection technology currently applied in the field of autonomous driving already achieves highly reliable detection accuracy under a fixed camera coordinate system. In practical autonomous-driving applications, however, affected by the flatness and slope of the road surface, the posture of the monocular camera may change while capturing road images during driving, which in turn changes the relationship between the camera coordinate system and the world coordinate system. Detecting objects under these conditions degrades the accuracy of the detection results during coordinate-system conversion, which reduces the reliability and precision of monocular 3D object detection.
On this basis, the present disclosure provides an object detection method and apparatus, a computer device, and a storage medium. The posture change information of the camera device is obtained from the captured target image, and the initial image features of the target image are corrected based on the posture change information, thereby avoiding the influence of camera pose changes on the image features. In other words, all corrected target image features correspond to a camera device in the same pose, reducing the influence of the camera pose on the target image; performing object detection with these target image features therefore improves the accuracy and reliability of the detected objects.
It should be noted that similar reference signs and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.
It should be noted that specific terms mentioned in the embodiments of the present disclosure include: CNN (Convolutional Neural Network), a class of feedforward neural networks that involve convolution computation and have a deep structure, and one of the representatives of deep learning.
The extrinsic parameters of a camera device are parameters describing the conversion relationship between the position of an object in the world coordinate system (also called the ground coordinate system) and the position of the object in the camera coordinate system, for example, the position and/or posture change parameters needed to convert a point from the world coordinate system to the camera coordinate system. The extrinsic parameters of a camera device are computed when the camera device is calibrated.
In a perspective drawing, the point or points at which the extensions of parallel lines appear to converge are called vanishing points. Vanishing points can be applied in road recognition methods, representing the convergence point of parallel road boundaries in an image. By recognizing the position of the vanishing point in the image, a system can recover the two road boundaries.
The horizon and vanishing points in an image are often used in deep visual ranging tasks to help determine the ego-pose information of a vehicle relative to the ground plane. A vanishing point can represent the intersection of the extensions of lane lines, building boundary lines, and the like in an image, and it lies on the horizon. A tilt of the horizon can indicate a change in the camera roll angle, while a vertical shift of the vanishing point can indicate a change in the camera pitch angle.
To facilitate understanding of this embodiment, an object detection method disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the object detection method provided by the embodiments of the present disclosure is generally a computer device with certain computing capability, including, for example, a terminal device, a server, an autonomous driving device, a driver-assistance device, or another processing device. The terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, a personal computer, a laptop computer, or the like. In some possible implementations, the object detection method may be implemented by a processor in the computer device invoking computer-readable instructions stored in a memory.
The object detection method provided by the embodiments of the present disclosure is described below taking a computer device as the execution subject.
As shown in FIG. 1, a flowchart of an object detection method provided by an embodiment of the present disclosure may include the following steps S101 to S104.
S101: acquire a target image.
S102: determine, based on the target image, posture change information of a camera device that captures the target image when capturing the target image.
S103: correct initial image features of the target image based on the posture change information to obtain target image features of the target image.
S104: determine information of an object in the target image based on the target image features.
Here, the objects in the target image may include vehicles, trees, human bodies, and other objects encountered while a vehicle is traveling. The target image may be a real-time scene image of the road surface captured by a camera device mounted on the vehicle during travel; it may be one frame of a video or a separately captured image. If the target image is a frame of a video, the determined posture change information of the camera device is that at the capture moment corresponding to this frame. The execution subject corresponding to the object detection method may include four neural networks: a backbone neural network, a first neural network, a second neural network, and a monocular 3D object detection network.
The backbone neural network extracts the initial image features of the target image; the first neural network determines, based on the initial image features, the posture change information of the camera device that captures the target image; the second neural network corrects the initial image features according to the posture change information to obtain the target image features of the target image; and the monocular 3D object detection network determines the information of the objects in the target image based on the corrected target image features.
In specific implementation, to facilitate understanding of the embodiments of the present disclosure, an application scenario of the object detection method provided by the embodiments is first introduced. In the field of autonomous driving, during actual travel a vehicle may bump due to uneven road surfaces, causing the posture of the camera device mounted on the vehicle to change relative to a standard posture when capturing the target image; or changes in road slope may cause such a posture change. This leads to a deviation between the camera coordinate system at capture time and the ground coordinate system. Here, the standard posture is the posture of the camera device at calibration time, and the coordinate system of the camera device in the standard posture is the camera coordinate system in the calibrated state, hereinafter referred to as the calibration coordinate system.
For the monocular camera making up the camera device, both the height and the depth of a captured object affect the object's position in the image. During vehicle travel, the vehicle carrying the camera device experiences extrinsic-parameter disturbances due to uneven roads, loosening of the camera, and the like; when object pose detection is performed on the target image based on the calibration coordinate system, the position of a key point on the feature map can shift. This position shift may be interpreted as being caused by a change in the object's depth rather than its height, reducing the accuracy of the detection results; in autonomous driving, degraded detection accuracy may in turn cause serious driving accidents. As shown in FIG. 2, a schematic diagram of detection when the posture of the camera device changes, provided by an embodiment of the present disclosure: the i coordinate system denotes the ground coordinate system and the j coordinate system denotes the camera coordinate system. The left images are side views of capture in a driving scene; the hexagon may represent the autonomous vehicle, the trapezoid represents the object in the target image captured by the camera device in the vehicle, and the dot represents the position of the object's target detection point (such as a key point) in the camera or ground coordinate system. The upper-left side view shows that during normal travel the ground coordinate system coincides with the camera coordinate system, and the object's target detection point coincides in both. The lower-left side view shows that when the posture of the camera device changes, the ground coordinate system i deviates from the camera coordinate system j, and the position of the target detection point in the camera coordinate system also shifts. The right image is a heat map output by the monocular 3D object detection network; (Ui, Vi) denotes the coordinates of the target detection point in the heat map when the camera device is in the standard posture, and (Uj, Vj) denotes those coordinates when the posture of the camera device has changed.
To solve the above problems, an embodiment of the present disclosure provides an object detection method that corrects the features of the target image based on the posture change information of the camera device to obtain target image features consistent with the standard posture, and then determines the detection result based on those target image features. In this way, the position shift caused by extrinsic-parameter disturbances can be corrected, eliminating the influence of extrinsic-parameter changes in a targeted manner, improving the accuracy and reliability of the detection results, and thereby improving the application safety of autonomous driving technology.
In specific implementation, the target image captured by the camera device must first be acquired, where the camera device may be a monocular camera. The target image includes one or more objects to be detected, each of which has different information in the captured real-time scene; for example, the information may include object coordinates, object size, object depth, and object orientation angle. In some examples, the information may be represented by a 2D bounding box and/or a 3D bounding box. After the target image is acquired, the posture change information may be determined as follows.
Determine horizon information in the target image based on the initial image features of the target image; determine, based on the horizon information, the posture change information of the camera device when capturing the target image.
Here, after the target image is acquired, it may be input into the backbone neural network, which extracts the initial image features of the target image. The initial image features may include content features and style features of the target image, where the content features reflect low-dimensional features of the image and the style features reflect high-dimensional features. For example, the content features may be the outlines of objects and the positions of edges in the target image; they are closely related to the capture posture of the monocular camera and change as that posture changes. The style features may be the texture and material information of the target image; they are little affected by the capture posture and remain essentially unchanged.
Further, the acquired initial image features may be input into the first neural network, which has been trained and has a certain prediction accuracy. The first neural network processes the initial image features to determine the horizon information in the target image, and the posture change information of the camera device when capturing the target image can then be determined from the horizon information. In specific implementation, determining the posture change information from the horizon information may be performed by the first neural network or by the computer device based on a preset conversion function; this is not limited here.
In one implementation, the horizon information may include position information of the horizon, and the posture change information may include first rotation angle information of the camera device on the horizontal plane. The first rotation angle information of the camera device can be determined based on the position information of the horizon.
In specific implementation, the position information of the horizon may be determined from the image coordinates of the points making up the horizon in the target image. Based on this position information, first rotation angle information between the horizon in the target image and the horizon under the standard posture can be determined, where the horizon under the standard posture can be inferred by the first neural network. The first rotation angle information may be the rotation angle of the horizon of the target image on the horizontal plane, which accurately reflects the angular change on the horizontal plane of the camera posture when capturing the target image compared with the standard posture, i.e., the posture change information on the horizontal plane. Therefore, the first rotation angle information reflects the rotation of the camera device on the horizontal plane, and the posture change information of the camera device when capturing the target image includes the first rotation angle information.
In another implementation, to better reflect the posture change information of the camera device, the method may further include determining vanishing point information in the target image based on the initial image features of the target image, where the vanishing point information includes position information of the vanishing point. Accordingly, the posture change information includes second rotation angle information of the camera device on the vertical plane, and the second rotation angle information between the vanishing point in the target image and the vanishing point under the standard posture can be determined.
In specific implementation, since the vanishing point lies on the horizon corresponding to the target image, the position of the vanishing point on the horizon can be determined after the position information of the horizon is acquired; the image coordinates of this position can then be determined and used as the position information of the vanishing point. Based on the position information of the vanishing point, the second rotation angle information of the camera device on the vertical plane can be determined, where the second rotation angle information reflects the pitch angle of the camera posture on the vertical plane compared with the standard posture; the posture change information of the camera device therefore further includes the second rotation angle information. Further, the second rotation angle information can be combined with the first rotation angle information as the posture change information of the camera device. In this way, determining the posture change information based on the angular changes on the horizontal and vertical planes improves the accuracy of the determined posture change information.
In addition, in determining the posture change information based on the target image, the position information of the horizon and the position information of the vanishing point in the target image may be used together, or only one of them may be used; this is not limited here.
In some examples, the parameters of the first neural network are obtained after supervised training and optimization on an image dataset, so its detection of the horizon and vanishing point in an image is highly accurate. After the initial image features are input into the first neural network, it can output the position information of the horizon and/or the position information of the vanishing point.
Further, the posture change information may be input into the second neural network, which corrects the initial image features of the target image based on the posture change information to obtain the target image features of the target image. The target image features are corrected features that, to a certain extent, approximate the features of an image captured in the standard posture at the capture position of the target image.
In some examples, the posture change information includes the first rotation angle information and the second rotation angle information. The initial image features can be corrected on the horizontal plane according to the first rotation angle information, and the features corrected on the horizontal plane can then be corrected on the vertical plane according to the second rotation angle information, yielding target image features corrected on both planes. In other examples, the initial image features may be corrected on the horizontal and vertical planes simultaneously. In still other examples, the initial image features may be corrected on the horizontal plane or the vertical plane only. The present disclosure does not limit the order of correction or which posture change information the correction includes.
Then, the target image features are input into the monocular 3D object detection network, which detects each object in the target image based on the target image features, determines the coordinates of each object's key point (such as the object's center point) in the calibration coordinate system, and then converts the coordinates of each key point based on the conversion relationship between the calibration coordinate system and the world coordinate system to obtain the key point's coordinates in the feature map. It also determines the depth and size information of each object, thereby determining each object's real position information in the world coordinate system, which is taken as the information of each object. In specific implementation, the information output by the monocular 3D object detection network may include object coordinates, object size, and object orientation angle, where the object size characterizes the object's size in the real world and the object orientation angle characterizes its orientation in the real world. Here, the calibration coordinate system is the coordinate system under the posture of the camera device at calibration time (i.e., the standard posture).
As shown in FIG. 3, a schematic diagram of a system in which four neural networks perform object detection, provided by an embodiment of the present disclosure. The input of the backbone neural network 310 is the target image 301, and the output is the initial image features 311. The first neural network 320 may include a regression network whose input is the initial image features 311 and whose output is the posture change information 321 of the camera device that captured the target image 301. The second neural network 330 may include a transfer network whose inputs are the initial image features 311 and the posture change information 321 and whose output is the target image features 331. The input of the monocular 3D detection network 340 is the target image features 331, and the output is the 3D result 341, for example a 3D bounding box of the detected object.
Since the object detection method provided by the embodiments of the present disclosure is performed by four different neural networks, to improve the reliability and accuracy of its detection results, the embodiments of the present disclosure also provide methods for training some of these networks. In specific implementation, the backbone neural network and the monocular 3D object detection network may be existing neural networks, for example, convolutional neural networks, recurrent neural networks, or multilayer perceptrons. In a feasible implementation, the monocular 3D object detection network may be an anchor-free detection network. The first neural network and the second neural network are networks specific to the embodiments of the present disclosure and need to be trained to achieve the expected detection effect; their training processes are therefore introduced in detail below.
As shown in FIG. 4, a flowchart of a method for training a first neural network provided by an embodiment of the present disclosure may include the following steps S401 to S404.
S401: acquire a first training sample.
S402: input the first sample image into a first neural network to be trained to obtain predicted horizon information and predicted vanishing point information.
S403: determine a first loss based on the annotated horizon information, the predicted horizon information, the annotated vanishing point information, and the predicted vanishing point information.
S404: train the first neural network to be trained using the first loss to obtain a trained first neural network.
The first training sample includes sample initial features of a first sample image, annotated horizon information in the first sample image, and annotated vanishing point information in the first sample image. The first sample image may be an image captured by the camera device after its posture has changed; it is processed by the backbone neural network to obtain its sample initial features. The predicted horizon information is the horizon information in the first sample image predicted and output by the first neural network based on the sample initial features; the annotated horizon information is the horizon information in a standard sample image captured in the standard posture at the position where the first sample image was captured. The predicted vanishing point information is the vanishing point information in the first sample image predicted and output by the first neural network; the annotated vanishing point information is the vanishing point information in the standard sample image captured in the standard posture at the same position.
After the first sample image is acquired, it must first be processed by the backbone neural network to obtain the sample initial features of the first sample image in the first training sample, where the sample initial features may correspond to a sample initial feature map, i.e., the backbone neural network may output a sample initial feature map. The sample initial feature map is then input into the first neural network to be trained, which can determine the predicted horizon information and predicted vanishing point information in the first sample image, where the predicted horizon information may include the position information of the predicted horizon and the predicted vanishing point information may include the position information of the predicted vanishing point. Correspondingly, the position information of the annotated horizon and the annotated vanishing point can be determined from the annotated horizon information and annotated vanishing point information in the first sample image. In specific implementation, the annotated horizon information may be input directly, or may be determined from the standard sample feature map output after the standard sample image is input into the backbone neural network; the manner of determining the annotated horizon information is not limited here.
Afterwards, the first loss can be computed from the position information of the predicted horizon and predicted vanishing point and the corresponding position information of the annotated horizon and annotated vanishing point, where the first loss may be the value of a constructed first loss function; the first neural network to be trained is then trained using the first loss. After multiple rounds of training with multiple first training samples, a trained first neural network is obtained, which can output relatively accurate posture change information during application.
In specific implementation, the predicted horizon information and the predicted vanishing point information can be expressed by Formula 1, and the first loss can be constructed by Formula 2:

$$(\hat{l}, \hat{v}) = f_{vo}(H_j) \tag{1}$$

where $\hat{l}$ denotes the position information of the predicted horizon in the first sample image, $\hat{v}$ denotes the position information of the predicted vanishing point in the first sample image, $f_{vo}$ denotes the first neural network to be trained, built on a CNN, and $H_j$ denotes the sample initial feature map output by the backbone neural network;

$$L_{vo} = \left\lVert A - g(\hat{l}, \hat{v}) \right\rVert_1 \tag{2}$$

where $L_{vo}$ denotes the first loss, $\lVert \cdot \rVert_1$ denotes the L1 norm, $A$ denotes the annotation matrix composed of the position information of the annotated horizon and the position information of the annotated vanishing point, which reflects the pose change information, and $g$ denotes the conversion function that converts the position information of the predicted horizon and the predicted vanishing point into a prediction matrix. The first loss represents the Manhattan distance between the annotated pose change information and the predicted pose change information.
The first neural network to be trained can determine predicted posture change information of the camera device from the position information of the predicted horizon and predicted vanishing point and the corresponding position information of the annotated horizon and annotated vanishing point. In specific implementation, the positional deviation between the predicted and annotated horizon positions can be used to determine the rotation angle of the horizon on the horizontal plane, i.e., the first rotation angle information of the camera device on the horizontal plane; the positional deviation between the predicted and annotated vanishing point positions can be used to determine the pitch angle on the vertical plane, i.e., the second rotation angle information of the camera device on the vertical plane; the predicted posture change information of the camera device can then be determined from the first and second rotation angle information.
In this way, training the first neural network with the first loss determined from the annotated and predicted horizon information and the annotated and predicted vanishing point information ensures that the trained first neural network can determine relatively accurate horizon and vanishing point information, from which relatively accurate posture change information can be obtained.
As shown in FIG. 5, a flowchart of a method for training a second neural network provided by an embodiment of the present disclosure may include the following steps S501 to S504.
S501: acquire a second training sample.
S502: extract image features from the original image to obtain original image features.
S503: determine calibration image features based on the calibration image.
S504: train the second neural network based on the original image features, the calibration image features, and the reference posture change information.
Here, the second training sample includes an original image, a calibration image, and reference posture change information of the camera device that captured the original image. The original image is an image captured by the camera device with its posture changed; the posture of the camera device corresponding to the calibration image is the standard posture, i.e., the calibration image is an image captured in the standard posture at the position where the original image was captured. The reference posture change information is the posture change information of the camera device when capturing the original image, determined by detecting the original image with the backbone neural network and the first neural network. Alternatively, a posture change amount of the camera device may be set manually and the original image captured after adjusting the posture by this amount; in this case, the reference posture change information can be determined from the set change amount.
In another implementation, the calibration image may also be obtained by correcting the original image according to the determined reference posture change information.
In specific implementation, after the second training sample is acquired, the original image and the calibration image therein may be input into the backbone neural network, which extracts the image features of the original image to obtain the original image features. The original image features may include a first content feature and a first style feature corresponding to the original image, where the first content feature includes object outlines and edge positions in the original image and the first style feature includes texture and material information of the original image. At the same time, the backbone neural network extracts the calibration image features of the calibration image, which likewise include a second content feature and a second style feature corresponding to the calibration image, where the second content feature includes object outlines and edge positions in the calibration image and the second style feature includes texture and material information of the calibration image.
In a possible implementation, the second neural network to be trained may be trained as follows.
Input the original image features and the reference posture change information into the second neural network to be trained to obtain corrected predicted image features; determine a second loss using the predicted image features and the second content feature; and train the second neural network to be trained using the second loss to obtain a trained second neural network.
Here, the second loss is the content loss Lcontent between the predicted image features of the second neural network to be trained and the second content feature corresponding to the calibration image. In specific implementation, the second loss may be the value of a constructed second loss function, and the original image features may correspond to an original image feature map. The second neural network to be trained may be written as a transformation neural network $f_t$; in the course of training $f_t$, a loss-computation neural network, written here as $\phi$, is also needed to obtain the second loss.
In specific implementation, the acquired original image feature map $H_{in}$ corresponding to the original image features and the reference posture change information, written here as $\tilde{T}$, are input into the transformation neural network $f_t$, where the original image feature map $H_{in}$ is identical to the $H_j$ obtained by processing the original image with the backbone neural network, i.e., $H_{in} = H_j$. The reference posture change information $\tilde{T}$ helps the transformation neural network $f_t$ correct the content features of the original image feature map $H_{in}$: based on $\tilde{T}$, $f_t$ processes $H_{in}$ and outputs the corrected predicted image features $H_{out}$, which deviate to a certain extent from the second content feature $H_{content}$; the second loss therefore needs to be determined from the predicted image features $H_{out}$ and the second content feature $H_{content}$.
Here, if the calibration image is obtained by correcting the original image according to the determined reference posture change information, the backbone neural network can determine the second content feature $H_{content}$ based on the reference posture change information $\tilde{T}$ and the original image $X_j$. In specific implementation, the second content feature $H_{content}$ can be determined according to Formula 3:

$$H_{content} = f_b\!\left(\tilde{T}^{-1} \cdot X_j\right) \tag{3}$$

where $f_b$ denotes the backbone neural network, $\tilde{T}^{-1}$ denotes the inverse matrix corresponding to the reference posture change information, applied here to the original image, and $X_j$ denotes the original image.
On this basis, after the predicted image features $H_{out}$ are determined, $H_{out}$ and the second content feature can be input into the loss-computation neural network $\phi$, which can then construct, from the predicted image features $H_{out}$ and the second content feature $H_{content}$, the second loss incurred by the transformation neural network $f_t$ in correcting the original image features.
Taking as an example the case where the feature map corresponding to the predicted image features $H_{out}$ output by the transformation neural network $f_t$ and the feature map of the second content feature $H_{content}$ both have size $(c_m, h_m, w_m)$, and where $\phi_m$ denotes the activation of the $m$-th layer of the loss-computation neural network $\phi$, the second loss Lcontent can be determined by the squared Euclidean distance between the two feature maps (Formula 4):

$$L_{content}^{\phi_m} = \frac{1}{c_m h_m w_m} \left\lVert \phi_m(H_{out}) - \phi_m(H_{content}) \right\rVert_2^2 \tag{4}$$

where $L_{content}^{\phi_m}$ denotes the second loss Lcontent determined at the $m$-th-layer activation $\phi_m$, $\phi_m(H_{out})$ denotes the output of the predicted image features $H_{out}$ at the $m$-th-layer activation $\phi_m$, $\phi_m(H_{content})$ denotes the output of the second content feature $H_{content}$ at the $m$-th-layer activation $\phi_m$, and $\lVert \cdot \rVert_2$ denotes the L2 norm.
The second loss Lcontent can be determined from the above Formula 4, and the second neural network to be trained is then trained based on Lcontent. After multiple rounds of training with multiple second training samples, a trained second neural network is obtained whose output predicted image features are close to the calibration image features corresponding to the calibration image.
Further, to further improve the accuracy of the predicted image features output by the trained second neural network, a third loss Lstyle may also be determined while determining the second loss Lcontent, and the second neural network may be trained with the third loss Lstyle and the second loss Lcontent together. In specific implementation, the third loss Lstyle may be determined, and the second neural network trained with it, as follows.
Input the original image features and the reference posture change information into the second neural network to be trained to obtain corrected predicted image features; determine a third loss using the predicted image features and the first style feature; and train the second neural network to be trained using the third loss to obtain a trained second neural network.
Because the style features among the image features are little affected by posture changes of the capturing device, the similarity between the first style feature of the original image and the second style feature of the calibration image is high, so the third loss can be determined directly using the first style feature $H_{style}$ included in the original image feature map $H_{in}$. In specific implementation, the first style feature $H_{style}$ can be extracted from the original image features, and then the first style feature $H_{style}$ and the predicted image features $H_{out}$ are input into the loss-computation neural network $\phi$, which, by processing them, constructs the third loss Lstyle between the first style feature $H_{style}$ and the predicted image features $H_{out}$.
In one implementation, in determining the third loss Lstyle, the feature similarity information of the feature map corresponding to the first style feature $H_{style}$ and of the feature map corresponding to the predicted image features $H_{out}$ must first be determined, where the feature similarity information can be represented by a Gram matrix $G^{\phi_m}$. In specific implementation, taking as an example a Gram matrix of size $(c_m \times c_m)$ and the activation $\phi_m$ of the $m$-th layer of the loss-computation neural network $\phi$, for the predicted image features $H_{out}$ or the first style feature $H_{style}$, the feature similarity information at the $m$-th layer can be determined according to Formula 5:

$$G^{\phi_m}(H)_{c,c'} = \frac{1}{c_m h_m w_m} \sum_{h=1}^{h_m} \sum_{w=1}^{w_m} \phi_m(H)_{h,w,c}\,\phi_m(H)_{h,w,c'} \tag{5}$$

where $H$ denotes the predicted image features $H_{out}$ or the first style feature $H_{style}$, $c$ and $c'$ denote different channels of the same feature map, $G^{\phi_m}(H)_{c,c'}$ represents the feature similarity information of different channels of the same feature map at the $m$-th-layer activation $\phi_m$, $c_m$ is the number of channels of the feature map at the $m$-th layer of the loss-computation neural network $\phi$, $h_m$ is the height of the feature map at the $m$-th layer, and $w_m$ is the width of the feature map at the $m$-th layer.

Based on Formula 5, the feature similarity information $G^{\phi_m}(H_{out})$ of the predicted image features $H_{out}$ at the $m$-th layer and the feature similarity information $G^{\phi_m}(H_{style})$ of the first style feature $H_{style}$ at the $m$-th layer can be determined. Further, the third loss Lstyle used to train the second neural network can be determined based on the squared Frobenius norm between these two pieces of feature similarity information; in specific implementation, the third loss Lstyle can be determined according to Formula 6:

$$L_{style}^{\phi_m} = \left\lVert G^{\phi_m}(H_{out}) - G^{\phi_m}(H_{style}) \right\rVert_F^2 \tag{6}$$

where $L_{style}^{\phi_m}$ denotes the third loss Lstyle determined at the $m$-th-layer activation $\phi_m$, and $\lVert \cdot \rVert_F$ denotes the Frobenius norm, a matrix norm that can measure the difference between two matrices.
Further, the second neural network to be trained can be trained using the second loss Lcontent together with the third loss Lstyle to obtain a trained second neural network.
In addition, a joint loss $L_{total} = \gamma_1 L_{content} + \gamma_2 L_{style}$ can also be determined from the second loss Lcontent and the third loss Lstyle, where $\gamma_1$ and $\gamma_2$ are hyperparameters, determined during training of the second neural network, that balance the second and third losses.
Because the style features among the image features are little affected by posture changes of the camera device, in the second neural network the style features contained in the corrected predicted image features will be close to the first style feature among the original image features. Further, determining the third loss from the predicted image features and the first style feature, and training the second neural network with this third loss, ensures that the second neural network does not make large adjustments to the style features among the image features, guarantees the correction accuracy of the second neural network on the style features, and thereby improves subsequent object detection accuracy.
In another implementation, the second neural network may also be trained using the second style feature corresponding to the calibration image. In specific implementation, a fourth loss between the predicted image features $H_{out}$ and the second style feature can be determined in the same way the third loss is determined from the first style feature of the original image and the predicted image features $H_{out}$; the second neural network to be trained is then trained using the fourth loss and the second loss to obtain a trained second neural network.
In this way, the style features contained in the corrected predicted image features will be close to the second style feature of the calibration image. Further, determining the fourth loss from the predicted image features and the second style feature, and training the second neural network with this fourth loss, ensures that the second neural network does not make large adjustments to the style features among the image features, guarantees the correction accuracy of the second neural network on the style features, and thereby improves subsequent object detection accuracy.
In specific implementation, regarding the manner of training the second neural network to be trained, it may be trained using the second loss, using the second and third losses, or using the second and fourth losses; this is not limited here.
In addition, for the training of the first and second neural networks, the two networks to be trained may first be trained separately and, once both losses reach preset convergence values, trained jointly to obtain the trained first and second neural networks. Alternatively, the two networks to be trained may be trained jointly directly, or trained separately, to obtain the trained first and second neural networks; this is not limited here.
As shown in FIG. 6, a schematic flowchart of training the second neural network to be trained, provided by an embodiment of the present disclosure, where image A denotes the original image, image B denotes the calibration image, and "backbone" denotes the backbone neural network. As can be seen, the backbone neural network processes image A to obtain the original image features $H_{in}$ and processes image B to obtain the second content feature $H_{content}$ corresponding to the calibration image. Then, the original image features $H_{in}$ and the reference posture change information are input into the transformation neural network, which can output the corrected predicted image features $H_{out}$. Afterwards, the loss-computation neural network can compute the second loss Lcontent based on the predicted image features $H_{out}$ and the second content feature $H_{content}$, and compute the third loss Lstyle based on the predicted image features $H_{out}$ and the first style feature $H_{style}$ contained in the original image features $H_{in}$.
After the information of the object in the target image is determined, a traveling device may also be controlled to travel or prompt information may be issued based on the information of the object, where the camera device is mounted on the traveling device.
Those skilled in the art can understand that, in the above methods of the specific implementations, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, an embodiment of the present disclosure further provides an object detection apparatus corresponding to the object detection method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to the above object detection method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
As shown in FIG. 7, a schematic diagram of an object detection apparatus provided by an embodiment of the present disclosure includes: an acquisition module 701 configured to acquire a target image; a first determination module 702 configured to determine, based on the target image, posture change information of a camera device that captures the target image when capturing the target image; an adjustment module 703 configured to correct initial image features of the target image based on the posture change information to obtain target image features of the target image; and a second determination module 704 configured to determine information of an object in the target image based on the target image features.
In a possible implementation, the first determination module 702 is configured to determine horizon information in the target image based on the initial image features of the target image, and determine, based on the horizon information, the posture change information of the camera device when capturing the target image.
In a possible implementation, the horizon information includes position information of the horizon; the posture change information includes first rotation angle information of the camera device on the horizontal plane; and the first determination module 702 is configured to determine the first rotation angle information of the camera device based on the position information of the horizon.
In a possible implementation, the posture change information includes second rotation angle information of the camera device on the vertical plane; and the first determination module 702 is configured to determine vanishing point information in the target image based on the initial image features of the target image, and determine the second rotation angle information of the camera device based on the vanishing point information.
In a possible implementation, the second determination module 704 is configured to determine, based on the target image features, information of the object in the target image under the calibration coordinate system, and determine information of the object in the world coordinate system based on the conversion relationship between the calibration coordinate system and the world coordinate system and the information of the object under the calibration coordinate system.
In a possible implementation, the posture change information is determined using a first neural network.
In a possible implementation, the apparatus further includes a first training module 705 configured to: acquire a first training sample, where the first training sample includes sample initial features of a first sample image, annotated horizon information in the first sample image, and annotated vanishing point information in the first sample image; input the first sample image into a first neural network to be trained to obtain predicted horizon information and predicted vanishing point information; determine a first loss based on the annotated horizon information, the predicted horizon information, the annotated vanishing point information, and the predicted vanishing point information; and train the first neural network to be trained using the first loss to obtain a trained first neural network.
In a possible implementation, the target image features are determined using a second neural network.
In a possible implementation, the apparatus further includes a second training module 706 configured to: acquire a second training sample, where the second training sample includes an original image, a calibration image, and reference posture change information of the camera device that captured the original image, and the posture of the camera device corresponding to the calibration image is a standard posture; extract image features from the original image to obtain original image features, where the original image features include a first content feature and a first style feature, the first content feature includes object outlines and edge positions in the original image, and the first style feature includes texture and material information of the original image; determine, based on the calibration image, calibration image features of the calibration image, where the calibration image features include a second content feature and a second style feature, the second content feature includes object outlines and edge positions in the calibration image, and the second style feature includes texture and material information of the calibration image; and train the second neural network based on the original image features, the calibration image features, and the reference posture change information.
In a possible implementation, the second training module 706 is configured to determine a second loss using the predicted image features and the second content feature, and train the second neural network to be trained using the second loss to obtain a trained second neural network.
In a possible implementation, the second training module 706 determines a first style feature of the original image based on the original image, and trains the second neural network based on the original image features, the first style feature, and the reference posture change information.
In a possible implementation, the second training module 706 is configured to determine a third loss based on the predicted image features and the first style feature, and train the second neural network to be trained using the third loss to obtain a trained second neural network.
In a possible implementation, the second training module 706 is further configured to determine a second style feature of the calibration image based on the calibration image, and train the second neural network based on the original image features, the second style feature, and the reference posture change information.
In a possible implementation, the second training module 706 is configured to determine a fourth loss using the predicted image features and the second style feature, and train the second neural network to be trained using the fourth loss to obtain a trained second neural network.
In a possible implementation, the apparatus further includes a control module 707 configured to, after the second determination module 704 determines the information of the object in the target image, control a traveling device to travel or issue prompt information based on the information of the object, where the camera device is mounted on the traveling device.
For descriptions of the processing flows of the modules in the apparatus and the interaction flows between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
An embodiment of the present disclosure further provides a computer device. As shown in FIG. 8, a schematic structural diagram of a computer device provided by an embodiment of the present disclosure includes:
a processor 81 and a memory 82; the memory 82 stores machine-readable instructions executable by the processor 81, and the processor 81 is configured to execute the machine-readable instructions stored in the memory 82. When the machine-readable instructions are executed by the processor 81, the processor 81 performs the following steps: S101: acquire a target image; S102: determine, based on the target image, posture change information of a camera device that captures the target image when capturing the target image; S103: correct initial image features of the target image based on the posture change information to obtain target image features of the target image; and S104: determine information of an object in the target image based on the target image features. Alternatively, the processor 81 may perform the steps of any object detection method of the embodiments of the present disclosure.
The memory 82 includes an internal memory 821 and an external memory 822; the internal memory 821, also called internal storage, is used to temporarily store operation data in the processor 81 and data exchanged with the external memory 822 such as a hard disk; the processor 81 exchanges data with the external memory 822 through the internal memory 821.
For the specific execution process of the above instructions, reference may be made to the steps of the object detection method described in the embodiments of the present disclosure, which are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the object detection method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the object detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the instructions included in the program code may be used to perform the steps of the object detection method described in the above method embodiments; for details, reference may be made to the above method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems and apparatuses described above, reference may be made to the corresponding processes in the above method embodiments, which are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field can still, within the technical scope disclosed by the present disclosure, modify the technical solutions recorded in the above embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (18)

  1. An object detection method, characterized by comprising:
    acquiring a target image;
    determining, based on the target image, posture change information of a camera device that captures the target image when capturing the target image;
    correcting initial image features of the target image based on the posture change information to obtain target image features of the target image; and
    determining information of an object in the target image based on the target image features.
  2. The method according to claim 1, characterized in that determining, based on the target image, the posture change information of the camera device that captures the target image when capturing the target image comprises:
    determining horizon information in the target image based on the initial image features of the target image; and
    determining, based on the horizon information, the posture change information of the camera device when capturing the target image.
  3. The method according to claim 2, characterized in that the horizon information comprises position information of the horizon; the posture change information comprises first rotation angle information of the camera device on a horizontal plane; and
    determining, based on the horizon information, the posture change information of the camera device when capturing the target image comprises:
    determining the first rotation angle information of the camera device based on the position information of the horizon.
  4. The method according to claim 2 or 3, characterized in that determining, based on the target image, the posture change information of the camera device that captures the target image when capturing the target image further comprises:
    determining vanishing point information in the target image based on the initial image features of the target image; and
    determining, based on the vanishing point information, the posture change information of the camera device when capturing the target image, wherein the posture change information comprises second rotation angle information of the camera device on a vertical plane, the second rotation angle information being determined from the vanishing point information.
  5. The method according to any one of claims 1 to 4, characterized in that determining the information of the object in the target image based on the target image features comprises:
    determining, based on the target image features, information of the object in the target image under a calibration coordinate system; and
    determining information of the object in a world coordinate system based on a conversion relationship between the calibration coordinate system and the world coordinate system and the information of the object under the calibration coordinate system.
  6. The method according to any one of claims 1 to 5, characterized in that the posture change information is determined using a first neural network.
  7. The method according to claim 6, characterized in that the first neural network is obtained by training as follows:
    acquiring a first training sample, wherein the first training sample comprises sample initial features of a first sample image, annotated horizon information in the first sample image, and annotated vanishing point information in the first sample image;
    inputting the first sample image into a first neural network to be trained to obtain predicted horizon information and predicted vanishing point information;
    determining a first loss based on the annotated horizon information, the predicted horizon information, the annotated vanishing point information, and the predicted vanishing point information; and
    training the first neural network to be trained using the first loss to obtain a trained first neural network.
  8. The method according to any one of claims 1 to 7, characterized in that the target image features are determined using a second neural network.
  9. The method according to claim 8, characterized in that the second neural network is obtained by training as follows:
    acquiring a second training sample, wherein the second training sample comprises an original image, a calibration image, and reference posture change information of a camera device that captured the original image, and the posture of the camera device corresponding to the calibration image is a standard posture;
    extracting image features from the original image to obtain original image features, wherein the original image features comprise a first content feature and a first style feature, the first content feature comprises object outlines and edge positions in the original image, and the first style feature comprises texture and material information of the original image;
    determining the calibration image features based on the calibration image, wherein the calibration image features comprise a second content feature and a second style feature, the second content feature comprises object outlines and edge positions in the calibration image, and the second style feature comprises texture and material information of the calibration image; and
    training the second neural network based on the original image features, the calibration image features, and the reference posture change information.
  10. The method according to claim 9, characterized in that training the second neural network based on the original image features, the calibration image features, and the reference posture change information comprises:
    inputting the original image features and the reference posture change information into a second neural network to be trained to obtain corrected predicted image features;
    determining a second loss using the predicted image features and the second content feature; and
    training the second neural network to be trained using the second loss to obtain a trained second neural network.
  11. The method according to claim 10, characterized in that the step of training the second neural network further comprises:
    training the second neural network based on the original image features, the first style feature, and the reference posture change information.
  12. The method according to claim 11, characterized in that training the second neural network based on the original image features, the first style feature, and the reference posture change information comprises:
    determining a third loss based on the predicted image features and the first style feature; and
    training the second neural network to be trained using the third loss to obtain a trained second neural network.
  13. The method according to claim 10, characterized in that the step of training the second neural network further comprises:
    training the second neural network based on the original image features, the second style feature, and the reference posture change information.
  14. The method according to claim 13, characterized in that training the second neural network based on the original image features, the second style feature, and the reference posture change information comprises:
    determining a fourth loss using the predicted image features and the second style feature; and
    training the second neural network to be trained using the fourth loss to obtain a trained second neural network.
  15. The method according to any one of claims 1 to 14, characterized by further comprising, after determining the information of the object in the target image:
    controlling a traveling device to travel or issuing prompt information based on the information of the object, wherein the camera device is mounted on the traveling device.
  16. An object detection apparatus, characterized by comprising:
    an acquisition module configured to acquire a target image;
    a first determination module configured to determine, based on the target image, posture change information of a camera device that captures the target image when capturing the target image;
    an adjustment module configured to correct initial image features of the target image based on the posture change information to obtain target image features of the target image; and
    a second determination module configured to determine information of an object in the target image based on the target image features.
  17. A computer device, characterized by comprising a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the object detection method according to any one of claims 1 to 15.
  18. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is run by a computer device, the computer device performs the steps of the object detection method according to any one of claims 1 to 15.
PCT/CN2022/070696 2021-01-18 2022-01-07 Object detection method and apparatus, computer device, and storage medium WO2022152050A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110063318.4 2021-01-18
CN202110063318.4A CN112733773B (zh) 2021-01-18 Object detection method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022152050A1 true WO2022152050A1 (zh) 2022-07-21

Family

ID=75592210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070696 WO2022152050A1 (zh) 2022-01-07 2021-01-18 Object detection method and apparatus, computer device, and storage medium

Country Status (1)

Country Link
WO (1) WO2022152050A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160033753A1 (en) * 2014-07-31 2016-02-04 Canon Kabushiki Kaisha Image acquiring apparatus
CN105809701A (zh) * 2016-03-25 2016-07-27 成都易瞳科技有限公司 Panoramic video posture calibration method
CN109740571A (zh) * 2019-01-22 2019-05-10 南京旷云科技有限公司 Image acquisition method, image processing method, apparatus, and electronic device
CN111405190A (zh) * 2020-04-23 2020-07-10 南京维沃软件技术有限公司 Image processing method and apparatus
CN112733773A (zh) * 2021-01-18 2021-04-30 上海商汤智能科技有限公司 Object detection method and apparatus, computer device, and storage medium


Also Published As

Publication number Publication date
CN112733773A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
US11436437B2 (en) Three-dimension (3D) assisted personalized home object detection
US10713814B2 (en) Eye tracking method and system
WO2021139484A1 (zh) Target tracking method and apparatus, electronic device, and storage medium
US11703949B2 Directional assistance for centering a face in a camera field of view
WO2020206708A1 (zh) Obstacle recognition method and apparatus, computer device, and storage medium
WO2020014909A1 (zh) Photographing method and apparatus, and unmanned aerial vehicle
WO2019119328A1 (zh) Vision-based positioning method and aircraft
WO2021143935A1 (zh) Detection method and apparatus, electronic device, and storage medium
CN113537208A (zh) Visual positioning method and system based on semantic ORB-SLAM technology
JP7438320B2 (ja) Alignment of cross-modal sensor data
US20160140744A1 Aligning Panoramic Imagery and Aerial Imagery
JP2016029564A (ja) Object detection method and object detection apparatus
CN112257696B (zh) Sight line estimation method and computing device
EP3506149A1 Method, system and computer program product for eye gaze direction estimation
WO2022012425A1 (zh) Target detection method and apparatus, and electronic device
CN116866719B (zh) Intelligent analysis and processing method for high-definition video content based on image recognition
WO2021184359A1 (zh) Target following method, target following apparatus, movable device, and storage medium
US20240062415A1 (en) Terminal device localization method and related device therefor
WO2023056789A1 (zh) Obstacle recognition method, system, device, and storage medium for agricultural machinery automatic driving
US11842440B2 (en) Landmark location reconstruction in autonomous machine applications
US11645773B2 (en) Method for acquiring distance from moving body to at least one object located in any direction of moving body by performing near region sensing and image processing device using the same
CN111510704A (zh) Method for correcting camera misalignment and apparatus using the same
Itu et al. Automatic extrinsic camera parameters calibration using Convolutional Neural Networks
WO2022152050A1 (zh) Object detection method and apparatus, computer device, and storage medium
CN105184736B (zh) Image registration method for a narrow-overlap dual-field-of-view hyperspectral imager

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22738931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.12.2023)