WO2020233414A1 - Object recognition method, apparatus, and vehicle - Google Patents

Object recognition method, apparatus, and vehicle

Info

Publication number
WO2020233414A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature
feature data
traffic
images
Prior art date
Application number
PCT/CN2020/089116
Other languages
English (en)
French (fr)
Inventor
苗振伟
陈纪凯
王兵
王刚
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Publication of WO2020233414A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Definitions

  • This application relates to the field of image processing technology, in particular to object recognition methods, devices and equipment, object body feature extraction model construction methods, devices and equipment, and vehicles.
  • In the field of autonomous driving, machine perception is an important component. A multi-sensor fusion perception system uses different types of sensors to detect the environment around the driving vehicle, so that the vehicle can accurately judge the traffic situation.
  • The multi-camera sensor solution is currently a common vehicle multi-sensor solution: the autonomous vehicle observes multiple viewing angles through multiple mounted cameras so as to cover the surrounding environment as fully as possible.
  • When an autonomous vehicle is equipped with a multi-camera solution, the same object captured by different cameras needs to be fused, and finding the spatial and temporal correspondence between objects captured by different cameras is an important problem.
  • Two types of data fusion processing are specifically involved: 1) different cameras may capture the same object, and finding the same object in the different images captured by different cameras helps the autonomous vehicle perceive the surrounding environment better; 2) for a single camera, the same object at different moments also needs to be associated, which can be used for tracking or to assist other sensors in acquiring information, again for better perception of the surrounding environment.
  • In the course of implementing the invention, the inventors found that the related technical solutions currently in use cannot accurately associate images of the same traffic object appearing in different traffic environment images; in other words, the recognition accuracy for the same traffic object appearing in different traffic environment images is low.
  • The present application provides an object recognition method to solve the prior-art problem that the same object in different environment images cannot be accurately recognized. The present application additionally provides an object recognition device and equipment, a method, device and equipment for constructing an object body feature extraction model, and a vehicle.
  • This application provides an object recognition method, including: determining object images in multiple environment images; determining object body feature data of the object images through an object body feature extraction model; determining the similarity between different object images according to the object body feature data; and determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
  • Optionally, the object body feature extraction model is learned from a training set of object feature data annotated with object identifiers.
  • Optionally, the method further includes: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • Optionally, the determining object images in multiple environment images includes: determining the object images through an object detection model.
  • Optionally, the determining the object body feature data of the object images through the object body feature extraction model includes: obtaining the object feature maps output by at least one object feature extraction layer included in the object detection model in the process of detecting the object images, and obtaining the position data of the object images in the environment images; determining feature data of at least one depth level of an object image according to the position data and the at least one object feature map; and determining the object body feature data according to the feature data of the at least one depth level through the object body feature extraction model.
  • Optionally, the determining feature data of at least one depth level of the object image according to the position data and the at least one object feature map includes: determining the feature data of the object image in each object feature map according to the position data; obtaining the feature dimension corresponding to each object feature extraction layer; transforming, for each object feature map, the feature data of the object image in that object feature map into feature data having the corresponding feature dimension; and using the collection of feature data having the feature dimensions of the respective object feature extraction layers as the feature data of the at least one depth level.
  • Optionally, the determining the feature data of the object image in each object feature map according to the position data includes: obtaining the feature data of the object image in each object feature map according to the position data and the image size ratio between the at least one object feature extraction layer.
  • Optionally, the determining the similarity between different object images according to the object body feature data includes: converting the real-valued object body feature data output by the object body feature extraction model into binary object body feature data; and performing an XNOR operation on the binary object body feature data of the different object images as the similarity.
  • Optionally, the multiple environment images include: traffic environment images at the same moment taken by multiple image acquisition devices, traffic environment images at different moments taken by the same image acquisition device, and traffic environment images at different moments taken by multiple image acquisition devices; and the objects include traffic objects.
  • Optionally, the objects include: vehicles, people, and obstacles.
  • This application also provides a method for constructing an object body feature extraction model, including: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • This application also provides an object recognition device, including:
  • the object image determining unit is used to determine object images in multiple environment images
  • the object body feature extraction unit is used to determine the object body feature data of the object image through the object body feature extraction model
  • a similarity determination unit configured to determine the similarity between images of different objects according to the feature data of the object body
  • the image associating unit is configured to determine different object images respectively corresponding to the objects in the multiple environment images according to the similarity.
  • This application also provides a device for constructing an object body feature extraction model, including:
  • the training data determining unit is used to determine a set of correspondences between multiple object feature data and object identifiers as a training data set;
  • the model network construction unit is used to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
  • the model training unit is configured to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • This application also provides an electronic device, including:
  • a processor; and a memory for storing a program implementing the object recognition method. After the device is powered on and runs the program of the method through the processor, the following steps are executed: determining object images in multiple environment images; determining object body feature data of the object images through an object body feature extraction model; determining the similarity between different object images according to the object body feature data; and determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
  • This application also provides a vehicle, including:
  • at least one image acquisition device; a processor; and a memory for storing a program implementing the traffic object recognition method. After the device is powered on and runs the program of the method through the processor, the following steps are executed: determining traffic object images in multiple traffic environment images; determining object body feature data of the traffic object images through an object body feature extraction model; determining the similarity between different traffic object images according to the object body feature data; and determining, according to the similarity, the different traffic object images respectively corresponding to each traffic object in the multiple traffic environment images.
  • This application also provides an electronic device, including:
  • a processor; and a memory for storing a program implementing the method for constructing an object body feature extraction model. After the device is powered on and runs the program of the method through the processor, the following steps are executed: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • the present application also provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the above-mentioned various methods.
  • the present application also provides a computer program product including instructions, which when run on a computer, causes the computer to execute the above-mentioned various methods.
  • Compared with the prior art, the present application has the following advantages. The object recognition method provided in the embodiments of the present application determines object images in multiple environment images; determines object body feature data of the object images through an object body feature extraction model; determines the similarity between different object images according to the object body feature data; and determines, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images. This processing approach extracts the object body feature data of different object images and recognizes the same object in different environment images according to the similarity between the object body feature data; it can therefore effectively improve the recognition accuracy for the same object appearing in different environment images.
  • The method for constructing an object body feature extraction model provided in the embodiments of the present application determines a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructs the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learns the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object. This processing approach learns the object body feature extraction model from a training set of object feature data annotated with object identifiers; it can therefore effectively improve the accuracy of the object body feature extraction model.
  • FIG. 1 is a flowchart of an embodiment of the object recognition method provided by the present application;
  • FIG. 2 is a detailed flowchart of an embodiment of the object recognition method provided by the present application;
  • FIG. 3 is a detailed flowchart of an embodiment of the object recognition method provided by the present application;
  • FIG. 4 is a schematic diagram of an embodiment of the object recognition device provided by the present application;
  • FIG. 5 is a schematic diagram of an embodiment of an electronic device provided by the present application;
  • FIG. 6 is a schematic diagram of an embodiment of a vehicle provided by the present application;
  • FIG. 7 is a flowchart of an embodiment of the method for constructing an object body feature extraction model provided by the present application;
  • FIG. 8 is a schematic diagram of an embodiment of the device for constructing an object body feature extraction model provided by the present application;
  • FIG. 9 is a schematic diagram of an embodiment of an electronic device provided by the present application.
  • Please refer to FIG. 1, which is a flowchart of an embodiment of the object recognition method provided by this application. The entities that execute the method include, but are not limited to, unmanned vehicles such as smart logistics vehicles. The recognizable objects include traffic objects, such as pedestrians, vehicles, and obstacles on the road, as well as other objects. The method is described below taking the recognition of traffic objects as an example.
  • An object recognition method provided by this application includes:
  • Step S101 Determine object images in multiple environment images.
  • The multiple environment images in this embodiment are traffic environment images, which may include traffic environment images taken at the same moment by multiple image acquisition devices, traffic environment images taken at different moments by the same image acquisition device, and traffic environment images taken at different moments by multiple image acquisition devices.
  • The image acquisition device may be a video camera, an ordinary still camera, or the like.
  • The object in this embodiment may be a traffic object. The traffic object may be a vehicle, a person, or an obstacle such as a tree.
  • the method provided in the embodiments of the present application can be applied to an autonomous driving solution with one or more cameras.
  • As shown in FIG. 2, in a multi-camera autonomous driving scheme, the vehicle is equipped with k cameras that observe the environment around the vehicle from k viewing angles. At the τ+1 moments t_{n-τ}, …, t_{n-1}, t_n, these k cameras collect a total of k*(τ+1) images of the environment during the vehicle's driving, which this embodiment calls traffic environment images. With the method provided in this embodiment, the traffic object images belonging to the same object can be found among the k*(τ+1) traffic environment images captured by the k cameras, which helps the autonomous vehicle perceive the surrounding environment better.
  • In a single-camera autonomous driving scheme, the vehicle is equipped with only one camera. For a single camera, the method provided in this embodiment can associate the same traffic object across the τ+1 moments, which can be used for tracking or to assist other sensors in acquiring information, so as to better perceive the surrounding environment.
  • After the cameras mounted on the vehicle capture images, the traffic environment image data can be transmitted to a traffic object detection model (also called a traffic object detector), and the traffic objects (traffic participants) and their position data in the traffic environment images are detected through the model; that is, the traffic object images in the traffic environment images are determined.
  • the position data may be the vertex coordinate data of the rectangular bounding box of the traffic object image, that is, the position data may be a four-dimensional vector representing the x and y coordinates of the upper left corner and the lower right corner, respectively.
  • The traffic object detection model in this embodiment can use the deep-learning-based RefineDet method. This method builds on the fast running speed of single-stage methods such as SSD and combines ideas from two-stage methods such as Faster R-CNN, and therefore has the advantage of high object detection accuracy. When this method detects a traffic object image in a traffic environment image (for a moving vehicle, a traffic object is an obstacle), it obtains the bounding box coordinates of the traffic object image, that is, the position data of the traffic object image in the traffic environment image.
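  • The following is a minimal sketch of this detection step. The patent uses RefineDet as the traffic object detector; since RefineDet is not bundled with common libraries, torchvision's Faster R-CNN stands in here as a hypothetical substitute, as any detector that returns bounding boxes fits the described role, and the confidence threshold is likewise an assumption:

```python
# A sketch of step S101, assuming PyTorch/torchvision. Faster R-CNN is a
# stand-in for the RefineDet detector described in the patent; the 0.5
# confidence threshold is a hypothetical choice.
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

image = torch.rand(3, 500, 1000)  # placeholder traffic environment image (C, H, W)
with torch.no_grad():
    prediction = detector([image])[0]

# Each row of `boxes` is the four-dimensional position vector described above:
# the x and y coordinates of the upper-left and lower-right bounding box corners.
boxes = prediction["boxes"][prediction["scores"] > 0.5]
```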
  • Step S103 Determine the object body feature data of the object image through the object body feature extraction model.
  • Different traffic object images of the same traffic object in multiple traffic environment images usually have different image sizes, and the shooting angles may also differ; however, because these images belong to the same traffic object, they usually share similar feature data. The embodiments of this application call this feature data the object body feature data.
  • The method provided in the embodiments of this application determines the object body feature data through the object body feature extraction model. The object body feature extraction model can be learned from a large training set of traffic object feature data annotated with object identifiers, and can extract object body feature data of the same dimension from large amounts of traffic object feature data having the same or different feature dimensions.
  • FIG. 3 is a specific flowchart of the method provided in the embodiment of this application.
  • the method may further include the following steps:
  • Step S301 Determine a set of correspondences between multiple object feature data and object identifiers as a training data set.
  • The object feature data in this embodiment includes traffic object feature data, which is different from the object body feature data. The traffic object feature data may be the traffic object image itself, that is, the data of each pixel in the traffic object image; it may also be a traffic object image normalized in image size, that is, the data of each pixel in the size-normalized traffic object image; it may also be feature data characterizing the traffic object category, and so on.
  • In this embodiment, the traffic object feature data adopts the feature data characterizing the traffic object category, and this type of feature data can be determined using the following steps:
  • Step S3011: Acquire a set of correspondences between traffic environment images including traffic object images and traffic object identifiers.
  • Table 1 shows the set of correspondences involving vehicles in this embodiment.
  • Table 1:

    Traffic environment image ID | Traffic object ID | Traffic environment image note
    1   | Vehicle 001 | Image taken by camera c of vehicle A at t1
    ... | ...         | ...
    100 | Vehicle 001 | Image taken by camera f of vehicle C at t3
    101 | Vehicle 002 | Image taken by camera c of vehicle B at t1
    ... | ...         | ...
    200 | Vehicle 002 | Image taken by camera f of vehicle H at t3
  • the corresponding relationship set may include traffic environment images captured by multiple cameras mounted on multiple vehicles at multiple times.
  • the number of traffic environment images corresponding to each vehicle is 100, that is, for one vehicle, 100 traffic environment images including the vehicle are to be collected.
  • Step S3013 Determine the location data of the traffic object image in the traffic environment image through the traffic object detection model, and the traffic object feature map output during the detection process of at least one traffic object feature extraction layer included in the traffic object detection model.
  • The way of determining the position data of the traffic object image in the traffic environment image has been described under step S101 and is not repeated here.
  • The traffic object detection model may include one or more traffic object feature extraction layers. In this embodiment, the traffic object feature maps output by at least one traffic object feature extraction layer included in the model during the object detection process are also obtained.
  • The network structure of the traffic object detection model may be a convolutional neural network, which may include multiple convolutional layers, namely the traffic object feature extraction layers. A traffic object feature extraction layer extracts, from the feature map input to that layer, image features deeper than those of the input feature map, and these image features form the output feature map. Since convolutional-neural-network-based traffic object detection models are relatively mature prior art, they are not described further here.
  • In specific implementations, the output feature maps of all traffic object feature extraction layers in the traffic object detection model can be selected, or those of only some of the layers. If all layers are selected, the features are retained more comprehensively, so the accuracy of the object body feature extraction model, and hence of the object body feature data, can be effectively improved, but more computing units and storage units are occupied. If only some of the layers are selected, some features are lost, which reduces the accuracy of the object body feature extraction model and hence of the object body feature data, but computing units and storage units can be effectively saved.
  • Step S3015 According to the location data and the at least one traffic object feature map, determine feature data of at least one depth level of the traffic object image as the traffic object feature data.
  • After the position data of the traffic object image in the traffic environment image and the traffic object feature maps are determined, the feature data of each depth level corresponding to the traffic object image can be determined from the at least one traffic object feature map based on the position data, and these feature data are used as the traffic object feature data.
  • the following steps may be adopted to determine feature data of at least one depth level of the traffic object image:
  • 1) Determine the feature data of the traffic object image in each traffic object feature map according to the position data (see the coordinate-scaling sketch after this list). The image sizes of the at least one traffic object feature map may be the same or different.
  • When the image sizes differ, this step can be implemented as follows: according to the position data and the image size ratio between the at least one traffic object feature extraction layer, obtain the feature data of the traffic object image in each traffic object feature map. The image size ratio between the feature extraction layers can be determined from the network structure of the traffic object detection model. For example, suppose the traffic object detection model includes 6 traffic object feature extraction layers, the size of the traffic environment image is 1000*500, the position data of a vehicle in the image is (x_left, y_left, x_right, y_right), the output feature map of the 5th traffic object feature extraction layer has an image size of 100*100, and that of the 6th layer has an image size of 50*50, so the ratio between the two layers is 2:1. The corresponding position data of the vehicle in the 6th layer's output feature map can first be determined from the vehicle's position data and the image size of the 6th layer, and the pixel values within that range are used as the feature data of the 6th depth level; then the corresponding position data in the 5th layer's output feature map is determined from the 5th-to-6th-layer image ratio and the 6th layer's position data, and the pixel values within that range are used as the feature data of the 5th depth level, and so on.
  • When the image sizes are the same, the image data within the position data range of each traffic object feature map may be used directly as the depth-level feature data of that layer.
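  • As referenced above, the following is a small sketch of the coordinate-scaling step, assuming a plain Python helper and using the hypothetical 1000*500 image and the 100*100 and 50*50 feature map sizes from the example; the box values are invented for illustration:

```python
# A sketch of mapping bounding box position data into a feature map by the
# image-size ratio between the environment image and the feature map.
def box_in_feature_map(box, image_size, map_size):
    """Scale (x_left, y_left, x_right, y_right) from image space to map space."""
    (img_w, img_h), (map_w, map_h) = image_size, map_size
    sx, sy = map_w / img_w, map_h / img_h
    x_left, y_left, x_right, y_right = box
    return (x_left * sx, y_left * sy, x_right * sx, y_right * sy)

box = (400.0, 150.0, 600.0, 300.0)  # a vehicle's position in a 1000*500 image
print(box_in_feature_map(box, (1000, 500), (100, 100)))  # 5th-layer feature map
print(box_in_feature_map(box, (1000, 500), (50, 50)))    # 6th-layer feature map
```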
  • 2) Obtain the feature dimension corresponding to each traffic object feature extraction layer. Different traffic object feature extraction layers may have the same feature dimension or different feature dimensions. This step ensures that the feature data of the same depth level of differently sized traffic object images of the same traffic object in different traffic environment images have the same feature dimension; that is, it normalizes the dimension of the feature data of the same depth level across differently sized traffic object images, so that the similarity of two pieces of feature data can be computed to determine whether differently sized traffic object images in different traffic environment images are the same traffic object. For example, the traffic object detection model includes 6 traffic object feature extraction layers; the feature dimension corresponding to the 1st layer is 1000, that corresponding to the 2nd layer is 800, ..., and that corresponding to the 6th layer is 900.
  • 3) For each traffic object feature map, transform the feature data of the traffic object image in that feature map into feature data having the corresponding feature dimension. In specific implementations, this can be done through an ROIAlign operation or an ROIPooling operation. The ROIAlign layer (a special layer for object detection) is a region feature aggregation method that solves the region misalignment problem caused by the two quantizations in the ROIPooling operation.
  • For example, traffic object detection is performed through the traffic object detection model on the traffic environment image captured by camera c at moment t, and the i-th detected foreground (traffic object image) is recorded; its rectangular-box coordinates in the traffic environment image form a four-dimensional vector representing the x and y coordinates of the upper left corner and the lower right corner, respectively. After the RoiAlign operation, k depth-level features {f_0, f_1, …, f_k} of the object can be obtained, and all these features are concatenated to obtain the feature of the object.
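  • The following is a sketch of this fixed-size pooling and concatenation, assuming torchvision's roi_align as the RoiAlign operation; the channel counts, map sizes, output size, and the square 800*800 image (chosen so that a single spatial scale per layer applies) are all hypothetical:

```python
# A sketch of steps 2)-4): RoiAlign pools the box region of each depth level's
# feature map to a fixed size, and the depth-level features {f_0, ..., f_k}
# are concatenated into one traffic object feature vector.
import torch
from torchvision.ops import roi_align

image_size = 800.0  # hypothetical square traffic environment image
feature_maps = [torch.rand(1, 256, 100, 100),  # deeper layer, 1/8 scale
                torch.rand(1, 256, 50, 50)]    # deepest layer, 1/16 scale
# One box per row: (batch_index, x_left, y_left, x_right, y_right) in image coords.
boxes = torch.tensor([[0.0, 400.0, 150.0, 600.0, 300.0]])

depth_level_features = []
for fmap in feature_maps:
    scale = fmap.shape[-1] / image_size  # image-size ratio for this layer
    f = roi_align(fmap, boxes, output_size=(7, 7), spatial_scale=scale)
    depth_level_features.append(f.flatten(start_dim=1))

object_feature = torch.cat(depth_level_features, dim=1)  # concatenated feature
```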
  • Table 2 shows the set of correspondences between the traffic object feature data and the traffic object identifier in this embodiment.
  • In summary, this embodiment uses the above steps S3011-S3015 to take the output feature maps of the intermediate layers of the traffic object detection model (the RefineDet module) together with the traffic object image bounding boxes obtained from traffic object detection, uses the RoiAlign layer to output fixed-size feature maps, and uses the collection of fixed-size feature maps as the traffic object feature data.
  • Step S303 Construct a network structure of the object body feature extraction model.
  • The object body feature extraction model belongs to the category of similarity preserving hashing: it aims to find a hash mapping function that maps the original features into Hamming space while maintaining the similarity between the original features. The network structure of the model can therefore be a hash network structure.
  • the input data of the network structure is the traffic object feature data
  • the output data is the object body feature data of the traffic object image.
  • As shown in FIG. 2, the network structure includes a 1*1 convolutional layer and a fully connected layer, and outputs a q-dimensional vector; that is, the object body feature dimension is q. This vector is a highly compact feature that compresses and condenses the information describing which object this is.
  • the network structure may also include multiple convolutional layers, and the size of the convolution kernel may also be determined according to business requirements (such as object recognition accuracy, etc.).
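  • A minimal sketch of such a hash network follows, assuming PyTorch; the input channel count, spatial size, and the value of q are hypothetical choices, not values prescribed by the patent:

```python
# A sketch of the hash network described above: a 1*1 convolution followed by
# a fully connected layer that outputs a q-dimensional object body feature.
import torch
import torch.nn as nn

class HashNet(nn.Module):
    def __init__(self, in_channels=256, spatial=7, q=64):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, 64, kernel_size=1)
        self.fc = nn.Linear(64 * spatial * spatial, q)

    def forward(self, x):                 # x: (N, in_channels, spatial, spatial)
        x = torch.relu(self.conv1x1(x))
        x = x.flatten(start_dim=1)
        return torch.sigmoid(self.fc(x))  # real-valued q-dim feature in (0, 1)

features = HashNet()(torch.rand(2, 256, 7, 7))  # q-dim features for two objects
```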
  • Step S305 Based on the triple loss function, learn from the training data set to obtain the object body feature extraction model.
  • In this embodiment, a triplet loss function is used to train the hash network model. The triplet includes two groups of traffic object feature data corresponding to objects with the same object identifier and one group of traffic object feature data corresponding to another traffic object. In specific implementations, triplets are constructed according to the training data set; specifically, the traffic object feature data and the traffic object identifiers in the training data are used to construct the triplets.
  • A triplet can be written as (f_a, f_p, f_n), where f_a and f_p are two groups of traffic object feature data of the same traffic object (the same object identifier) and f_n is a group of traffic object feature data of a different traffic object; let h_a, h_p, h_n denote the corresponding hash codes output by the network. The triplet loss function (Triplet Loss) is then defined as L = max(0, D(h_a, h_p) - D(h_a, h_n) + α), where D denotes the distance between two hash codes.
  • The margin α is used to control the offset between the first distance, i.e. the distance between the hash codes of the similar features of different images of the same traffic object, and the second distance, i.e. the distance between the hash codes of the dissimilar features of images of different traffic objects; that is, the difference between the two distances must be at least α.
  • In specific implementations, the loss value of the object body feature extraction model during the training process is determined according to the constructed triplets and the triplet loss function. The triplet loss function drives the difference between the first distance and the second distance to reach the distance threshold α, where the first distance is the distance between the object body feature data corresponding to the two groups of traffic object feature data of the same object, and the second distance is the distance between one of those two groups of traffic object feature data and the group of traffic object feature data of the other traffic object. If the loss value reaches a loss threshold, the training of the object body feature extraction model is stopped.
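  • A sketch of this training criterion follows, assuming Euclidean distances between the network outputs; PyTorch's built-in nn.TripletMarginLoss implements the same idea:

```python
# A sketch of the triplet loss: h_a and h_p come from two images of the same
# object, h_n from another object, and alpha is the distance threshold.
import torch
import torch.nn.functional as F

def triplet_loss(h_a, h_p, h_n, alpha=0.5):
    d_pos = F.pairwise_distance(h_a, h_p)  # first distance: same object
    d_neg = F.pairwise_distance(h_a, h_n)  # second distance: another object
    # Zero loss once d_neg exceeds d_pos by at least the margin alpha.
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()
```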
  • In this embodiment, quantization processing is performed on the object body feature data output by the object body feature extraction model, so that a q-dimensional 0-1 vector can be obtained. This 0-1 vector is called a hash code (Hash Code); the hash code already carries the distinguishing features of the corresponding object. Accordingly, the object body feature extraction model in this embodiment outputs object body feature data expressed as a q-dimensional 0-1 vector. On the one hand, the 0-1 vector can use the computer's built-in bit operations to greatly speed up calculation; on the other hand, it has high storage efficiency and takes up little memory.
  • the object body feature extraction model and its construction method are described. After the object body feature extraction model is constructed, the model can be used to extract the object body feature data of the traffic object image.
  • step S103 may include the following sub-steps:
  • Step S1031: Obtain the traffic object feature maps output by at least one traffic object feature extraction layer included in the traffic object detection model in the process of detecting the traffic object image; and obtain the position data of the traffic object image in the traffic environment image.
  • Step S1031 corresponds to the above-mentioned step S3013; for the related description, please refer to the part on step S3013, which is not repeated here.
  • Step S1033 Determine feature data of at least one depth level of the traffic object image according to the location data and the at least one traffic object feature map.
  • In specific implementations, step S1033 may include the following sub-steps: 1) determine the feature data of the traffic object image in each traffic object feature map according to the position data; 2) obtain the feature dimension corresponding to each traffic object feature extraction layer; 3) for each traffic object feature map, transform the feature data of the traffic object image in the traffic object feature map into feature data having the corresponding feature dimension; 4) use the collection of feature data having the feature dimensions of the respective traffic object feature extraction layers as the feature data of the at least one depth level.
  • Sub-step 1) may be implemented as follows: according to the position data and the image size ratio between the at least one traffic object feature extraction layer, obtain the feature data of the traffic object image in each traffic object feature map.
  • Step S1033 corresponds to the above-mentioned step S3015; for the related description, please refer to the part on step S3015, which is not repeated here.
  • Step S1035 Using the object body feature extraction model, determine the object body feature data according to the feature data of the at least one depth level.
  • In specific implementations, the feature data of the at least one depth level is used as the traffic object feature data of the traffic object image to be recognized; these feature data are input into the object body feature extraction model, and the model determines the object body feature data of the traffic object image to be recognized.
  • Step S105 Determine the similarity between different object images according to the feature data of the object body.
  • In specific implementations, after step S103 the following step may be further included: converting the real-valued object body feature data output by the object body feature extraction model into binary object body feature data. Correspondingly, step S105 can be implemented as follows: performing an XNOR operation on the binary object body feature data of the different traffic object images as the similarity.
  • In the prediction phase, the RoiAlign feature data of a detected object is input into the trained hash network to obtain a real-valued feature h_i of dimension q. This embodiment then quantizes h_i into a q-dimensional binary hash code: for each dimension, if the value is less than the threshold τ, the bit is set to 0, and otherwise to 1.
  • Thus, for the feature data of any object, the corresponding 0-1 binary hash code can be obtained by inputting it into the hash network, and the similarity between any two hash codes can be measured by the number of '1' bits after an XNOR of the two hash codes.
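  • The following is a small sketch of this quantization and similarity measurement; the threshold value and the example features are hypothetical:

```python
# A sketch of quantizing a real-valued q-dim feature into a 0-1 hash code and
# measuring similarity as the number of '1' bits after XNOR (i.e. equal bits).
import numpy as np

def to_hash_code(h, tau=0.5):
    return (np.asarray(h) >= tau).astype(np.uint8)  # < tau -> 0, otherwise 1

def xnor_similarity(code_a, code_b):
    return int(np.sum(code_a == code_b))  # XNOR, then count the '1' bits

a = to_hash_code([0.9, 0.2, 0.7, 0.4])  # -> [1, 0, 1, 0]
b = to_hash_code([0.8, 0.1, 0.3, 0.6])  # -> [1, 0, 0, 1]
print(xnor_similarity(a, b))            # 2 of q=4 bits agree
```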
  • Step S107 According to the similarity, different object images corresponding to each object in the multiple environment images are determined.
  • The method provided in the embodiments of this application can calculate not only the similarity between objects from different cameras but also the similarity between objects at different moments of the same camera. If the similarity is high, the two can be considered the same object, so objects related in space and time across different cameras can be found quickly.
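  • As a sketch of how such associations might be formed, the pairwise normalized XNOR similarity can be compared against a threshold; the matching rule and threshold below are illustrative assumptions, not a procedure prescribed by the patent:

```python
# A sketch of step S107: pair objects from two cameras (or two moments) whose
# hash codes agree on at least a threshold fraction of their q bits.
import numpy as np

def associate(codes_a, codes_b, threshold=0.9):
    matches = []
    for i, a in enumerate(codes_a):
        for j, b in enumerate(codes_b):
            similarity = np.mean(a == b)  # normalized XNOR similarity
            if similarity >= threshold:
                matches.append((i, j))    # treated as the same object
    return matches
```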
  • It can be seen from the above embodiment that the object recognition method determines object images in multiple environment images; determines the object body feature data of the object images through the object body feature extraction model; determines the similarity between different object images according to the object body feature data; and determines, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images. This processing approach extracts the object body feature data of different object images and recognizes the same object in different environment images according to the similarity between the object body feature data; it can therefore effectively improve the recognition accuracy for the same object appearing in different environment images.
  • an object recognition method is provided, and correspondingly, the present application also provides an object recognition device.
  • This device corresponds to the embodiment of the above method.
  • FIG. 4 is a schematic diagram of an embodiment of the object recognition device of this application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • This application additionally provides an object recognition device, including:
  • the object image determining unit 401 is configured to determine object images in multiple environment images
  • the object body feature extraction unit 403 is used to determine the object body feature data of the object image through the object body feature extraction model
  • the similarity determination unit 405 is configured to determine the similarity between different object images according to the feature data of the object body
  • the image associating unit 407 is configured to determine different object images respectively corresponding to each object in the multiple environment images according to the similarity.
  • the object body feature extraction model is learned from a training set of object feature data with object identification and annotation data.
  • Optionally, the device further includes:
  • a training data determining unit, used to determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
  • a model network construction unit, configured to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
  • a model training unit, configured to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • the object image determining unit 401 is specifically configured to determine the object image through an object detection model
  • the object body feature extraction unit 403 includes:
  • An object feature map obtaining subunit configured to obtain an object feature map output by at least one object feature extraction layer included in the object detection model in the process of detecting an object image
  • a location data acquisition sub-unit for acquiring location data of the object image in the environment image
  • a first feature data determining subunit configured to determine feature data of at least one depth level of the object image according to the position data and the at least one object feature map;
  • the second feature data subunit is used to determine the feature data of the object body according to the feature data of the at least one depth level through the feature extraction model of the object body.
  • the first characteristic data determining subunit includes:
  • the feature data cropping subunit is configured to determine feature data of the object image in each object feature map according to the position data
  • the feature dimension acquisition subunit is used to acquire the feature dimension corresponding to each object feature extraction layer
  • the feature data dimension normalization subunit is configured to transform the feature data of the object image in the object feature map into feature data having the feature dimension for each object feature map;
  • the feature data merging subunit is configured to use a collection of feature data having the feature dimension of each object feature extraction layer as the feature data of the at least one depth level.
  • the feature data cropping subunit is specifically configured to obtain the feature data of the object image in each object feature map according to the position data and the image size ratio between the at least one object feature extraction layer.
  • the similarity determination unit 405 includes:
  • the feature data conversion subunit is used to convert the real-valued object body feature data output by the object body feature extraction model into binary object body feature data;
  • the similarity calculation subunit is used to perform an XNOR operation on the binary object body feature data of the different object images as the similarity.
  • Optionally, the multiple environment images include: traffic environment images at the same moment taken by multiple image acquisition devices, traffic environment images at different moments taken by the same image acquisition device, and traffic environment images at different moments taken by multiple image acquisition devices;
  • the objects include: traffic objects.
  • the objects include: vehicles, people, and obstacles.
  • an object recognition method is provided.
  • this application also provides an electronic device.
  • the embodiment of the device corresponds to the embodiment of the above method.
  • FIG. 5 is a schematic diagram of an embodiment of the electronic device of this application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • The present application additionally provides an electronic device, including: a processor 501; and a memory 502 for storing a program implementing the object recognition method. After the device is powered on and runs the program of the method through the processor, the following steps are executed: determining object images in multiple environment images; determining the object body feature data of the object images through the object body feature extraction model; determining the similarity between different object images according to the object body feature data; and determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
  • an object recognition method is provided.
  • the present application also provides a vehicle.
  • the embodiment of the device corresponds to the embodiment of the above method.
  • FIG. 6 is a schematic diagram of an embodiment of the vehicle of this application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • The present application additionally provides a vehicle, including: at least one image acquisition device 601; a processor 602; and a memory 603 for storing a program implementing the traffic object recognition method. After the device is powered on and runs the program of the method through the processor, the following steps are executed: determining traffic object images in multiple traffic environment images; determining the object body feature data of the traffic object images through the object body feature extraction model; determining the similarity between different traffic object images according to the object body feature data; and determining, according to the similarity, the different traffic object images respectively corresponding to each traffic object in the multiple traffic environment images.
  • an object recognition method is provided.
  • Correspondingly, this application also provides a method for constructing an object body feature extraction model, which corresponds to the embodiments of the above method.
  • Please refer to FIG. 7, which is a flowchart of an embodiment of the method for constructing an object body feature extraction model of this application. Since this method embodiment is basically similar to the above method embodiments, the description is relatively brief; for related parts, please refer to the description of the above method embodiments. The method embodiments described below are only illustrative.
  • This application additionally provides a method for constructing an object body feature extraction model, including:
  • Step S701: Determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set.
  • Step S703: Construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data.
  • Step S705: Based on a triplet loss function, learn the object body feature extraction model from the training data set, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • It can be seen from the above embodiment that the method for constructing an object body feature extraction model determines a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructs the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learns the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object. This processing approach learns the object body feature extraction model from a training set of object feature data annotated with object identifiers; it can therefore effectively improve the accuracy of the object body feature extraction model.
  • In the above embodiment, a method for constructing an object body feature extraction model is provided. Correspondingly, the present application also provides a device for constructing an object body feature extraction model, which corresponds to the embodiments of the above method.
  • Please refer to FIG. 8, which is a schematic diagram of an embodiment of the device for constructing an object body feature extraction model of this application. Since the device embodiment is basically similar to the method embodiment, the description is relatively brief; for related parts, please refer to the description of the method embodiment. The device embodiments described below are merely illustrative.
  • This application additionally provides an object body feature extraction model construction device, including:
  • the training data determining unit 801 is configured to determine a set of correspondences between multiple object feature data and object identifiers as a training data set;
  • a model network construction unit 803, configured to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
  • a model training unit 805, configured to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • this application also provides an electronic device.
  • the embodiment of the device corresponds to the embodiment of the above method.
  • FIG. 9 is a schematic diagram of an embodiment of the electronic device of this application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
  • the device embodiments described below are merely illustrative.
  • The present application additionally provides an electronic device, including: a processor 901; and a memory 902 for storing a program implementing the method for constructing an object body feature extraction model. After the device is powered on and runs the program of the method through the processor, it executes the following steps: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
  • the computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.
  • As defined in this document, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
  • this application can be provided as methods, systems or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An object recognition method and apparatus, and a vehicle, as well as a method, apparatus and device for constructing an object body feature extraction model, and a vehicle. The object recognition method includes: determining object images in multiple environment images (S101); determining object body feature data of the object images through an object body feature extraction model (S103); determining the similarity between different object images according to the object body feature data (S105); and determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images (S107). Recognizing the same object in different environment images in this way can effectively improve the recognition accuracy for the same object appearing in different environment images.

Description

Object recognition method, apparatus, and vehicle
This application claims priority to Chinese Patent Application No. 201910421145.1, filed on May 20, 2019 and entitled "Object recognition method, apparatus, and vehicle", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image processing technology, and in particular to object recognition methods, devices and equipment, methods, devices and equipment for constructing an object body feature extraction model, and vehicles.
Background
In the field of autonomous driving, machine perception is an important component. A multi-sensor fusion perception system uses different types of sensors to detect the environment around the driving vehicle, so that the vehicle can accurately judge the traffic situation.
It can be seen that accurate fusion of multi-sensor data is key to road traffic safety and to achieving autonomous driving.
The multi-camera sensor solution is currently a common vehicle multi-sensor solution: the autonomous vehicle observes multiple viewing angles through multiple mounted cameras so as to cover the surrounding environment as fully as possible. When an autonomous vehicle is equipped with a multi-camera solution, the same object captured by different cameras needs to be fused, and finding the spatial and temporal correspondence between objects captured by different cameras is an important problem. In the multi-camera solution, two types of data fusion processing are specifically involved: 1) different cameras may capture the same object, and finding the same object in the different images captured by different cameras helps the autonomous vehicle perceive the surrounding environment better; 2) for a single camera, the same object at different moments also needs to be associated, which can be used for tracking or to assist other sensors in acquiring information, so as to better perceive the surrounding environment.
However, in the course of implementing the invention, the inventors found that the related technical solutions currently in use cannot accurately associate images of the same traffic object appearing in different traffic environment images; in other words, the recognition accuracy for the same traffic object appearing in different traffic environment images is low.
Summary of the Invention
This application provides an object recognition method to solve the prior-art problem that the same object in different environment images cannot be accurately recognized. This application additionally provides an object recognition device and equipment, a method, device and equipment for constructing an object body feature extraction model, and a vehicle.
This application provides an object recognition method, including:
determining object images in multiple environment images;
determining object body feature data of the object images through an object body feature extraction model;
determining the similarity between different object images according to the object body feature data;
determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
Optionally, the object body feature extraction model is learned from a training set of object feature data annotated with object identifiers.
Optionally, the method further includes:
determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
Optionally, the determining object images in multiple environment images includes:
determining the object images through an object detection model;
the determining object body feature data of the object images through an object body feature extraction model includes:
obtaining the object feature maps output by at least one object feature extraction layer included in the object detection model in the process of detecting the object images; and obtaining the position data of the object images in the environment images;
determining feature data of at least one depth level of the object image according to the position data and the at least one object feature map;
determining the object body feature data according to the feature data of the at least one depth level through the object body feature extraction model.
Optionally, the determining feature data of at least one depth level of the object image according to the position data and the at least one object feature map includes:
determining the feature data of the object image in each object feature map according to the position data;
obtaining the feature dimension corresponding to each object feature extraction layer;
transforming, for each object feature map, the feature data of the object image in that object feature map into feature data having the corresponding feature dimension;
using the collection of feature data having the feature dimensions of the respective object feature extraction layers as the feature data of the at least one depth level.
Optionally, the determining the feature data of the object image in each object feature map according to the position data includes:
obtaining the feature data of the object image in each object feature map according to the position data and the image size ratio between the at least one object feature extraction layer.
Optionally, the determining the similarity between different object images according to the object body feature data includes:
converting the real-valued object body feature data output by the object body feature extraction model into binary object body feature data;
performing an XNOR operation on the binary object body feature data of the different object images as the similarity.
Optionally, the multiple environment images include: traffic environment images at the same moment taken by multiple image acquisition devices, traffic environment images at different moments taken by the same image acquisition device, and traffic environment images at different moments taken by multiple image acquisition devices;
the objects include: traffic objects.
Optionally, the objects include: vehicles, people, and obstacles.
This application also provides a method for constructing an object body feature extraction model, including:
determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
This application also provides an object recognition device, including:
an object image determining unit, used to determine object images in multiple environment images;
an object body feature extraction unit, used to determine the object body feature data of the object images through an object body feature extraction model;
a similarity determining unit, used to determine the similarity between different object images according to the object body feature data;
an image associating unit, used to determine, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
This application also provides a device for constructing an object body feature extraction model, including:
a training data determining unit, used to determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
a model network construction unit, used to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
a model training unit, used to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
This application also provides an electronic device, including:
a processor; and
a memory for storing a program implementing the object recognition method; after the device is powered on and runs the program of the method through the processor, the following steps are executed: determining object images in multiple environment images; determining object body feature data of the object images through an object body feature extraction model; determining the similarity between different object images according to the object body feature data; determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
This application also provides a vehicle, including:
at least one image acquisition device;
a processor; and
a memory for storing a program implementing the traffic object recognition method; after the device is powered on and runs the program of the method through the processor, the following steps are executed: determining traffic object images in multiple traffic environment images; determining object body feature data of the traffic object images through an object body feature extraction model; determining the similarity between different traffic object images according to the object body feature data; determining, according to the similarity, the different traffic object images respectively corresponding to each traffic object in the multiple traffic environment images.
This application also provides an electronic device, including:
a processor; and
a memory for storing a program implementing the method for constructing an object body feature extraction model; after the device is powered on and runs the program of the method through the processor, the following steps are executed: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object.
This application also provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the various methods described above.
This application also provides a computer program product including instructions that, when run on a computer, cause the computer to execute the various methods described above.
Compared with the prior art, this application has the following advantages:
The object recognition method provided in the embodiments of this application determines object images in multiple environment images; determines object body feature data of the object images through an object body feature extraction model; determines the similarity between different object images according to the object body feature data; and determines, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images. This processing approach extracts the object body feature data of different object images and recognizes the same object in different environment images according to the similarity between the object body feature data; it can therefore effectively improve the recognition accuracy for the same object appearing in different environment images.
The method for constructing an object body feature extraction model provided in the embodiments of this application determines a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructs the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; and learns the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two groups of object feature data corresponding to an object with the same object identifier and one group of object feature data corresponding to another object. This processing approach learns the object body feature extraction model from a training set of object feature data annotated with object identifiers; it can therefore effectively improve the accuracy of the object body feature extraction model.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of the object recognition method provided by this application;
FIG. 2 is a detailed flowchart of an embodiment of the object recognition method provided by this application;
FIG. 3 is a detailed flowchart of an embodiment of the object recognition method provided by this application;
FIG. 4 is a schematic diagram of an embodiment of the object recognition device provided by this application;
FIG. 5 is a schematic diagram of an embodiment of the electronic device provided by this application;
FIG. 6 is a schematic diagram of an embodiment of the vehicle provided by this application;
FIG. 7 is a flowchart of an embodiment of the method for constructing an object body feature extraction model provided by this application;
FIG. 8 is a schematic diagram of an embodiment of the device for constructing an object body feature extraction model provided by this application;
FIG. 9 is a schematic diagram of an embodiment of the electronic device provided by this application.
具体实施方式
在下面的描述中阐述了很多具体细节以便于充分理解本申请。但是本申请能够以很 多不同于在此描述的其它方式来实施,本领域技术人员可以在不违背本申请内涵的情况下做类似推广,因此本申请不受下面公开的具体实施的限制。
在本申请中,提供了物体识别方法、装置及设备,物体本体特征提取模型构建方法、装置及设备,以及车辆。在下面的实施例中逐一对各种方案进行详细说明。
First Embodiment
Please refer to FIG. 1, which is a flowchart of an embodiment of an object recognition method provided by this application. The execution subject of the method includes, but is not limited to, unmanned vehicles such as intelligent logistics vehicles; the recognizable objects include traffic objects such as pedestrians, vehicles, and obstacles on the road, and other objects can also be recognized. The method is described below using the recognition of traffic objects as an example. An object recognition method provided by this application includes:
Step S101: determine object images in multiple environment images.
The multiple environment images in this embodiment are traffic environment images and may include traffic environment images captured at the same moment by multiple image acquisition devices, traffic environment images captured at different moments by the same image acquisition device, or traffic environment images captured at different moments by multiple image acquisition devices. The image acquisition device may be a camera, an ordinary still camera, and so on.
The objects in this embodiment may be traffic objects. A traffic object may be a vehicle, a person, or an obstacle such as a tree.
The method provided by the embodiments of this application can be applied in autonomous driving solutions with one or more cameras. As shown in FIG. 2, in a multi-camera autonomous driving solution, the vehicle carries k cameras that observe the surroundings of the vehicle from k viewpoints. At the τ+1 moments t_{n-τ}, …, t_{n-1}, t_n, these k cameras collect a total of k*(τ+1) images of the environment along the vehicle's route, which this embodiment calls traffic environment images. With the method provided by this embodiment, the traffic object images that belong to the same object can be found among the k*(τ+1) traffic environment images captured by the k cameras, which helps the autonomous vehicle perceive its surroundings better. In a single-camera autonomous driving solution, the vehicle carries only one camera; for that single camera, the method provided by this embodiment can associate the same traffic object across the τ+1 moments, which can be used for tracking or to assist other sensors in acquiring information, thereby perceiving the surroundings better.
After the cameras mounted on the vehicle capture images, the traffic environment image data can be transmitted to a traffic object detection model (also called a traffic object detector), which detects the traffic objects (traffic participants) and their position data in the traffic environment images; that is, it determines the traffic object images in the traffic environment images.
The position data may be the vertex coordinates of the rectangular bounding box of a traffic object image; that is, the position data may be a four-dimensional vector giving the x and y coordinates of the top-left and bottom-right corners.
As shown in FIG. 2, the traffic object detection model in this embodiment may adopt the deep-learning-based RefineDet method, which builds on the fast running speed of one-stage methods such as SSD while also incorporating two-stage methods such as Faster R-CNN, and therefore offers high object detection accuracy. When this method detects a traffic object image in a traffic environment image (for a moving vehicle, traffic objects are obstacles), it yields the bounding box coordinates of the traffic object image, i.e., the position data of the traffic object image in the traffic environment image.
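For illustration only, the following minimal sketch shows the shape of this detection step's output. No public RefineDet API is assumed here; a torchvision Faster R-CNN (random weights) stands in purely to illustrate the bounding-box format, and the image size is an arbitrary example:

```python
# Illustrative sketch: a detector that returns, for each detected object,
# a bounding box (x_left, y_left, x_right, y_right) -- the four-dimensional
# position data described above. Faster R-CNN is a stand-in, not RefineDet.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn()  # randomly initialized stand-in detector
detector.eval()

image = torch.rand(3, 500, 1000)  # one traffic environment image (C, H, W)
with torch.no_grad():
    detections = detector([image])[0]

# 'boxes' holds one (x_left, y_left, x_right, y_right) row per detected object.
print(detections["boxes"].shape, detections["scores"].shape)
```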
After the traffic object images in the multiple traffic environment images are determined, the next step is to determine the object body feature data of the traffic object images through the object body feature extraction model.
Step S103: determine the object body feature data of the object images through the object body feature extraction model.
Different traffic object images of the same traffic object in multiple traffic environment images usually have different image sizes, and the shooting angles may also differ; but because these images belong to the same traffic object, they usually share similar feature data, which the embodiments of this application call the object body feature data.
The method provided by the embodiments of this application determines the object body feature data through the object body feature extraction model. The object body feature extraction model can be learned from a large training set of traffic object feature data annotated with object identifiers, and it can extract object body feature data of a uniform dimension from large amounts of traffic object feature data of the same or different feature dimensions.
Please refer to FIG. 3, which is a detailed flowchart of the method provided by an embodiment of this application. In this embodiment, the method may further include the following steps:
Step S301: determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set.
The object feature data in this embodiment include traffic object feature data, which differ from the object body feature data. The traffic object feature data may be the traffic object image itself, i.e., the data of each pixel in the traffic object image; they may also be a size-normalized traffic object image, i.e., the data of each pixel in the traffic object image after image size normalization; they may also be feature data characterizing the traffic object category, and so on.
In this embodiment, the traffic object feature data are the feature data characterizing the traffic object category, which can be determined through the following steps:
Step S3011: obtain a set of correspondences between traffic environment images containing traffic object images and traffic object identifiers.
Table 1 shows the correspondence set for vehicles in this embodiment.
Table 1
Traffic environment image ID | Traffic object ID | Remarks on the traffic environment image
1 | Vehicle 001 | Image captured by camera c of vehicle A at time t1
100 | Vehicle 001 | Image captured by camera f of vehicle C at time t3
101 | Vehicle 002 | Image captured by camera c of vehicle B at time t1
200 | Vehicle 002 | Image captured by camera f of vehicle H at time t3
As Table 1 shows, the correspondence set can include traffic environment images captured at multiple moments by multiple cameras mounted on multiple vehicles. In this embodiment, the number of traffic environment images corresponding to each vehicle is 100; that is, for each vehicle, 100 traffic environment images containing that vehicle are to be collected.
Step S3013: determine, through the traffic object detection model, the position data of the traffic object images in the traffic environment images, as well as the traffic object feature maps output during detection by at least one traffic object feature extraction layer included in the traffic object detection model.
Step S101 has already described how to determine the position data of a traffic object image in a traffic environment image, so this is not repeated here.
The traffic object detection model may include one or more traffic object feature extraction layers, and this embodiment also obtains the traffic object feature maps output during object detection by at least one of those layers. The network structure of the traffic object detection model may be a convolutional neural network, which may include multiple convolutional layers, i.e., the traffic object feature extraction layers; each such layer extracts, from its input feature map, image features deeper than those of the input, and these features form the output feature map. Since convolutional-neural-network-based traffic object detection models are relatively mature prior art, they are not described further here.
In specific implementations, either all of the traffic object feature extraction layers in the traffic object detection model or only some of them may be selected. If all of the layers are selected, the features are preserved more completely, which can effectively improve the accuracy of the object body feature extraction model and thus the accuracy of the object body feature data, but more computing and storage units are occupied; if only some of the layers are selected, some features are lost, which lowers the accuracy of the object body feature extraction model and thus of the object body feature data, but computing and storage units are effectively saved.
Step S3015: determine feature data of at least one depth level of the traffic object images according to the position data and the at least one traffic object feature map, as the traffic object feature data.
After the position data of a traffic object image in the traffic environment image and the traffic object feature maps are determined, the feature data of each depth level corresponding to the traffic object image can be determined from the at least one traffic object feature map according to the position data, and these feature data are taken as the traffic object feature data.
In one example, the feature data of at least one depth level of the traffic object image can be determined through the following steps (a code sketch illustrating them follows the formula example below):
1) Determine the feature data of the traffic object image in each traffic object feature map according to the position data.
The image sizes of the at least one traffic object feature map may be the same or different.
When the image sizes differ, this step can be implemented as follows: obtain the feature data of the traffic object image in each traffic object feature map according to the position data and the image size ratios between the at least one traffic object feature extraction layer.
The image size ratios between the traffic object feature extraction layers can be determined from the network structure of the traffic object detection model. For example, suppose the traffic object detection model includes 6 traffic object feature extraction layers, the traffic environment image is 1000*500, the position data of a vehicle in that image is (x_left, y_left, x_right, y_right), the output feature map of the 5th layer is 100*100, and the output feature map of the 6th layer is 50*50, so the ratio between the two layers is 2:1. The vehicle's corresponding position data in the 6th layer's output feature map can first be determined from the vehicle's position data and the 6th layer's image size, and the pixel values within that range are taken as the feature data of the 6th depth level; then the corresponding position data in the 5th layer's output feature map are determined from the image ratio between the 5th and 6th layers and the 6th layer's position data, and the pixel values within that range are taken as the feature data of the 5th depth level; and so on.
When the image sizes are the same, the image data within the position data range of each traffic object feature map can be used directly as the feature data of that layer's depth level.
2) Obtain the feature dimension corresponding to each traffic object feature extraction layer.
Different traffic object feature extraction layers may have the same or different feature dimensions. This step gives the same-depth-level feature data of differently sized traffic object images of the same traffic object in different traffic environment images the same feature dimension, i.e., it normalizes the dimension of the same-depth-level feature data of traffic object images of different sizes, so that the similarity of two pieces of feature data can be computed to determine whether differently sized traffic object images in different traffic environment images show the same traffic object.
For example, the traffic object detection model includes 6 traffic object feature extraction layers; the feature dimension corresponding to the 1st layer is 1000, the feature dimension corresponding to the 2nd layer is 800, ..., and the feature dimension corresponding to the 6th layer is 900.
3) For each traffic object feature map, transform the feature data of the traffic object image in that feature map into feature data having the corresponding feature dimension.
In specific implementations, an ROIAlign or ROIPooling operation can be used to transform the feature data of the traffic object image in the traffic object feature map into feature data having the corresponding feature dimension.
In this embodiment, the ROIAlign layer (a special layer for object detection) is a region feature aggregation method that can solve the region mis-alignment problem caused by the two quantizations in the ROI Pooling operation. Experiments show that replacing ROI Pooling with ROI Align in detection tasks can improve the accuracy of the detection model.
Finally, the collection of the feature data, having the corresponding feature dimensions, of all the traffic object feature extraction layers is taken as the traffic object feature data.
For example, the traffic object detection model performs traffic object image detection on the traffic environment image captured by camera c at time t. The i-th detected foreground (traffic object image) is denoted b_i^{c,t}, and its rectangular box coordinates in the traffic environment image are
loc(b_i^{c,t}) = (x_left, y_left, x_right, y_right),
a four-dimensional vector giving the x and y coordinates of the top-left and bottom-right corners. In the deep-learning-based detection model, the feature maps produced by k selected intermediate layers are {F_0, F_1, …, F_k}; after the rectangular box coordinates loc(b_i^{c,t}) are input, the RoiAlign operation yields the features {f_0, f_1, …, f_k} of the object b_i^{c,t} at k depth levels, and concatenating all of these features gives the object's feature
f_i^{c,t} = concat(f_0, f_1, …, f_k).
It should be noted that, because the feature dimension corresponding to each traffic object feature extraction layer has been fixed, the features f_i^{c,t} produced for object images of any size from any camera have exactly the same dimension.
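For illustration, a minimal sketch of steps 1)-3) above: crop an object's features from several intermediate feature maps with RoIAlign, flatten each crop to a fixed per-layer dimension, and concatenate them into one multi-depth-level feature. The layer shapes, output size, and channel counts are illustrative assumptions, not the patent's actual configuration:

```python
# Sketch: multi-depth-level feature extraction via per-layer scaling + RoIAlign.
import torch
from torchvision.ops import roi_align

def multi_level_object_feature(feature_maps, box_xyxy, image_size, out_hw=(7, 7)):
    """feature_maps: list of (1, C_l, H_l, W_l) tensors from chosen layers.
    box_xyxy: (x_left, y_left, x_right, y_right) in original image pixels.
    image_size: (W, H) of the traffic environment image."""
    img_w, img_h = image_size
    x1, y1, x2, y2 = box_xyxy
    per_level = []
    for fmap in feature_maps:
        # Scale the box onto this layer's grid using the image size ratio
        # between the environment image and the feature map (per axis).
        sx = fmap.shape[-1] / img_w
        sy = fmap.shape[-2] / img_h
        boxes = torch.tensor([[0.0, x1 * sx, y1 * sy, x2 * sx, y2 * sy]])
        crop = roi_align(fmap, boxes, output_size=out_hw, aligned=True)
        per_level.append(crop.flatten(start_dim=1))  # fixed dimension per layer
    return torch.cat(per_level, dim=1)  # concatenated multi-depth-level feature

# Usage with illustrative shapes: 100*100 and 50*50 feature maps of a 1000*500
# image, as in the 5th/6th-layer example above.
f5 = torch.randn(1, 64, 100, 100)
f6 = torch.randn(1, 128, 50, 50)
feat = multi_level_object_feature([f5, f6], (120, 80, 360, 240), (1000, 500))
print(feat.shape)  # torch.Size([1, (64 + 128) * 7 * 7])
```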
Table 2 shows the set of correspondences between traffic object feature data and traffic object identifiers in this embodiment.
Table 2 (rendered as an image in the original publication; each row pairs a piece of traffic object feature data with a traffic object identifier)
As Table 2 shows, for any one vehicle, because its position, size, and shooting angle differ across the traffic environment images, its traffic object feature data in the 100 traffic environment images are usually not identical.
In summary, through the above steps S3011-S3015, this embodiment uses the output feature maps of the intermediate layers of the traffic object detection model (the RefineDet module) together with the bounding boxes of the detected traffic object images, applies the RoiAlign layer to output fixed-size feature maps, and takes the collection of these fixed-size feature maps as the traffic object feature data.
Step S303: construct the network structure of the object body feature extraction model.
The object body feature extraction model belongs to the category of Similarity Preserving Hashing: it aims to find a hash mapping function that maps the original features into Hamming space while preserving the similarity between the original features, so the network structure of the model can be a hashing network. The input data of the network structure is the traffic object feature data, and the output data is the object body feature data of the traffic object image.
In this embodiment, the network structure includes one 1*1 convolutional layer and one fully connected layer and outputs a q-dimensional vector; that is, the object body feature dimension is q, and this vector is a very compact compressed feature rich in information about "which object this is". In specific implementations, the network structure may also include multiple convolutional layers, and the convolution kernel size may be determined according to business requirements (such as the required object recognition accuracy).
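For illustration, a minimal sketch of such a hashing head, assuming the concatenated multi-level feature has been arranged as an (N, C, 1, 1) tensor; the channel count and q below are illustrative values, not the patent's:

```python
# Sketch: 1*1 convolution + fully connected layer producing a q-dim code.
import torch
import torch.nn as nn

class HashNet(nn.Module):
    def __init__(self, in_channels: int, q: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=1)  # 1*1 conv
        self.fc = nn.Linear(in_channels, q)  # q-dimensional output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))
        x = x.flatten(start_dim=1)
        return self.fc(x)  # real-valued codes; quantized to 0-1 vectors later

net = HashNet(in_channels=9408, q=128)
codes = net(torch.randn(4, 9408, 1, 1))
print(codes.shape)  # torch.Size([4, 128])
```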
Step S305: learn the object body feature extraction model from the training data set based on a triplet loss function.
This embodiment uses a triplet loss function to train the hashing network model. A triplet includes two sets of traffic object feature data corresponding to objects having the same object identifier and one set of traffic object feature data corresponding to a different traffic object.
In this embodiment, a large number of triplets are constructed from the training data set; specifically, the traffic object feature data and traffic object identifiers in the training data are used to construct triplets defined as follows:
{f_i, f_j, f_k}: sim(f_i, f_j) > sim(f_i, f_k)
The meaning of the triplet is that the similarity between the traffic object feature data f_i and f_j is greater than that between f_i and f_k. In the multi-camera task, f_i and f_j correspond to the features converted from image patches of the same physical traffic object in different cameras, while f_k is the feature of any unrelated object image.
If the hash code obtained by passing the feature f_i through the hashing network is denoted h_i, the triplet loss function (Triplet Loss) is defined as:
l(h_i, h_j, h_k) = max(0, ξ + ||h_i - h_j|| - ||h_i - h_k||)
Here ξ controls the offset between the first distance, i.e., the distance between the hash codes of the similar features of different images of the same traffic object, and the second distance, i.e., the distance between the hash codes of the dissimilar features of images of different traffic objects; that is, the difference between the two distances must reach at least ξ. Once the training data are obtained and the hashing network structure and triplet loss function are constructed, the network can be trained to obtain its weights.
Specifically, during model training, the loss value of the object body feature extraction model is determined from the constructed triplets and the triplet loss function; the triplet loss function makes the difference between the first distance and the second distance reach the distance threshold ξ, where the first distance is the distance between the object body feature data corresponding to the two sets of traffic object feature data of the same object, and the second distance is the distance between one of those two sets of traffic object feature data and a set of traffic object feature data of a different traffic object. If the loss value reaches the loss threshold, training of the object body feature extraction model is stopped.
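For illustration, a self-contained sketch of one training step with this triplet loss; the stand-in linear encoder, the margin ξ, and the batch shapes are assumptions for the example only, not the patent's training configuration:

```python
# Sketch: triplet loss l = max(0, xi + ||h_i - h_j|| - ||h_i - h_k||), batch-averaged.
import torch
import torch.nn as nn
import torch.nn.functional as F

def triplet_loss(h_i, h_j, h_k, xi=2.0):
    """h_i, h_j: codes of the same object; h_k: code of a different object."""
    d_first = F.pairwise_distance(h_i, h_j)   # first distance (same object)
    d_second = F.pairwise_distance(h_i, h_k)  # second distance (different objects)
    return torch.clamp(xi + d_first - d_second, min=0).mean()

encoder = nn.Linear(256, 128)  # stand-in for the hashing network
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# One optimization step over a batch of triplets {f_i, f_j, f_k}.
f_i, f_j, f_k = (torch.randn(32, 256) for _ in range(3))
loss = triplet_loss(encoder(f_i), encoder(f_j), encoder(f_k))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```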
In this embodiment, quantization is applied to the object body feature data output by the object body feature extraction model; after quantization, a q-dimensional 0-1 vector is obtained, which this embodiment calls a hash code (Hash Code). The hash code already carries the salient features of the corresponding object. In other words, the object body feature extraction model in this embodiment outputs object body feature data expressed as a q-dimensional 0-1 vector. With this processing, on the one hand, 0-1 vectors can use the computer's built-in bit operations to greatly speed up computation; on the other hand, they are storage-efficient and occupy little memory.
This concludes the description of the object body feature extraction model and how it is constructed. Once the object body feature extraction model has been built, it can be used to extract the object body feature data of the traffic object images.
In this embodiment, step S103 may include the following sub-steps:
Step S1031: obtain the traffic object feature maps output, during detection of the traffic object images, by at least one traffic object feature extraction layer included in the traffic object detection model; and obtain the position data of the traffic object images in the traffic environment images.
Step S1031 corresponds to step S3013 above; for the related description, please refer to step S3013, which is not repeated here.
Step S1033: determine feature data of at least one depth level of the traffic object images according to the position data and the at least one traffic object feature map.
In specific implementations, step S1033 may include the following sub-steps: 1) determine the feature data of the traffic object image in each traffic object feature map according to the position data; 2) obtain the feature dimension corresponding to each traffic object feature extraction layer; 3) for each traffic object feature map, transform the feature data of the traffic object image in that feature map into feature data having the corresponding feature dimension; 4) take the collection of the feature data, having the corresponding feature dimensions, of all the traffic object feature extraction layers as the feature data of the at least one depth level.
Here, step 1) can be implemented as follows: obtain the feature data of the traffic object image in each traffic object feature map according to the position data and the image size ratios between the at least one traffic object feature extraction layer.
Step S1033 corresponds to step S3015 above; for the related description, please refer to step S3015, which is not repeated here.
Step S1035: determine the object body feature data from the feature data of the at least one depth level through the object body feature extraction model.
This step takes the feature data of the at least one depth level as the traffic object feature data of the traffic object image to be recognized, inputs these feature data into the object body feature extraction model, and determines the object body feature data of the traffic object image to be recognized through that model.
Step S105: determine the similarity between different object images according to the object body feature data.
Through the above steps, the object body feature data of each traffic object image in the multiple traffic environment images to be recognized are obtained; for every pairwise combination of these traffic object images, the similarity between the different traffic object images is determined from their object body feature data.
In this embodiment, after step S103 the method may further include the following step: convert the real-valued object body feature data output by the object body feature extraction model into binary object body feature data; correspondingly, step S105 can be implemented as follows: perform an XNOR operation on the binary object body feature data of the different traffic object images to obtain the similarity.
When the trained hashing network is used for prediction in step S103, the traffic object feature data obtained through detection and RoiAlign are input into the hashing network to obtain the q-dimensional real-valued feature h_i. This embodiment then quantizes h_i into a q-dimensional binary hash code: for each dimension, the value is set to 0 if it is below the threshold τ, and to 1 otherwise.
In this embodiment, each object detected by each camera at any moment is input into the hashing network to obtain the corresponding 0-1 binary hash code. The similarity between any two hash codes can be measured by the number of '1' bits after XNOR-ing the two codes, i.e., by the number N of bit positions in which the two binary codes agree: sameBits(h_i, h_j) = N. This processing computes the similarity with the computer's built-in XNOR operation and can therefore effectively increase computation speed.
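For illustration, a minimal sketch of the quantization and XNOR similarity using plain Python integers as bit vectors; the code length q, the threshold, and the example codes are illustrative assumptions:

```python
# Sketch: quantize a real-valued q-dim code to bits, then count equal bit
# positions via popcount of XNOR -- the similarity measure described above.
q = 128

def quantize(h, tau: float = 0.0) -> int:
    """Pack a real-valued q-dim code into an integer bitmask: bit=1 iff value >= tau."""
    bits = 0
    for v in h:
        bits = (bits << 1) | (1 if v >= tau else 0)
    return bits

def similarity(a: int, b: int) -> int:
    """Number N of bit positions where a and b agree: popcount of XNOR over q bits."""
    xnor = ~(a ^ b) & ((1 << q) - 1)  # XNOR, masked to q bits
    return bin(xnor).count("1")

h1 = quantize([0.3, -0.2] * 64)
h2 = quantize([0.1, -0.5] * 64)
print(similarity(h1, h2))  # 128: the two codes agree in every bit position
```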
Step S107: determine, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
The method provided by the embodiments of this application can compute similarities not only between objects from different cameras but also between objects of the same camera at different times; a pair with high similarity can be regarded as the same object, so spatially and temporally associated objects across cameras can be found quickly.
As the above embodiment shows, the object recognition method provided by the embodiments of this application determines object images in multiple environment images; determines the object body feature data of the object images through an object body feature extraction model; determines the similarity between different object images according to the object body feature data; and determines, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images. This processing extracts the object body feature data of different object images and recognizes the same object in different environment images according to the similarity between those object body feature data; it can therefore effectively improve the recognition accuracy for the same object appearing in different environment images.
Second Embodiment
The above embodiment provides an object recognition method; correspondingly, this application also provides an object recognition device. This device embodiment corresponds to the method embodiment above.
Please refer to FIG. 4, which is a schematic diagram of an embodiment of the object recognition device of this application. Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
This application further provides an object recognition device, including:
an object image determination unit 401, configured to determine object images in multiple environment images;
an object body feature extraction unit 403, configured to determine object body feature data of the object images through an object body feature extraction model;
a similarity determination unit 405, configured to determine the similarity between different object images according to the object body feature data;
an image association unit 407, configured to determine, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
Optionally, the object body feature extraction model is learned from a training set of object feature data annotated with object identifiers.
Optionally, the device further includes:
a training data determination unit, configured to determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
a model network construction unit, configured to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
a model training unit, configured to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
Optionally, the object image determination unit 401 is specifically configured to determine the object images through an object detection model;
and the object body feature extraction unit 403 includes:
an object feature map acquisition subunit, configured to obtain the object feature maps output, during detection of the object images, by at least one object feature extraction layer included in the object detection model;
a position data acquisition subunit, configured to obtain the position data of the object images in the environment images;
a first feature data determination subunit, configured to determine feature data of at least one depth level of the object images according to the position data and the at least one object feature map;
a second feature data subunit, configured to determine the object body feature data from the feature data of the at least one depth level through the object body feature extraction model.
Optionally, the first feature data determination subunit includes:
a feature data cropping subunit, configured to determine the feature data of the object image in each object feature map according to the position data;
a feature dimension acquisition subunit, configured to obtain the feature dimension corresponding to each object feature extraction layer;
a feature data dimension normalization subunit, configured to transform, for each object feature map, the feature data of the object image in that feature map into feature data having the corresponding feature dimension;
a feature data merging subunit, configured to take the collection of the feature data, having the corresponding feature dimensions, of all the object feature extraction layers as the feature data of the at least one depth level.
Optionally, the feature data cropping subunit is specifically configured to obtain the feature data of the object image in each object feature map according to the position data and the image size ratios between the at least one object feature extraction layer.
Optionally, the similarity determination unit 405 includes:
a feature data conversion subunit, configured to convert the real-valued object body feature data output by the object body feature extraction model into binary object body feature data;
a similarity computation subunit, configured to perform an XNOR operation on the binary object body feature data of the different object images to obtain the similarity.
Optionally, the multiple environment images include: traffic environment images captured at the same moment by multiple image acquisition devices, traffic environment images captured at different moments by the same image acquisition device, or traffic environment images captured at different moments by multiple image acquisition devices; correspondingly, the objects include traffic objects.
Optionally, the objects include vehicles, people, and obstacles.
Third Embodiment
The above embodiments provide an object recognition method; correspondingly, this application also provides an electronic device. This device embodiment corresponds to the method embodiment above.
Please refer to FIG. 5, which is a schematic diagram of an embodiment of the electronic device of this application. Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
This application further provides an electronic device, including: a processor 501; and a memory 502 for storing a program implementing the object recognition method, where after the device is powered on and the program of the method is run by the processor, the following steps are performed: determining object images in multiple environment images; determining object body feature data of the object images through an object body feature extraction model; determining the similarity between different object images according to the object body feature data; determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
Fourth Embodiment
The above embodiments provide an object recognition method; correspondingly, this application also provides a vehicle. This device embodiment corresponds to the method embodiment above.
Please refer to FIG. 6, which is a schematic diagram of an embodiment of the vehicle of this application. Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
This application further provides a vehicle, including: at least one image acquisition device 601; a processor 602; and a memory 603 for storing a program implementing the traffic object recognition method, where after the vehicle is powered on and the program of the method is run by the processor, the following steps are performed: determining traffic object images in multiple traffic environment images; determining object body feature data of the traffic object images through an object body feature extraction model; determining the similarity between different traffic object images according to the object body feature data; determining, according to the similarity, the different traffic object images respectively corresponding to each traffic object in the multiple traffic environment images.
Fifth Embodiment
The above embodiments provide an object recognition method; correspondingly, this application also provides a method for constructing an object body feature extraction model. This method embodiment corresponds to the method embodiment above.
Please refer to FIG. 7, which is a flowchart of an embodiment of the object body feature extraction model construction method of this application. Since this method embodiment is substantially similar to the method embodiment above, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment above. The method embodiment described below is merely illustrative.
This application further provides a method for constructing an object body feature extraction model, including:
Step S701: determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
Step S703: construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
Step S705: learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
With the object body feature extraction model construction method provided by the embodiments of this application, a set of correspondences between multiple pieces of object feature data and object identifiers is determined as a training data set; the network structure of the object body feature extraction model is constructed, with object feature data as its input data and object body feature data as its output data; and the object body feature extraction model is learned from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object. With this processing, the object body feature extraction model is learned from a training set of object feature data annotated with object identifiers, and the accuracy of the object body feature extraction model can therefore be effectively improved.
Sixth Embodiment
The above embodiment provides a method for constructing an object body feature extraction model; correspondingly, this application also provides a device for constructing an object body feature extraction model. This device embodiment corresponds to the method embodiment above.
Please refer to FIG. 8, which is a schematic diagram of an embodiment of the object body feature extraction model construction device of this application. Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
This application further provides a device for constructing an object body feature extraction model, including:
a training data determination unit 801, configured to determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
a model network construction unit 803, configured to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
a model training unit 805, configured to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
Seventh Embodiment
The above embodiment provides a method for constructing an object body feature extraction model; correspondingly, this application also provides an electronic device. This device embodiment corresponds to the method embodiment above.
Please refer to FIG. 9, which is a schematic diagram of an embodiment of the electronic device of this application. Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment. The device embodiment described below is merely illustrative.
This application further provides an electronic device, including: a processor 901; and a memory 902 for storing a program implementing the object body feature extraction model construction method, where after the device is powered on and the program of the method is run by the processor, the following steps are performed: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
Although this application is disclosed above with preferred embodiments, they are not intended to limit this application; any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of this application, so the protection scope of this application shall be subject to the scope defined by the claims of this application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include persistent and non-persistent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
2. Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Claims (15)

  1. An object recognition method, characterized by comprising:
    determining object images in multiple environment images;
    determining object body feature data of the object images through an object body feature extraction model;
    determining the similarity between different object images according to the object body feature data;
    determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
  2. The method according to claim 1, characterized in that the object body feature extraction model is learned from a training set of object feature data annotated with object identifiers.
  3. The method according to claim 2, characterized by further comprising:
    determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
    constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
    learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
  4. The method according to claim 1, characterized in that
    the determining object images in multiple environment images comprises:
    determining the object images through an object detection model;
    and the determining object body feature data of the object images through an object body feature extraction model comprises:
    obtaining object feature maps output, during detection of the object images, by at least one object feature extraction layer included in the object detection model; and obtaining position data of the object images in the environment images;
    determining feature data of at least one depth level of the object images according to the position data and the at least one object feature map;
    determining the object body feature data from the feature data of the at least one depth level through the object body feature extraction model.
  5. The method according to claim 4, characterized in that the determining feature data of at least one depth level of the object images according to the position data and the at least one object feature map comprises:
    determining the feature data of the object image in each object feature map according to the position data;
    obtaining the feature dimension corresponding to each object feature extraction layer;
    for each object feature map, transforming the feature data of the object image in that object feature map into feature data having the corresponding feature dimension;
    taking the collection of the feature data, having the corresponding feature dimensions, of all the object feature extraction layers as the feature data of the at least one depth level.
  6. The method according to claim 5, characterized in that the determining the feature data of the object image in each object feature map according to the position data comprises:
    obtaining the feature data of the object image in each object feature map according to the position data and the image size ratios between the at least one object feature extraction layer.
  7. The method according to claim 1, characterized in that the determining the similarity between different object images according to the object body feature data comprises:
    converting the real-valued object body feature data output by the object body feature extraction model into binary object body feature data;
    performing an XNOR operation on the binary object body feature data of the different object images to obtain the similarity.
  8. The method according to claim 1, characterized in that
    the multiple environment images include: traffic environment images captured at the same moment by multiple image acquisition devices, traffic environment images captured at different moments by the same image acquisition device, or traffic environment images captured at different moments by multiple image acquisition devices;
    and the objects include traffic objects.
  9. The method according to claim 1, characterized in that the objects include vehicles, people, and obstacles.
  10. A method for constructing an object body feature extraction model, characterized by comprising:
    determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
    constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
    learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
  11. An object recognition device, characterized by comprising:
    an object image determination unit, configured to determine object images in multiple environment images;
    an object body feature extraction unit, configured to determine object body feature data of the object images through an object body feature extraction model;
    a similarity determination unit, configured to determine the similarity between different object images according to the object body feature data;
    an image association unit, configured to determine, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
  12. A device for constructing an object body feature extraction model, characterized by comprising:
    a training data determination unit, configured to determine a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set;
    a model network construction unit, configured to construct the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data;
    a model training unit, configured to learn the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
  13. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing a program implementing the object recognition method, where after the device is powered on and the program of the method is run by the processor, the following steps are performed: determining object images in multiple environment images; determining object body feature data of the object images through an object body feature extraction model; determining the similarity between different object images according to the object body feature data; determining, according to the similarity, the different object images respectively corresponding to each object in the multiple environment images.
  14. A vehicle, characterized by comprising:
    at least one image acquisition device;
    a processor; and
    a memory for storing a program implementing the traffic object recognition method, where after the vehicle is powered on and the program of the method is run by the processor, the following steps are performed: determining traffic object images in multiple traffic environment images; determining object body feature data of the traffic object images through an object body feature extraction model; determining the similarity between different traffic object images according to the object body feature data; determining, according to the similarity, the different traffic object images respectively corresponding to each traffic object in the multiple traffic environment images.
  15. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing a program implementing the object body feature extraction model construction method, where after the device is powered on and the program of the method is run by the processor, the following steps are performed: determining a set of correspondences between multiple pieces of object feature data and object identifiers as a training data set; constructing the network structure of the object body feature extraction model, where the input data of the network structure is object feature data and the output data is object body feature data; learning the object body feature extraction model from the training data set based on a triplet loss function, where a triplet includes two sets of object feature data corresponding to objects having the same object identifier and one set of object feature data corresponding to a different object.
PCT/CN2020/089116 2019-05-20 2020-05-08 Object recognition method and device, and vehicle WO2020233414A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910421145.1A CN111967290B (zh) 2019-05-20 2024-06-14 Object recognition method and device, and vehicle
CN201910421145.1 2019-05-20

Publications (1)

Publication Number Publication Date
WO2020233414A1 true WO2020233414A1 (zh) 2020-11-26

Family

ID=73358241

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089116 WO2020233414A1 (zh) 2019-05-20 2020-05-08 Object recognition method and device, and vehicle

Country Status (2)

Country Link
CN (1) CN111967290B (zh)
WO (1) WO2020233414A1 (zh)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102400017B1 (ko) * 2017-05-17 2022-05-19 삼성전자주식회사 Method and device for identifying an object
KR20180133657A (ko) * 2017-06-07 2018-12-17 한화에어로스페이스 주식회사 Multi-viewpoint vehicle recognition apparatus using machine learning
US20190095764A1 (en) * 2017-09-26 2019-03-28 Panton, Inc. Method and system for determining objects depicted in images
CN109657533B (zh) * 2018-10-27 2020-09-25 深圳市华尊科技股份有限公司 Pedestrian re-identification method and related products

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020001398A1 (en) * 2000-06-28 2002-01-03 Matsushita Electric Industrial Co., Ltd. Method and apparatus for object recognition
CN106097391A (zh) * 2016-06-13 2016-11-09 浙江工商大学 Recognition-assisted multi-target tracking method based on a deep neural network
CN106778464A (zh) * 2016-11-09 2017-05-31 深圳市深网视界科技有限公司 Pedestrian re-identification method and device based on deep learning
CN106709528A (zh) * 2017-01-10 2017-05-24 深圳大学 Vehicle re-identification method and device based on multi-objective-function deep learning
CN108345837A (zh) * 2018-01-17 2018-07-31 浙江大学 Pedestrian re-identification method based on body-region-aligned feature representation learning
US10176405B1 (en) * 2018-06-18 2019-01-08 Inception Institute Of Artificial Intelligence Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations
CN109740413A (zh) * 2018-11-14 2019-05-10 平安科技(深圳)有限公司 Pedestrian re-identification method and device, computer equipment, and computer storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766174A (zh) * 2021-01-21 2021-05-07 哈尔滨市科佳通用机电股份有限公司 Method for detecting a missing floor fault in a railway train carriage group
CN112766174B (zh) * 2021-01-21 2021-10-15 哈尔滨市科佳通用机电股份有限公司 Method for detecting a missing floor fault in a railway train carriage group

Also Published As

Publication number Publication date
CN111967290A (zh) 2020-11-20
CN111967290B (zh) 2024-06-14

Similar Documents

Publication Publication Date Title
CN107833236B (zh) 一种动态环境下结合语义的视觉定位***和方法
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
CN110609920B (zh) 一种视频监控场景下的行人混合搜索方法及***
Munaro et al. Tracking people within groups with RGB-D data
WO2022142855A1 (zh) 回环检测方法、装置、终端设备和可读存储介质
US11205276B2 (en) Object tracking method, object tracking device, electronic device and storage medium
US10685263B2 (en) System and method for object labeling
CN112016402B (zh) 基于无监督学习的行人重识别领域自适应方法及装置
WO2019007253A1 (zh) 图像识别方法、装置及设备、可读介质
Ji et al. Integrating visual selective attention model with HOG features for traffic light detection and recognition
WO2022095514A1 (zh) 图像检测方法、装置、电子设备及存储介质
Ren et al. Parallel RCNN: A deep learning method for people detection using RGB-D images
Jiang et al. A tree-based approach to integrated action localization, recognition and segmentation
WO2020233414A1 (zh) 物体识别方法、装置及车辆
Kim et al. Combined visually and geometrically informative link hypothesis for pose-graph visual SLAM using bag-of-words
Zhu et al. Boosting RGB-D salient object detection with adaptively cooperative dynamic fusion network
Li et al. Loop closure detection based on image semantic segmentation in indoor environment
CN112668662B (zh) 基于改进YOLOv3网络的野外山林环境目标检测方法
CN111881775B (zh) 一种人脸实时识别方法和装置
Ciarfuglia et al. A discriminative approach for appearance based loop closing
Krinski et al. Masking salient object detection, a mask region-based convolutional neural network analysis for segmentation of salient objects
Tsintotas et al. Visual place recognition for simultaneous localization and mapping
Jafari et al. Real-time RGB-D based template matching pedestrian detection
Shf et al. Review on deep based object detection
Lu et al. Monocular multi-kernel based lane marking detection

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20809765; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20809765; Country of ref document: EP; Kind code of ref document: A1)