CN117115625A - Unseen environmental classification - Google Patents

Unseen environmental classification

Info

Publication number
CN117115625A
CN117115625A
Authority
CN
China
Prior art keywords
foreground
uncertainty
background
vehicle
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210515198.1A
Other languages
Chinese (zh)
Inventor
古萨姆·肖林格
S·孙达尔
金尼什·简
什里亚莎·波德尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ford Global Technologies LLC
Original Assignee
Ford Global Technologies LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ford Global Technologies LLC filed Critical Ford Global Technologies LLC
Priority to CN202210515198.1A priority Critical patent/CN117115625A/en
Publication of CN117115625A publication Critical patent/CN117115625A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides "unseen environmental classification". A system includes a computer having a processor and a memory, the memory storing instructions such that the processor is programmed to: process vehicle sensor data with a deep neural network to generate a prediction indicative of one or more objects based on the data and determine an object uncertainty corresponding to the prediction; when the object uncertainty is greater than an uncertainty threshold, partition the vehicle sensor data into a foreground portion and a background portion; classify the foreground portion as including an unseen object class when a foreground uncertainty is greater than a foreground uncertainty threshold; classify the background portion as including an unseen background when a background uncertainty is greater than a background uncertainty threshold; and transmit the data and the data classifications to a server.

Description

Unseen environmental classification
Technical Field
The present disclosure relates to neural networks in vehicles.
Background
Deep Neural Networks (DNNs) may be used to perform many image understanding tasks, including classification, segmentation, and caption generation. For example, a convolutional neural network may take an image as input, assign importance to various aspects or objects depicted within the image, and distinguish those aspects or objects from one another.
Disclosure of Invention
Autonomous vehicles typically employ a perception algorithm to perceive the environment surrounding the vehicle. The perception algorithm may use one or more deep neural networks to assist in detecting and/or classifying objects. When the environment of the vehicle changes, the perception system of the vehicle should be able to learn from unexpected results, such as detected objects that the perception system cannot confidently identify. Identifying data (such as a data set) with domain shift or out-of-distribution data points can be challenging. A domain shift corresponds to a major change in the vehicle environment; out-of-distribution data points may be objects that were not previously seen in an otherwise familiar environment.
As discussed herein, a computer may implement a neural network that identifies data that includes unseen scenes. An unseen scene may be defined as a new object class, an environmental condition, or a combination of object class and environmental condition that is not included in the data used to train the neural network. For example, features comprising an unseen scene depicted within an image may cause the neural network to generate erroneous predictions. Vehicle sensor data corresponding to a scene may be processed with a deep neural network to generate predictions indicative of one or more objects based on the data and to determine object uncertainties corresponding to the predictions. Object uncertainty is a probability that a prediction indicating one or more objects correctly identifies the one or more objects.
The neural network may utilize a probabilistic deep neural network (such as a Bayesian neural network) to capture uncertainty about objects and/or backgrounds, which can identify dataset shift and/or out-of-distribution data. After unreliable predictions have been identified, the data associated with those predictions may be annotated so that more accurate predictions can be made in the future. One measure of uncertainty in neural networks is cognitive uncertainty, also known as epistemic uncertainty. Cognitive uncertainty is defined as a measure of the degree to which a given input is represented in the training dataset. For example, high cognitive uncertainty indicates that the neural network has not been trained with enough similar training samples to generate a correct prediction. A correct prediction is a prediction that matches the ground truth provided to the neural network during training. Ground truth is data determined by means independent of the neural network, for example by a human determining the content of input data provided to the neural network during training.
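By way of a non-limiting illustration, cognitive (epistemic) uncertainty can be approximated by Monte Carlo sampling of a probabilistic network. The sketch below assumes a PyTorch classifier that contains dropout layers; the sample count and the use of predictive variance as the uncertainty measure are illustrative assumptions and are not prescribed by this disclosure.

```python
import torch

def cognitive_uncertainty(model, x, n_samples=30):
    """Approximate cognitive (epistemic) uncertainty by Monte Carlo sampling.

    Assumes `model` is a PyTorch classifier containing dropout layers; keeping
    dropout active at inference time approximates sampling weights from a
    Bayesian posterior, as in a Bayesian neural network.
    """
    model.train()  # keep dropout active so each pass samples a different sub-network
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean_probs = probs.mean(dim=0)              # averaged prediction over samples
    uncertainty = probs.var(dim=0).sum(dim=-1)  # disagreement between samples, per input
    return mean_probs, uncertainty
```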
Disclosed herein is a method comprising: processing vehicle sensor data using a deep neural network to generate a prediction indicative of one or more objects based on the vehicle sensor data, and determining an object uncertainty corresponding to the prediction; and then, upon determining that the object uncertainty is greater than an uncertainty threshold: segmenting the vehicle sensor data into a foreground portion and a background portion, classifying the foreground portion as including an unseen object class when a foreground cognitive uncertainty is greater than a foreground cognitive uncertainty threshold, classifying the background portion as including an unseen background when a background cognitive uncertainty is greater than a background cognitive uncertainty threshold, and transmitting the data and the data classifications to a server. The method may further include operating the vehicle based on the prediction indicative of one or more objects. The object uncertainty may be a probability that the prediction indicative of one or more objects correctly identifies the one or more objects.
The foreground cognitive uncertainty may be a probability measure of the extent to which the one or more objects are represented in the training data distribution. The background cognitive uncertainty may be a probabilistic measure of how much a noise factor is represented in the training data distribution, where the noise factor includes weather conditions, lighting conditions, and surface conditions. The foreground portion may be mapped to a latent representation, the latent representation may be mapped to a reconstruction of the foreground portion, and the foreground cognitive uncertainty may be determined based on a comparison of the reconstructed foreground portion to the foreground portion. The background portion may be mapped to a latent representation, which may be mapped to a reconstruction of the background portion, and the background cognitive uncertainty may be determined based on a comparison of the reconstructed background portion to the background portion. The vehicle sensor data may include at least one of an image or a point cloud. The deep neural network may comprise a probabilistic neural network. The vehicle sensor data may be segmented into the foreground portion and the background portion by a segmenter via a segmentation mask. The segmentation mask may include a binary mask classifying objects within the vehicle sensor data, wherein the classified objects are assigned to the foreground portion. The segmenter may include a Mask Region-based Convolutional Neural Network (Mask R-CNN). The object may be a vehicle trailer, and the deep neural network may output a trailer angle. The trailer angle may describe a direction in which the vehicle trailer will travel in response to reversing the vehicle.
A computer readable medium storing program instructions for performing some or all of the above method steps is disclosed. Also disclosed is a computer programmed to perform some or all of the above method steps, comprising a computer device programmed to process vehicle sensor data using a deep neural network to generate predictions indicative of one or more objects based on the vehicle sensor data, and to determine an object uncertainty corresponding to the predictions, and then upon determining that the object uncertainty is greater than an uncertainty threshold: the vehicle sensor data is segmented into a foreground portion and a background portion, the foreground portion is classified as including an unseen object class when the foreground cognitive uncertainty is greater than a foreground cognitive uncertainty threshold, the background portion is classified as including an unseen background when the background cognitive uncertainty is greater than a background cognitive uncertainty threshold, and the data and data classifications are transmitted to a server. The processor may also be programmed to operate the vehicle based on the predictions indicative of one or more objects. The object uncertainty may be a probability that the prediction indicating one or more objects correctly identified the one or more objects.
The computer may also be programmed to determine the foreground cognitive uncertainty, which may be a probability measure of the extent to which the one or more objects are represented in the training data distribution. The background cognitive uncertainty may be a probabilistic measure of how much a noise factor is represented in the training data distribution, where the noise factor includes weather conditions, lighting conditions, and surface conditions. The foreground portion may be mapped to a latent representation, the latent representation may be mapped to a reconstruction of the foreground portion, and the foreground cognitive uncertainty may be determined based on a comparison of the reconstructed foreground portion to the foreground portion. The background portion may be mapped to a latent representation, which may be mapped to a reconstruction of the background portion, and the background cognitive uncertainty may be determined based on a comparison of the reconstructed background portion to the background portion. The vehicle sensor data may include at least one of an image or a point cloud. The deep neural network may comprise a probabilistic neural network. The vehicle sensor data may be segmented into the foreground portion and the background portion by a segmenter via a segmentation mask. The segmentation mask may include a binary mask classifying objects within the vehicle sensor data, wherein the classified objects are assigned to the foreground portion. The segmenter may include a Mask Region-based Convolutional Neural Network (Mask R-CNN). The object may be a vehicle trailer, and the deep neural network may output a trailer angle. The trailer angle may describe a direction in which the vehicle trailer will travel in response to reversing the vehicle.
Drawings
FIG. 1 is a diagram of an exemplary system including a vehicle.
FIG. 2 is a diagram of an exemplary server within a system.
Fig. 3 is a diagram of an exemplary deep neural network.
Fig. 4 is a diagram of an exemplary sensing network and an unseen scene detection neural network.
Fig. 5 is an illustration of an exemplary vehicle trailer.
FIG. 6 is a flow chart illustrating an exemplary process for identifying an unseen scene within data where one or more deep neural networks have not been trained.
Detailed Description
FIG. 1 is a block diagram of an exemplary vehicle system 100. The system 100 includes a vehicle 105, which is a land vehicle such as an automobile, truck, or the like. The vehicle 105 includes a computer 110, vehicle sensors 115, actuators 120 for actuating various vehicle components 125, and a vehicle communication module 130. The communication module 130 allows the computer 110 to communicate with a server 145 via a network 135.
The computer 110 includes a processor and a memory. The memory includes one or more forms of computer-readable media and stores instructions executable by the computer 110 to perform various operations, including operations as disclosed herein.
The computer 110 may operate the vehicle 105 in an autonomous mode, a semi-autonomous mode, or a non-autonomous (manual) mode. For purposes of this disclosure, autonomous mode is defined as a mode in which each of the propulsion, braking, and steering of the vehicle 105 is controlled by the computer 110; in semi-autonomous mode, the computer 110 controls one or both of propulsion, braking, and steering of the vehicle 105; in the non-autonomous mode, a human operator controls each of the vehicle 105 propulsion, braking, and steering.
The computer 110 may include programming to operate one or more of the vehicle 105 brakes, propulsion (e.g., controlling acceleration of the vehicle by controlling one or more of an internal combustion engine, an electric motor, a hybrid engine, etc.), steering, climate control, interior and/or exterior lights, etc., as well as to determine whether and when the computer 110 (as opposed to a human operator) is to control such operations. In addition, the computer 110 may be programmed to determine whether and when a human operator controls such operations.
The computer 110 may include or be communicatively coupled to more than one processor, such as processors included in Electronic Controller Units (ECUs) or the like (e.g., a powertrain controller, a brake controller, a steering controller, etc.) included in the vehicle 105 for monitoring and/or controlling various vehicle components 125, e.g., via the communication module 130 of the vehicle 105 as further described below. In addition, the computer 110 may communicate with a navigation system using a Global Positioning System (GPS) via the communication module 130 of the vehicle 105. As an example, the computer 110 may request and receive location data of the vehicle 105. The location data may be in a conventional format, such as geographical coordinates (latitude and longitude coordinates).
The computer 110 is typically arranged to communicate by means of the vehicle 105 communication module 130 and also by means of a wired and/or wireless network inside the vehicle 105 (e.g. a bus in the vehicle 105, etc., such as a Controller Area Network (CAN), etc.) and/or other wired and/or wireless mechanisms.
Via the vehicle 105 communication network, the computer 110 may transmit and/or receive messages to and/or from various devices in the vehicle 105, such as vehicle sensors 115, actuators 120, vehicle components 125, human-machine interfaces (HMI), and the like. Alternatively or additionally, where the computer 110 actually includes a plurality of devices, the vehicle 105 communication network may be used for communication between the devices represented in this disclosure as the computer 110. Further, as mentioned below, various controllers and/or vehicle sensors 115 may provide data to the computer 110.
The vehicle sensors 115 may include a variety of devices such as are known for providing data to the computer 110. For example, the vehicle sensors 115 may include light detection and ranging (lidar) sensors 115 or the like disposed on top of the vehicle 105, behind a front windshield of the vehicle 105, around the vehicle 105, etc., that provide relative location, size, and shape of objects around the vehicle 105, and/or conditions of the surroundings. As another example, one or more radar sensors 115 secured to the bumper of the vehicle 105 may provide data to provide speed of an object (possibly including a second vehicle) or the like relative to the position of the vehicle 105 and ranging. The vehicle sensors 115 may also include camera sensors 115 (e.g., front view, side view, rear view, etc.) that provide images from a field of view of the interior and/or exterior of the vehicle 105.
The vehicle 105 actuators 120 are implemented via circuits, chips, motors, or other electronic and/or mechanical components that may actuate various vehicle subsystems according to appropriate control signals as is known. The actuators 120 may be used to control components 125, including braking, acceleration, and steering of the vehicle 105.
In the context of the present disclosure, the vehicle component 125 is one or more hardware components adapted to perform mechanical or electromechanical functions or operations, such as moving the vehicle 105, decelerating or stopping the vehicle 105, steering the vehicle 105, and the like. Non-limiting examples of components 125 include propulsion components (which include, for example, an internal combustion engine and/or an electric motor, etc.), transmission components, steering components (which may include, for example, one or more of a steering wheel, a steering rack, etc.), braking components (as described below), parking assist components, adaptive cruise control components, adaptive steering components, movable seats, etc.
Further, the computer 110 may be configured to communicate with devices external to the vehicle 105 via the vehicle-to-vehicle communication module 130, for example, with another vehicle, a remote server 145 (typically via the network 135) through vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2X) wireless communications. The communication module 130 may include one or more mechanisms by which the computer 110 may communicate, including any desired combination of wireless (e.g., cellular, wireless, satellite, microwave, and radio frequency) communication mechanisms, as well as any desired network topology (or topologies when multiple communication mechanisms are utilized). Exemplary communications provided via the communication module 130 include a cellular network providing data communication services, IEEE 802.11, dedicated Short Range Communications (DSRC), and/or Wide Area Networks (WANs) including the internet.
The network 135 may be one or more of a variety of wired or wireless communication mechanisms, including any desired combination of wired (e.g., cable and fiber) and/or wireless (e.g., cellular, wireless, satellite, microwave, and radio frequency) communication mechanisms and any desired network topology (or topologies where multiple communication mechanisms are utilized). Exemplary communication networks include wireless communication networks (e.g., using bluetooth, bluetooth Low Energy (BLE), IEEE 802.11, vehicle-to-vehicle (V2V), such as Dedicated Short Range Communication (DSRC), etc.), local Area Networks (LANs), and/or Wide Area Networks (WANs), including the internet, that provide data communication services.
The computer 110 may receive and analyze data from the sensors 115 substantially continuously, periodically, and/or when instructed by the server 145, etc. Further, object classification or identification techniques may be used in, for example, computer 110 to identify the type of object (e.g., vehicle, person, rock, pothole, bicycle, motorcycle, etc.) and the physical characteristics of the object based on data from lidar sensor 115, camera sensor 115, etc.
Fig. 2 is a block diagram of an exemplary server 145. The server 145 includes a computer 235 and a communication module 240. The computer 235 includes a processor and a memory. The memory includes one or more forms of computer-readable media and stores instructions executable by the computer 235 for performing various operations, including operations as disclosed herein. The communication module 240 allows the computer 235 to communicate with other devices, such as the vehicle 105.
Fig. 3 is an illustration of an exemplary Deep Neural Network (DNN) 300. DNN 300 may represent one or more neural networks described herein. DNN 300 includes a plurality of nodes 305, and nodes 305 are arranged such that DNN 300 includes an input layer, one or more hidden layers, and an output layer. Each layer of DNN 300 may include a plurality of nodes 305. Although fig. 3 shows three (3) hidden layers, it should be understood that DNN 300 may include additional or fewer hidden layers. The input and output layers may also include more than one (1) node 305.
Nodes 305 are sometimes referred to as artificial neurons 305 because they are designed to mimic biological neurons, such as human neurons. A set of inputs (represented by the arrows) to each artificial neuron 305 is each multiplied by a respective weight. The weighted inputs may then be summed in an input function to provide, possibly adjusted by a bias, a net input. The net input may then be provided to an activation function, which in turn provides an output for the connected artificial neuron 305. The activation function may be a variety of suitable functions, typically selected based on empirical analysis. As indicated by the arrows in fig. 3, the output of an artificial neuron 305 may then be provided for inclusion in the set of inputs of one or more artificial neurons 305 in a next layer.
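In conventional notation, the computation described above for a single artificial neuron 305 with inputs x_i, weights w_i, bias b, and activation function φ can be written as:

$$ y = \varphi\left(\sum_{i} w_i x_i + b\right) $$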
DNN 300 may be trained to accept data as input and generate output based on the input. DNN 300 may be trained with ground truth data (i.e., data regarding real world conditions or states). For example, DNN 300 may be trained with ground truth data or updated by a processor with additional data. For example, the weights may be initialized by using a gaussian distribution, and the bias of each node 305 may be set to zero. Training DNN 300 may include updating weights and biases via suitable techniques, such as back propagation plus optimization.
Back propagation is a technique that returns the output from DNN 300 to the input for comparison with ground truth corresponding to the test data. In this example, during training, labels and occlusion probabilities may be back-propagated for comparison with the labels and occlusion probabilities included in the ground truth to determine a loss function. The loss function determines how accurately DNN 300 has processed its input. DNN 300 may be executed multiple times on foreground and background data while varying the parameters that control the processing of DNN 300. Parameters corresponding to correct answers, as confirmed by the loss function comparing the output with the ground truth, are saved as candidate parameters. After the test runs, the candidate parameters that yield the most correct results are saved as the parameters used to program DNN 300 during operation. The ground truth data may include, but is not limited to, data specifying whether portions of an image are foreground portions or background portions of the image. For example, the ground truth data may be data representing foreground and background data and corresponding labels. In an exemplary implementation, pixels of an image may be classified such that pixels corresponding to one or more objects are classified into a category, such as people, vehicles, signs, and the like. DNN 300 may be trained at the server 145 and provided to the vehicle 105 via the communication network 135. DNN 300 may include one or more of the probabilistic neural networks, convolutional neural networks, autoencoders, variational autoencoders, sparse autoencoders, recurrent neural networks, deconvolution networks, etc., discussed herein.
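As a non-limiting sketch of the training procedure described above, the loop below assumes a PyTorch model and a data loader yielding (input, ground truth label) pairs; the loss function, optimizer, and hyperparameter values are illustrative choices rather than requirements of this disclosure.

```python
import torch
import torch.nn as nn

def train_dnn(model, loader, epochs=10, lr=1e-3):
    """Illustrative backpropagation-plus-optimization training loop for DNN 300."""
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:              # y is the ground truth label
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)  # compare output with ground truth
            loss.backward()              # backpropagate the error
            optimizer.step()             # update weights and biases
    return model
```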
Fig. 4 illustrates an exemplary deep neural network 400 of the vehicle 105 that includes a perception network 405 and an unseen scene detection neural network 410. The perception network 405 may include one or more DNNs 300 that may use sensor 115 data to detect and/or perceive the vehicle environment. The perception network 405 may receive data, such as sensor 115 data, and use a trained probabilistic neural network 407 to predict objects in the perceived vehicle environment. The probabilistic neural network 407 may be a trained deep neural network (e.g., DNN 300) that receives data (e.g., images or point clouds) and generates predictions indicative of one or more objects depicted within the data. For example, the probabilistic neural network 407 may be trained using conventional image detection and/or image classification techniques. Some data may be associated with objects that were not observed during previous training of the perception network 405; in that case, the probabilistic neural network 407 may not be able to identify one or more objects with a high level of certainty. As used herein, an object class is defined as a label for a particular object predicted to be within the perceived vehicle environment.
The probabilistic neural network 407 may generate a prediction based on the received data and a cognitive uncertainty corresponding to the prediction, described in more detail below. The perception network 405 compares the cognitive uncertainty to a cognitive uncertainty threshold. If the data is associated with a cognitive uncertainty greater than the uncertainty threshold, the data is provided to the unseen scene detection neural network 410 for further processing. The cognitive uncertainty thresholds discussed herein may be determined empirically during development of the probabilistic neural network 407 and/or the unseen scene detection neural network 410. In an exemplary implementation, the cognitive uncertainty threshold may be determined using clustering techniques, anomaly detection techniques, or other similar techniques; these techniques can be applied to predictions output by the probabilistic neural network 407 during training by comparing the predictions with ground truth data.
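The comparison and routing described above can be summarized by the following sketch; the 0.5 threshold echoes the 50% example given later in this description, and the callable names (probabilistic_net, unseen_scene_net) are hypothetical stand-ins for probabilistic neural network 407 and unseen scene detection neural network 410.

```python
UNCERTAINTY_THRESHOLD = 0.5  # illustrative; determined empirically during development

def route_sensor_data(probabilistic_net, unseen_scene_net, sensor_data):
    """Route sensor data as described for perception network 405.

    `probabilistic_net` is assumed to return (prediction, cognitive_uncertainty).
    """
    prediction, uncertainty = probabilistic_net(sensor_data)
    if uncertainty > UNCERTAINTY_THRESHOLD:
        # Prediction is unreliable: hand the data to the unseen scene detector.
        return unseen_scene_net(sensor_data)
    return prediction
```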
The unseen scene detection neural network 410 includes a segmenter 415, a foreground autoencoder 420, and a background autoencoder 425. The unseen scene detection neural network 410 receives data from the perception network 405 and determines which portion of the data corresponds to an unseen scene. The segmenter 415 segments the received data into background and foreground portions via conventional segmentation techniques. One or more objects depicted within the image may be detected based on the segmentation of the image. For example, any discrete, continuous foreground portion may be identified as an object in the scene. In some examples, only portions of continuous foreground that are greater than a particular size (e.g., by number of pixels) are identified as objects in the scene.
The segmenter 415 uses a segmentation mask to define regions of the image as belonging to one or more foreground portions (e.g., a plurality of foreground pixels) and one or more background portions (e.g., a plurality of background pixels) of the image. In one example, the segmentation mask defines any region of the image that is not a foreground portion as belonging to a background portion, so only one background portion may be defined. The segmentation mask may comprise a binary mask classifying features or objects identified within the image or point cloud, and the classified features or objects are assigned to the foreground portion. In one or more implementations, the segmenter 415 includes a Mask Region-based Convolutional Neural Network (Mask R-CNN). Mask R-CNN adds a convolutional branch that predicts a foreground mask in addition to the convolutional layers that predict the foreground objects. However, it should be understood that the segmenter 415 may include other suitable neural networks that can classify similar features or objects depicted within the image and assign the classified objects to foreground portions of the image.
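One way to obtain the foreground/background masks described above is with an off-the-shelf Mask R-CNN; the sketch below assumes torchvision's pretrained model (the exact torchvision API and the 0.5 score threshold are assumptions, and any equivalent segmenter could be used).

```python
import torch
import torchvision

# Pretrained Mask R-CNN; this disclosure only requires *a* Mask R-CNN-style
# segmenter, so this particular model and its weights are an assumption.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def foreground_background_masks(image, score_threshold=0.5):
    """Return boolean foreground/background masks for an image tensor (C, H, W) in [0, 1]."""
    with torch.no_grad():
        output = model([image])[0]                    # dict with 'masks', 'scores', ...
    keep = output["scores"] > score_threshold
    instance_masks = output["masks"][keep] > 0.5      # (N, 1, H, W) boolean per-object masks
    foreground = instance_masks.any(dim=0).squeeze(0) # union of all detected objects
    background = ~foreground                          # everything that is not foreground
    return foreground, background
```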
The autoencoders 420, 425 may include artificial neural networks trained to generate output data based on input data. The autoencoders 420, 425 may each include an encoder that maps the input data to a latent representation and a decoder that maps the latent representation to a reconstruction of the input data. For example, the encoder compresses the input data into a compressed representation of the data, and the decoder decompresses the compressed representation into a reconstruction of the input data. Each autoencoder 420, 425 may include a feedforward neural network that generates an output based on an input and generates a cognitive uncertainty corresponding to the generated output. Uncertainty is a probabilistic measure of the reliability of predictions from a perception model. Cognitive uncertainty represents uncertainty due to limited data and knowledge; in the case of a supervised neural network, the cognitive uncertainty gives a probabilistic measure of how well the input is represented in the training data distribution. The respective cognitive uncertainty metric of each autoencoder 420, 425 may be quantified by a reconstruction error corresponding to the input data.
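A minimal sketch of such an autoencoder, and of reconstruction error as the cognitive uncertainty metric, is shown below; the layer sizes and the mean-squared-error measure are illustrative assumptions rather than requirements of this disclosure.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Encoder maps the input to a latent representation; decoder maps it back."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(      # compresses the input
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(      # decompresses back to input space
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (N, 3, H, W) with H, W divisible by 4
        latent = self.encoder(x)
        return self.decoder(latent)

def reconstruction_error(autoencoder, x):
    """Quantify cognitive uncertainty as the reconstruction error of the input."""
    with torch.no_grad():
        return torch.mean((autoencoder(x) - x) ** 2).item()
```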
During training, the foreground autoencoder 420 receives the foreground portions from the segmenter 415, determines the cognitive uncertainties corresponding to the foreground portions, and maps the foreground portions to latent representations. Foreground cognitive uncertainty is a probabilistic measure of how well an object is represented in the training data distribution. The foreground autoencoder 420 may determine a foreground cognitive uncertainty metric by comparing the reconstruction to the input foreground portion as discussed above. The foreground cognitive uncertainty threshold may be determined by observing the typical minimum cognitive uncertainty determined for the reconstructed foreground portions during training. At runtime, after training, the foreground autoencoder 420 compares the determined foreground cognitive uncertainty to the previously determined foreground cognitive uncertainty threshold. If the foreground cognitive uncertainty is greater than the foreground cognitive uncertainty threshold, the foreground autoencoder 420 classifies the foreground portion as including an unseen object class.
The background autoencoder 425 receives the background portion from the segmenter 415, maps the background portion to a latent representation, and maps the latent representation to a reconstruction of the background portion. The background autoencoder 425 may determine a background cognitive uncertainty metric by comparing the reconstruction to the input background portion. The background cognitive uncertainty is a probabilistic measure of how much a noise factor is represented in the training data distribution, where the noise factor includes weather conditions, lighting conditions, and surface conditions. The background autoencoder 425 compares the background cognitive uncertainty to a background cognitive uncertainty threshold, which may be determined during training as discussed above. If the background cognitive uncertainty is greater than the background cognitive uncertainty threshold, the background autoencoder 425 classifies the background portion as including unseen background content.
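Combining the two comparisons above, and reusing the reconstruction_error helper from the autoencoder sketch, the threshold tests can be expressed as follows; the numeric thresholds are placeholders for values determined empirically during training.

```python
FOREGROUND_THRESHOLD = 0.02  # placeholder; set from reconstruction errors observed in training
BACKGROUND_THRESHOLD = 0.02  # placeholder; set from reconstruction errors observed in training

def classify_portions(fg_autoencoder, bg_autoencoder, foreground, background):
    """Flag unseen object classes and unseen background content via reconstruction error."""
    return {
        "unseen_object_class": reconstruction_error(fg_autoencoder, foreground) > FOREGROUND_THRESHOLD,
        "unseen_background": reconstruction_error(bg_autoencoder, background) > BACKGROUND_THRESHOLD,
    }
```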
A measure of the accuracy and uncertainty of predictions from previously trained DNNs is used to determine the cognitive uncertainty thresholds. The cognitive uncertainty metric may be based on measuring similarity between the reconstructed portion and the input, or on other image processing techniques for determining image similarity, including sum of squared differences, correlation, and comparison using neural networks. Various Bayesian learning techniques may be used to calculate the cognitive uncertainty metrics of the autoencoders, such as the foreground autoencoder 420 and the background autoencoder 425. Bayesian learning techniques may include: Monte Carlo sampling methods; ensemble methods that train a plurality of probabilistic models with different initializations; and variational inference that fits a Gaussian variational posterior approximation to the weights of the autoencoder.
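As one example of the ensemble technique mentioned above, the disagreement between several autoencoders trained with different random initializations can serve as the cognitive uncertainty metric; the variance-based measure below is one possible choice among several.

```python
import torch

def ensemble_uncertainty(autoencoders, x):
    """Cognitive uncertainty as disagreement among an ensemble of autoencoders."""
    with torch.no_grad():
        reconstructions = torch.stack([ae(x) for ae in autoencoders])
    mean_reconstruction = reconstructions.mean(dim=0)
    # Variance across ensemble members measures epistemic disagreement.
    disagreement = reconstructions.var(dim=0).mean().item()
    return disagreement, mean_reconstruction
```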
The perception network 405 may receive output from the autoencoders 420, 425 indicating whether the input data includes unseen object categories or unseen background content. If the perception network 405 receives an output indicating that the data includes an unseen object class or unseen background content, the perception network 405 may transmit the data to the server 145 via the network 135. As discussed above, unseen object categories and unseen background content generally correspond to high cognitive uncertainties; seen object categories and seen background content generally correspond to low cognitive uncertainties. In examples where the latent representations output by the autoencoders do not indicate unseen object categories or unseen background content, but the overall cognitive uncertainty determined by the perception network 405 is high, the output may be marked for manual review and provided to the server 145. The perception network 405 may be retrained with data having labels indicating object categories and/or background content and, once retrained, provided to the vehicle 105.
The deep neural network 400, including the perception network 405 and the unseen scene detection neural network 410, may improve operation of the vehicle 105 by providing an output indicating that the input data includes an unseen object class or an unseen background class. The presence of an unseen object class or an unseen background class may indicate that the output result (e.g., trailer angle) from the deep neural network 400 has high cognitive uncertainty and, thus, the output result may be unreliable. The high cognitive uncertainty is defined empirically by the user and may be, for example, an uncertainty of greater than 50%. In examples where the deep neural network 400 indicates high cognitive uncertainty, the computer 110 in the vehicle 105 may determine that the reliability of the results is insufficient to allow the computer 110 to operate the vehicle 105. For example, when the cognitive uncertainty corresponding to the trailer angle 504 is greater than a 50% threshold, which indicates that the deep neural network 400 has not correctly determined the trailer angle 504, the computer 110 may prevent the vehicle 105 from backing up with the trailer attached. The user may select the cognitive uncertainty threshold based on testing the deep neural network 400 using real world data. Output data regarding the unseen object class and the unseen background class may be uploaded to the server 145 to allow the server 145 to retrain the deep neural network 400 based on the unseen object class and the unseen background class. The retrained deep neural network 400 may then be downloaded to the computer 110 in the vehicle 105 to allow the computer 110 to process input data including unseen object categories and unseen background categories with less cognitive uncertainty.
In examples where the deep neural network 400 indicates low cognitive uncertainty, such as where the cognitive uncertainty is less than a 50% threshold, the output from the deep neural network 400 may be used to operate the vehicle 105. An example of operating the vehicle 105 based on the deep neural network 400 output is when the deep neural network 400 outputs a trailer angle 504 in response to an input image from the vehicle sensor 115. As discussed above, the trailer angle 504 indicates the angle between a vehicle axis parallel to the direction of travel of the vehicle 105 and a trailer axis parallel to the direction of travel of the wheels of the trailer. The trailer angle 504 may be measured at the point of attachment of the trailer to the vehicle 105 (e.g., at the trailer hitch). The trailer angle 504 describes the direction in which the trailer will travel in response to reversing the vehicle 105 in a direction determined by the vehicle steering, brakes, and driveline. The computer 110 in the vehicle may determine the correct commands to send to the controllers for the vehicle steering, brakes, and driveline to move the trailer to a desired position (e.g., a parking space) based on the trailer angle.
Fig. 5 is an illustration of a vehicle trailer 500 attached to the vehicle 105. An image of the vehicle trailer 500 may be acquired by a camera 502 included in the vehicle 105 for processing with the deep neural network 400 as discussed above with respect to fig. 4. The deep neural network 400 may determine a trailer angle 504, which may be an angle determined relative to a centerline 506 of the vehicle 105 and a centerline 508 of the vehicle trailer 500. The cognitive uncertainty 510 is illustrated by an arrow indicating a range of possible trailer angles that may be determined by the deep neural network 400 based on images acquired by the camera 502.
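Assuming the headings of centerline 506 and centerline 508 are available in a common reference frame (e.g., estimated from the camera 502 image), the trailer angle 504 can be computed as the signed difference between them; the wrap-around convention below is an illustrative assumption.

```python
def trailer_angle(vehicle_heading_deg, trailer_heading_deg):
    """Signed angle (degrees) between vehicle centerline 506 and trailer centerline 508."""
    angle = trailer_heading_deg - vehicle_heading_deg
    return (angle + 180.0) % 360.0 - 180.0  # wrap to the range [-180, 180)
```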
FIG. 6 is a flow chart of an exemplary process 600 for identifying unseen scenes within data on which one or more deep neural networks have not been trained. The blocks of process 600 may be performed by a processor of the computer 110. At block 605, it is determined whether data has been received at the perception network 405. As discussed above, the data may include sensor 115 data, such as an image or a point cloud. If data has not been received, the process 600 returns to block 605. Otherwise, at block 610, the perception network 405 may generate a prediction based on the received data and a cognitive uncertainty for the prediction. For example, the perception network 405 may receive sensor 115 data and use the trained probabilistic neural network 407 to predict objects in the perceived vehicle environment. The probabilistic neural network 407 may also generate a cognitive uncertainty for the prediction.
At block 615, the perception network 405 compares the cognitive uncertainty to the cognitive uncertainty threshold. If the cognitive uncertainty is greater than the cognitive uncertainty threshold, the data is provided to the unseen scene detection neural network 410 at block 620. If the cognitive uncertainty is less than or equal to the cognitive uncertainty threshold, the process 600 returns to block 605. In examples where the cognitive uncertainty is less than or equal to the cognitive uncertainty threshold, the predicted objects may be output to the computer 110 in the vehicle 105 and used to operate the vehicle 105.
At block 625, the segmenter 415 segments the data into a foreground portion and a background portion. The foreground portion may be provided to the foreground autoencoder 420, and the background portion may be provided to the background autoencoder 425. At block 630, the foreground autoencoder 420 calculates the foreground cognitive uncertainty by comparing the reconstructed foreground portion to the input foreground portion. At block 635, the foreground autoencoder 420 determines whether the foreground portion includes an unseen object class based on a comparison of the foreground cognitive uncertainty to the foreground cognitive uncertainty threshold. For example, if the foreground cognitive uncertainty is greater than the foreground cognitive uncertainty threshold, the foreground autoencoder 420 classifies the input data as including an unseen object class. At block 640, the foreground autoencoder 420 causes the perception network 405 to transmit the input data received at block 605 to the server 145 so that the perception network 405 may be trained with data including the unseen object class, and the process 600 ends. In some cases, data including an unseen object class may be labeled with the corresponding object class prior to training.
At block 645, the background autoencoder 425 calculates the background cognitive uncertainty. At block 650, the background autoencoder 425 determines whether the background portion includes unseen background content based on a comparison of the background cognitive uncertainty to the background cognitive uncertainty threshold. For example, if the background cognitive uncertainty is greater than the background cognitive uncertainty threshold, the background autoencoder 425 classifies the input data as including unseen background content. At block 655, the background autoencoder 425 causes the perception network 405 to transmit the input data received at block 605 to the server 145 so that the perception network 405 may be trained with data including the unseen background content, and the process 600 ends. In some cases, data including unseen background content may be labeled with a corresponding background content tag prior to training.
At block 660, the perception network 405 marks the input data received at block 605 for manual review and transmits the marked input data to the server 145. If the cognitive uncertainties of both the foreground and background portions are individually within their respective thresholds, but the overall cognitive uncertainty determined by the perception network 405 is still higher than expected, the images are marked for manual review and annotation. In this case, the foreground and background cognitive uncertainties are less than or equal to their respective cognitive uncertainty thresholds, indicating that the autoencoders 420, 425 were previously trained with data corresponding to the segmented portions, but the overall cognitive uncertainty determined by the perception network 405 is greater than its threshold. The image data may then be transmitted to the server 145 via the network 135; for example, the server 145 may use the image data to retrain the perception network 405 and the unseen scene detection neural network 410. Process 600 then ends.
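A condensed sketch of process 600 follows; it reuses UNCERTAINTY_THRESHOLD and classify_portions from the earlier sketches, and the perception_net, segmenter, and server.upload callables are hypothetical stand-ins rather than components defined by this disclosure.

```python
def process_600(sensor_data, perception_net, segmenter, fg_autoencoder, bg_autoencoder, server):
    """Condensed, illustrative version of process 600 (blocks 605-660)."""
    prediction, uncertainty = perception_net(sensor_data)        # blocks 605-610
    if uncertainty <= UNCERTAINTY_THRESHOLD:                     # block 615
        return prediction                                        # reliable: use it to operate the vehicle
    foreground, background = segmenter(sensor_data)              # block 625
    labels = classify_portions(fg_autoencoder, bg_autoencoder,   # blocks 630-650
                               foreground, background)
    if labels["unseen_object_class"] or labels["unseen_background"]:
        server.upload(sensor_data, labels)                       # blocks 640/655: retraining data
    else:
        server.upload(sensor_data, {"manual_review": True})      # block 660
    return None                                                  # prediction too uncertain to act on
```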
In general, the described computing systems and/or devices may employ any of a number of computer operating systems, including, but in no way limited to, versions and/or variants of the Ford SYNC® application, the AppLink/Smart Device Link middleware, the Microsoft Automotive® operating system, the Microsoft Windows® operating system, the Unix operating system (e.g., the Solaris® operating system distributed by Oracle Corporation of Redwood Shores, California), the AIX UNIX operating system distributed by International Business Machines of Armonk, New York, the Linux operating system, the Mac OSX and iOS operating systems distributed by Apple Inc. of Cupertino, California, the BlackBerry OS distributed by Blackberry, Ltd. of Waterloo, Canada, the Android operating system developed by Google, Inc. and the Open Handset Alliance, or the QNX® CAR Platform for Infotainment offered by QNX Software Systems. Examples of computing devices include, without limitation, an on-board vehicle computer, a computer workstation, a server, a desktop, notebook, laptop, or handheld computer, or some other computing system and/or device.
Computers and computing devices generally include computer-executable instructions that may be executable by one or more computing devices, such as those listed above. Computer-executable instructions may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation and either alone or in combination, Java™, C, C++, Matlab, Simulink, Stateflow, Visual Basic, Java Script, Perl, HTML, etc. Some of these applications may be compiled and executed on a virtual machine, such as the Java Virtual Machine, the Dalvik virtual machine, or the like. In general, a processor (e.g., a microprocessor) receives instructions, e.g., from a memory, a computer-readable medium, etc., and executes these instructions, thereby performing one or more processes, including one or more of the processes described herein. Such instructions and other data may be stored and transmitted using a variety of computer-readable media. A file in a computing device is generally a collection of data stored on a computer-readable medium, such as a storage medium, a random access memory, etc.
The memory may include computer-readable media (also referred to as processor-readable media) including any non-transitory (e.g., tangible) media that participates in providing data (e.g., instructions) that may be read by a computer (e.g., by a processor of a computer). Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media may include, for example, optical or magnetic disks, and other persistent memory. Volatile media may include, for example, dynamic Random Access Memory (DRAM), which typically constitutes a main memory. Such instructions may be transmitted by one or more transmission media, including coaxial cables, copper wire and fiber optics, including the wires that comprise a system bus coupled to the processor of the ECU. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, a flash EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Databases, data repositories, or other data stores described herein may include various kinds of mechanisms for storing, accessing, and retrieving various kinds of data, including a hierarchical database, a set of files in a file system, an application database in a proprietary format, a relational database management system (RDBMS), etc. Each such data store is generally included within a computing device employing a computer operating system such as one of those mentioned above, and is accessed via a network in any one or more of a variety of manners. A file system may be accessible from a computer operating system, and may include files stored in various formats. An RDBMS generally employs the Structured Query Language (SQL) in addition to a language for creating, storing, editing, and executing stored procedures, such as the PL/SQL language.
In some examples, system elements may be implemented as computer-readable instructions (e.g., software) on one or more computing devices (e.g., servers, personal computers, etc.), stored on a computer-readable medium (e.g., disk, memory, etc.) associated therewith. The computer program product may include such instructions stored on a computer-readable medium for performing the functions described herein.
With respect to the media, processes, systems, methods, heuristics, etc. described herein, it should be understood that, while the steps of such processes, etc. have been described as occurring in a certain ordered sequence, such processes may be practiced by executing the steps in an order different than that described herein. It should also be understood that certain steps may be performed concurrently, other steps may be added, or certain steps described herein may be omitted. In other words, the description of the processes herein is provided for the purpose of illustrating certain embodiments and should not be construed as limiting the claims in any way.
Accordingly, it is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of the invention should be determined, not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. It is contemplated and anticipated that future developments will occur in the arts discussed herein, and that the disclosed systems and methods will be incorporated into such future embodiments. In summary, it is to be understood that the invention is capable of modification and variation and is limited only by the following claims.
Unless explicitly indicated to the contrary herein, all terms used in the claims are intended to be given their ordinary and customary meaning as understood by those skilled in the art. In particular, the use of singular articles such as "a," "an," "the," and the like are to be construed to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
According to the present invention there is provided a system having a computer comprising a processor and a memory, the memory comprising instructions executable by the processor such that the processor is programmed to: processing vehicle sensor data using a deep neural network to generate predictions indicative of one or more objects based on the vehicle sensor data, and determining an object uncertainty corresponding to the predictions; and then upon determining that the object uncertainty is greater than an uncertainty threshold: dividing the vehicle sensor data into a foreground portion and a background portion; classifying the foreground portion as including an unseen object class when the foreground cognitive uncertainty is greater than a foreground cognitive uncertainty threshold; classifying the background portion as including an unseen background when the background cognitive uncertainty is greater than a background cognitive uncertainty threshold; and transmitting the data and the data classifications to a server.
According to one embodiment, the processor is further programmed to operate the vehicle based on the prediction indicative of one or more objects.
According to one embodiment, the object uncertainty is a probability that indicates that the prediction of one or more objects correctly identified the one or more objects.
According to one embodiment, the foreground cognitive uncertainty is a probability measure of the extent to which the one or more objects are represented in the training data distribution.
According to one embodiment, the background cognitive uncertainty is a probabilistic measure of how much a noise factor is represented in the training data distribution, wherein the noise factor comprises weather conditions, lighting conditions and surface conditions.
According to one embodiment, the processor is further programmed to: mapping the foreground portion to a latent representation; mapping the latent representation to a reconstruction of the foreground portion; and determining the foreground cognitive uncertainty based on a comparison of the reconstructed foreground portion to the foreground portion.
According to one embodiment, the processor is further programmed to: mapping the background portion to a latent representation; mapping the latent representation to a reconstruction of the background portion; and determining the background cognitive uncertainty based on a comparison of the reconstructed background portion to the background portion.
According to one embodiment, the vehicle sensor data comprises at least one of an image or a point cloud.
According to one embodiment, the deep neural network comprises a probabilistic neural network.
According to one embodiment, the processor is further programmed to segment the vehicle sensor data into the foreground portion and the background portion by a segmenter via a segmentation mask.
According to one embodiment, the segmentation mask comprises a binary mask classifying objects within the vehicle sensor data, wherein the classified objects are assigned to the foreground portion.
According to one embodiment, the segmenter comprises a Mask Region-based Convolutional Neural Network (Mask R-CNN).
According to the invention, a method comprises: processing vehicle sensor data using a deep neural network to generate predictions indicative of one or more objects based on the data, and determining an object uncertainty corresponding to the predictions; when the object uncertainty is greater than an uncertainty threshold: dividing the vehicle sensor data into a foreground portion and a background portion; classifying the foreground portion as including an unseen object class when the foreground cognitive uncertainty is greater than a foreground cognitive uncertainty threshold; classifying the background portion as including an unseen background when the background cognitive uncertainty is greater than a background cognitive uncertainty threshold; and transmitting the data and the data classifications to a server.
In one aspect of the invention, the vehicle is operated based on the predictions indicative of one or more objects.
In one aspect of the invention, the object uncertainty is a probability that the prediction indicating one or more objects correctly identifies the one or more objects.
In one aspect of the invention, the foreground cognitive uncertainty is a probability measure of the extent to which the one or more objects are represented in the training data distribution.
In one aspect of the invention, the background cognitive uncertainty is a probabilistic measure of how much a noise factor is represented in the training data distribution, wherein the noise factor includes weather conditions, lighting conditions, and surface conditions.
In one aspect of the invention, the method comprises: mapping the foreground portion to a latent representation; mapping the latent representation to a reconstruction of the foreground portion; and determining the foreground cognitive uncertainty based on a comparison of the reconstructed foreground portion and the foreground portion.
In one aspect of the invention, the method comprises: mapping the background portion to a latent representation; mapping the latent representation to a reconstruction of the background portion; and determining the background cognitive uncertainty based on a comparison of the reconstructed background portion and the background portion.
In one aspect of the invention, the vehicle sensor data includes at least one of an image or a point cloud.

Claims (15)

1. A method, comprising:
processing vehicle sensor data using a deep neural network to generate predictions indicative of one or more objects based on the vehicle sensor data, and determining an object uncertainty corresponding to the predictions;
then, upon determining that the object uncertainty is greater than an uncertainty threshold:
dividing the vehicle sensor data into a foreground portion and a background portion;
classifying the foreground portion as including an unseen object class when the foreground cognitive uncertainty is greater than a foreground cognitive uncertainty threshold;
classifying the background portion as including an unseen background when the background cognitive uncertainty is greater than a background cognitive uncertainty threshold; and
the data and the data classifications are transmitted to a server.
2. The method of claim 1, further comprising operating a vehicle based on the prediction indicative of one or more objects.
3. The method of claim 1, wherein the object uncertainty is a probability that the prediction indicating one or more objects correctly identified the one or more objects.
4. The method of claim 1, wherein the foreground cognitive uncertainty is a probability measure of the extent to which the one or more objects are represented in a training data distribution.
5. The method of claim 1, wherein the background cognitive uncertainty is a probabilistic measure of how much a noise factor is represented in a training data distribution, wherein the noise factor includes weather conditions, lighting conditions, and surface conditions.
6. The method of claim 1, further comprising:
mapping the foreground portion to a latent representation;
mapping the latent representation to a reconstruction of the foreground portion; and
determining the foreground cognitive uncertainty based on a comparison of the reconstructed foreground portion to the foreground portion.
7. The method of claim 1, further comprising:
mapping the background portion to a latent representation;
mapping the latent representation to a reconstruction of the background portion; and
determining the background cognitive uncertainty based on a comparison of the reconstructed background portion to the background portion.
8. The method of claim 1, wherein the vehicle sensor data comprises at least one of an image or a point cloud.
9. The method of claim 1, wherein the deep neural network comprises a probabilistic neural network.
10. The method of claim 1, further comprising segmenting the vehicle sensor data into the foreground portion and the background portion by a segmenter via a segmentation mask.
11. The method of claim 10, wherein the segmentation mask comprises a binary mask classifying objects within the vehicle sensor data, wherein the classified objects are assigned to the foreground portion.
12. The method of claim 10, wherein the segmenter comprises a Mask Region-based Convolutional Neural Network (Mask R-CNN).
13. The method of claim 1, wherein the object is a vehicle trailer and the deep neural network outputs a trailer angle.
14. The method of claim 13, wherein the trailer angle describes a direction in which the vehicle trailer will travel in response to reversing the vehicle.
15. A system comprising a computer programmed to perform the method of any one of claims 1 to 14.
CN202210515198.1A 2022-05-12 2022-05-12 Unseen environmental classification Pending CN117115625A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210515198.1A CN117115625A (en) 2022-05-12 2022-05-12 Unseen environmental classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210515198.1A CN117115625A (en) 2022-05-12 2022-05-12 Unseen environmental classification

Publications (1)

Publication Number Publication Date
CN117115625A true CN117115625A (en) 2023-11-24

Family

ID=88806123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210515198.1A Pending CN117115625A (en) 2022-05-12 2022-05-12 Unseen environmental classification

Country Status (1)

Country Link
CN (1) CN117115625A (en)

Similar Documents

Publication Publication Date Title
US11107228B1 (en) Realistic image perspective transformation using neural networks
US11702044B2 (en) Vehicle sensor cleaning and cooling
CN113298250A (en) Neural network for localization and object detection
US11657635B2 (en) Measuring confidence in deep neural networks
US11100372B2 (en) Training deep neural networks with synthetic images
US20230153623A1 (en) Adaptively pruning neural network systems
CN114118350A (en) Self-supervised estimation of observed vehicle attitude
US11745766B2 (en) Unseen environment classification
US11462020B2 (en) Temporal CNN rear impact alert system
US20230162039A1 (en) Selective dropout of features for adversarial robustness of neural network
US10977783B1 (en) Quantifying photorealism in simulated data with GANs
CN114758313A (en) Real-time neural network retraining
US11620475B2 (en) Domain translation network for performing image translation
US20220188621A1 (en) Generative domain adaptation in a neural network
US11262201B2 (en) Location-based vehicle operation
US11068749B1 (en) RCCC to RGB domain translation with deep neural networks
CN117115625A (en) Unseen environmental classification
US20220172062A1 (en) Measuring confidence in deep neural networks
US11321587B2 (en) Domain generation via learned partial domain translations
US20230139521A1 (en) Neural network validation system
US20230316728A1 (en) Robust neural network learning system
US20240046627A1 (en) Computationally efficient unsupervised dnn pretraining
CN117095266A (en) Generation domain adaptation in neural networks
DE102022111716A1 (en) CLASSIFICATION OF AN UNSEEN ENVIRONMENT
CN118196732A (en) Data drift identification for sensor systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication