WO2021233357A1 - Object detection method, system and computer-readable medium - Google Patents

Object detection method, system and computer-readable medium

Info

Publication number
WO2021233357A1
WO2021233357A1 PCT/CN2021/094720 CN2021094720W WO2021233357A1 WO 2021233357 A1 WO2021233357 A1 WO 2021233357A1 CN 2021094720 W CN2021094720 W CN 2021094720W WO 2021233357 A1 WO2021233357 A1 WO 2021233357A1
Authority
WO
WIPO (PCT)
Prior art keywords
incoming
category
superpoints
object point
semantic map
Prior art date
Application number
PCT/CN2021/094720
Other languages
English (en)
Inventor
Xiang Li
Yi Xu
Yuan Tian
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to CN202180030033.9A (published as CN115428040A)
Publication of WO2021233357A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20021: Dividing image into blocks, subimages or windows
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Definitions

  • the present application relates to image processing technologies, and more particularly, to an object detection method, system and computer-readable medium.
  • Object detection could play an important role in Augmented Reality (AR) .
  • The awareness/understanding of objects in the real-world scene can enable many applications for AR. For example, one can change the appearance of real objects by adjusting a virtual overlay accordingly.
  • Virtual content can also be displayed according to rules of association (e.g., a matching virtual chair displayed near a real table).
  • In e-commerce applications, one can also recommend related merchandise based on the understanding of the scene.
  • DNN: Deep Neural Network
  • AR frameworks have become mainstream, e.g., Apple Inc.’s ARKit and Google Inc.’s ARCore.
  • Such AR frameworks employ a SLAM (Simultaneous Localization and Mapping) algorithm, more specifically a VIO (Visual-Inertial Odometry) algorithm, to track the 6 Degree-of-Freedom (DoF) camera pose (i.e., position and orientation).
  • 3D sparse point cloud data is also provided by such frameworks. They can reconstruct 3D points over 50 meters from the camera.
  • An object of the present application is to propose an object detection method, system and computer-readable medium to use a semantic map to improve accuracy of object detection.
  • an object detection method includes:
  • creating object representation data based on outputs from a neural network and an augmented reality (AR) framework, wherein the object representation data includes object label information of an object identified by the neural network on an image and three-dimensional location of an object point and viewpoint information and scale information of the object point from the AR framework;
  • building a semantic map including object superpoints, wherein each of the object superpoints is represented by historical data of scores, the viewpoint information and the scale information of the object point;
  • in the building of the semantic map, the semantic map is built from the object points whose projections on the image are within a bounding area of the object identified by the neural network.
  • a median point of the object points whose projections on the image are within the bounding area of the object is used to construct the object superpoint of the semantic map.
  • the certain distance is a maximum scale of the category to which the incoming object point belongs.
  • the updating the semantic map in response to the incoming object point includes:
  • computing the score of the incoming object point in consideration of a comparison between the viewpoint information of the incoming object point and historical viewpoint information of the object superpoints in the set and/or a comparison between the scale information of the incoming object point and historical scale information of the object superpoints in the set, wherein the incoming object point is of the category identified by the neural network with the probability of the category, and the score of the incoming object point indicates a change of the probability of the category to which the incoming object point belongs.
  • the score of the incoming object point is computed based on a first weight and a second weight, wherein the first weight is associated with a minimum angular difference between a viewpoint corresponding to the incoming object point and all the viewpoints corresponding to the object superpoints in the set, and the second weight is associated with a minimum scale difference between a scale corresponding to the incoming object point and all the scales corresponding to the object superpoints in the set.
  • the first weight is set to be a first number if the minimum angular difference is less than a first predetermined degree and the first weight is set to be a second number if the minimum angular difference is greater than a second predetermined degree, and wherein the first number is less than the second number and the first predetermined degree is less than the second predetermined degree.
  • the second weight is proportional to the minimum scale difference if the minimum scale difference is within a predetermined range, and the second weight is set to a fixed number if the minimum scale difference exceeds the predetermined range.
  • the score of the incoming object point increases as the minimum angular difference and/or the minimum scale difference increases; the score of the incoming object point decreases as the minimum angular difference and/or the minimum scale difference decreases, and wherein an increase of the minimum angular difference and/or the minimum scale difference indicates a chance to use the probability of the category of the incoming object point obtained from the neural network increases; a decrease of the minimum angular difference and/or the minimum scale difference indicates a chance to use the probability of the category of the incoming object point obtained from the neural network decreases.
  • the updating the semantic map in response to the incoming object point includes:
  • the object superpoints in the set get an extra score if the object superpoints in the set fall within a minimum scale of the category to which the incoming object point belongs.
  • the modifying the probability of the category to which the incoming object point belongs and identified by the neural network based on the updated semantic map includes:
  • the probability of the category to which the incoming object point belongs is modified in consideration of a maximum score of all the object superpoints in the set with a category as the same as the category of the incoming object point and a maximum score of all the object superpoints in the set with any other category.
  • the probability of the category to which the incoming object point belongs is modified based on a sigmoid function.
  • an object detection system includes:
  • At least one memory configured to store program instructions
  • At least one processor configured to execute the program instructions, which cause the at least one processor to perform steps including:
  • creating object representation data based on outputs from a neural network and an augmented reality (AR) framework, wherein the object representation data includes object label information of an object identified by the neural network on an image and three-dimensional location of an object point and viewpoint information and scale information of the object point from the AR framework;
  • building a semantic map including object superpoints, wherein each of the object superpoints is represented by historical data of scores, the viewpoint information and the scale information of the object point;
  • the updating the semantic map in response to the incoming object point includes:
  • computing the score of the incoming object point in consideration of a comparison between the viewpoint information of the incoming object point and historical viewpoint information of the object superpoints in the set and/or a comparison between the scale information of the incoming object point and historical scale information of the object superpoints in the set, wherein the incoming object point is of the category identified by the neural network with the probability of the category, and the score of the incoming object point indicates a change of the probability of the category to which the incoming object point belongs.
  • the updating the semantic map in response to the incoming object point includes:
  • the modifying the probability of the category to which the incoming object point belongs and identified by the neural network based on the updated semantic map includes:
  • a non-transitory computer-readable medium is provided with program instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform any of the above-described object detection methods.
  • a semantic map is used to improve the accuracy of object detection.
  • Semantic points in the map are generated by combining object detection results from a neural network and pose data and three-dimensional points results from an AR framework.
  • the semantic map consists of object superpoints with a list of scores corresponding to detected labels, a list of view directions and a list of scales.
  • the probabilities from the neural network are modified based on these lists. By modifying the probability of an object label or category, object detection accuracy is enhanced.
  • FIG. 1 is a schematic diagram illustrating the architecture of object detection according to the present invention.
  • FIG. 2 is a flowchart of an object detection method according to the present application.
  • FIG. 3 is a flowchart of a semantic map updating process according to the present application.
  • FIG. 4 is a block diagram illustrating an object detection system according to the present application.
  • FIG. 5 is a block diagram illustrating an updating module of an object detection system according to the present application.
  • FIG. 6 is a block diagram illustrating an electronic device for implementing an object detection method according to the present application.
  • other neural networks and similar augmented reality technologies can be applied in the present application. It is not intended that the present application be limited to any illustrated examples.
  • the present application is to use a (3D) semantic map to improve the accuracy of 2D object detection DNN (s) .
  • the semantic point clouds are generated by combining object detection results from DNN (s) and pose and 3D points results from AR framework.
  • the semantic map consists of 3D superpoints with a list of scores corresponding to detected labels, a list of view directions and a list of scales.
  • the probabilities from DNN (s) are updated or modified based on them. By modifying the probability of an object label or category, object detection accuracy is enhanced.
  • a score measure for estimated object points considers not only how many times a certain label has been detected at a certain location, but also the detection view directions and detection scales.
  • This approach performs better than ordinary DNN(s) in AR scenarios where the viewpoint is constantly changing. This can decrease the probability of false positives when the object is recognized as a category which has never been seen from similar view directions recently, while another category has been seen many times at the same location. For example, this approach can correct false positives where a bed has been detected as a couch at the current frame, but the semantic map shows a bed had been detected consistently at the same location in previous frames.
  • When the object detection DNN(s) outputs a relatively low probability for an object label, but it can be known from the semantic map that this object has been detected at this location from very different directions a while ago, this approach will increase the probability of the said category.
  • This approach increases the accuracy of 2D object detection with AR framework without any additional training data to handle scale and viewpoint variance of the task.
  • This enables many AR applications. For example, it can assign semantic labels to 3D point cloud which can trigger corresponding virtual contents for the users.
  • FIG. 1 is a schematic diagram illustrating the architecture of object detection according to the present invention.
  • the architecture of object detection of the present application is described as follows.
  • a 3D semantic map is built using the output from both AR framework and DNN (s) .
  • There are challenges in object detection, such as variance in scale and viewpoint: an object detector must detect objects at different scales on the images and from different viewpoints. This challenge is addressed by using a category scale database to verify whether the detected object’s category agrees with the scale estimated from the AR framework. For example, an airplane should not appear in a 5m x 5m space.
  • the viewpoints generated by AR framework when an object is detected by DNN (s) are stored. In some embodiments, those consistent detections of the same object from different view directions and/or at different scales are favored.
  • a probabilistic model is used to insert and update object category, viewpoint, and scale information in the 3D semantic map. Information from the 3D semantic map is extracted to update the object label probability from the DNN (s) as shown in FIG. 1.
  • FIG. 2 is a flowchart of an object detection method according to the present application. The object detection method is described in detail below.
  • Step S200 creating object representation data based on outputs from a neural network (e.g., a DNN) and an AR framework (e.g., Apple Inc.’s ARKit or Google Inc.’s ARCore, which employs a SLAM algorithm, more specifically a VIO algorithm).
  • the object representation data includes object label information (e.g., a chair label shown in FIG. 1) of an object (e.g., a chair) identified by the neural network on an image, and further includes three-dimensional location of an object point and viewpoint information and scale information of the object point from the AR framework.
  • 2D object points of the object (e.g., the chair) on the image have corresponding 3D object points estimated from the AR framework, wherein the mapping of the 3D object points onto the image results in the 2D object points.
  • the DNN (s) may output a list of N object categories with associated bounding boxes and probabilities.
  • an object representation data structure is created as (loc, label, view, scale), where loc is the 3D coordinates of an estimated object point in a current frame, label is the object label from the DNN(s), view is the view direction (or viewpoint) from the camera to loc, and scale is the scale information that depends on the distance from the camera position to loc.
  • For each frame, the AR framework generates the 6DoF pose of the camera and a set of sparse 3D points with global 3D coordinates. These are used to compute loc, view, and scale.
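  • The following Python sketch (not part of the original disclosure) illustrates one way this object representation could be assembled from the DNN output and the AR framework’s camera pose; the names, the prob field, and the use of the camera-to-point distance as the scale value are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectObservation:
    """One incoming estimated object representation (loc, label, view, scale)."""
    loc: np.ndarray    # global 3D coordinates of the estimated object point
    label: str         # object label output by the DNN for the current frame
    prob: float        # DNN probability associated with the label
    view: np.ndarray   # normalized unit view direction from the camera to loc
    scale: float       # scale value, derived here from the camera-to-loc distance

def make_observation(loc, label, prob, camera_position):
    """Builds (loc, label, view, scale) from DNN output and the AR framework's camera pose."""
    loc = np.asarray(loc, dtype=float)
    cam = np.asarray(camera_position, dtype=float)
    direction = loc - cam
    distance = float(np.linalg.norm(direction))
    view = direction / distance          # unit vector from the camera position to loc
    scale = distance                     # assumption: scale taken as the distance itself
    return ObjectObservation(loc, label, prob, view, scale)
```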
  • Step S202 building a semantic map including object superpoints.
  • each of the object superpoints is represented by historical data of scores, the viewpoint information and the scale information of the object point.
  • the object superpoints are represented as (loc, list_score, list_view, list_scale) .
  • the three lists encode information from all previous frames in the AR session. 1) list_score (E_1, E_2, E_3, ..., E_l, ...) stores the list of the scores E_l for each label l that has been detected at this point; the higher the score, the higher the probability that this point is of category l.
  • 2) list_view (v_1, v_2, v_3, ...) stores the list of historical view directions (or viewpoints) from camera positions to the point when an object is detected; and 3) list_scale (s_1, s_2, s_3, ...) stores the list of historical scales from camera positions to the point when an object is detected at the point.
  • a point might be labelled as different categories during the AR session at different time instances.
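  • As an illustration only, a superpoint of the semantic map could be modeled as follows; storing list_score as a label-to-score mapping is an assumption about how the per-label scores E_l might be kept.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class Superpoint:
    """A semantic-map superpoint (loc, list_score, list_view, list_scale)."""
    loc: np.ndarray                                              # 3D location of the superpoint
    list_score: Dict[str, float] = field(default_factory=dict)  # score E_l for each detected label l
    list_view: List[np.ndarray] = field(default_factory=list)   # historical view directions
    list_scale: List[float] = field(default_factory=list)       # historical detection scales
```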
  • the semantic map is built from the object points whose projections on the image are within a bounding area of the object identified by the neural network. That is, only the 3D object points that map to or fall within the bounding area (e.g., a bounding box) of the object identified by the neural network are of interest when building the semantic map.
  • a median point of the object points whose projections on the image are within the bounding area (e.g., a bounding box) of the object is used to construct the object superpoint of the semantic map. It is ensured that the median point falls within the bounding area of the object on the image. In another aspect, this further reduces the computation amount.
  • some form of statistics of all the 3D points whose projections on the image are within the 2D bounding box of a detected object label may be computed.
  • the median for each of the XYZ dimensions for all points is used to represent the object in the current view. In this way, it is avoided assigning object labels to irrelevant points on other objects or background in the semantic map. This may make the approach more robust and efficient.
  • the AR framework estimates the pose of camera and reconstructs a few 3D points as shown in FIG. 1 as circular points (see the right side of FIG. 1) .
  • the median, p = (2.8733, 1.09483, 1.2345), of those circular points that are within the 2D bounding box of “chair” is used to represent the estimated object point loc.
  • a view direction v = (0.61497, 0.76871, 0.17458) is computed from the position of the camera to loc.
  • v is a normalized unit vector.
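  • A minimal sketch of this median-based selection, assuming a hypothetical project_fn callback that maps a 3D point to image coordinates:

```python
import numpy as np

def estimate_object_point(points_3d, bbox_2d, project_fn, camera_position):
    """Takes the per-axis median of the 3D points whose projections fall inside the
    detected 2D bounding box, and computes the view direction from the camera to it."""
    x_min, y_min, x_max, y_max = bbox_2d
    inside = []
    for p in points_3d:
        u, v = project_fn(p)                       # hypothetical projection onto the image plane
        if x_min <= u <= x_max and y_min <= v <= y_max:
            inside.append(p)
    if not inside:
        return None, None                          # no reconstructed point supports this detection
    loc = np.median(np.asarray(inside, dtype=float), axis=0)   # median of X, Y, Z independently
    direction = loc - np.asarray(camera_position, dtype=float)
    view = direction / np.linalg.norm(direction)   # normalized unit view direction
    return loc, view
```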
  • Step S204 determining a set of the object superpoints in the semantic map whose locations are within a certain distance of an incoming object point.
  • Viewpoint will be changed during an AR session.
  • new object points may be generated from different viewpoints.
  • different labels or categories may be given to the same object.
  • a set of the object superpoints in the semantic map whose locations are within a certain distance of the incoming object point is determined. More specifically, the certain distance is determined based on a scale of a category of the object (e.g., a chair category) identified by the neural network and the incoming object point belongs to the category.
  • the scale of the object category or label may be retrieved from a category scale database as shown in FIG. 1.
  • the certain distance is a maximum scale of the category to which the incoming object point belongs.
  • the scale of a chair category ranges from 0.5m to 1.5m.
  • the maximum scale of the chair category would be 1.5m.
  • each incoming estimated object representation (loc_in, label_in, view_in, scale_in) is compared against the superpoints (loc, list_score, list_view, list_scale) in the semantic map, and a set of superpoints in the map whose loc are within a certain distance of the incoming object point loc_in are located.
  • a minimum scale and maximum scale are defined for that category (e.g., 0.5m to 1.5m for the scale of a chair category). Then any superpoints within the maximum scale of the incoming object category will be added to the set S_in for processing.
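  • The set S_in could be gathered as sketched below; the CATEGORY_SCALES dictionary is a hypothetical stand-in for the category scale database of FIG. 1.

```python
import numpy as np

# Hypothetical category scale database; real entries would cover all supported categories.
CATEGORY_SCALES = {"chair": (0.5, 1.5)}   # (minimum scale, maximum scale) in meters

def find_candidate_set(semantic_map, loc_in, label_in):
    """Step S204 sketch: collect the set S_in of superpoints whose loc lies within the
    maximum scale of the incoming object's category."""
    min_scale, max_scale = CATEGORY_SCALES[label_in]
    loc_in = np.asarray(loc_in, dtype=float)
    s_in = [sp for sp in semantic_map
            if np.linalg.norm(np.asarray(sp.loc) - loc_in) <= max_scale]
    return s_in, min_scale, max_scale
```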
  • Step S206 updating the semantic map in response to the incoming object point.
  • the semantic map is updated.
  • the updated semantic map is used in subsequent processes to modify a probability of the category to which the incoming object point belongs, as identified by the neural network, thereby facilitating object detection.
  • the scores of the object superpoints in the determined set in the semantic map are updated based on information from the incoming object point. That is, the information from the incoming object point participates in building historical scores of set of the object superpoints in the semantic map.
  • FIG. 3 is a flowchart of a semantic map updating process according to the present application.
  • the updating the semantic map in Step S206 may include the following steps, i.e., Steps 300 to 306.
  • Step 300 computing the score of the incoming object point.
  • weights w_v and w_s are calculated as follows:
  • v_diff is the minimum angular difference between the current view direction (or viewpoint) and all the view directions in the list_view of all the points in S_in.
  • the higher the v_diff, the higher the weight w_v.
  • the weight w_v is set to zero when v_diff is within 45 degrees, to only update the semantic map intermittently.
  • w_v is capped at 1 when v_diff is larger than 90 degrees.
  • s_diff is the minimum scale difference between the current scale and all scales in the list_scale of all the points in S_in.
  • the higher the s_diff, the higher the weight w_s.
  • k_s is the factor used to normalize s_diff.
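  • A possible realization of the two weights, under the assumptions that w_v ramps linearly between 45° and 90° and that an empty history yields full weight (neither is stated explicitly in this text):

```python
import numpy as np

def view_weight(view_in, historical_views, low_deg=45.0, high_deg=90.0):
    """w_v from the minimum angular difference v_diff between the incoming view direction
    and all view directions stored in list_view of the points in S_in."""
    if not historical_views:
        return 1.0                                       # assumption: no history means full weight
    cosines = [float(np.clip(np.dot(view_in, v), -1.0, 1.0)) for v in historical_views]
    v_diff = float(np.degrees(np.arccos(max(cosines))))  # smallest angle to any stored view
    if v_diff < low_deg:
        return 0.0                                       # nearly the same viewpoint: ignore
    if v_diff > high_deg:
        return 1.0                                       # very different viewpoint: full weight
    return (v_diff - low_deg) / (high_deg - low_deg)     # assumed linear ramp in between

def scale_weight(scale_in, historical_scales, k_s=1.0):
    """w_s from the minimum scale difference s_diff, normalized by k_s and capped at 1."""
    if not historical_scales:
        return 1.0
    s_diff = min(abs(scale_in - s) for s in historical_scales)
    return min(k_s * s_diff, 1.0)
```
  • How w_v and w_s are combined into the score of the incoming object point is not reproduced in this extract; a simple product, score_in = w_v * w_s, is one plausible choice and is what the later sketches assume.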
  • the computing the score of the incoming object point may include computing the score of the incoming object point in consideration of a comparison (e.g., v_diff) between the viewpoint information of the incoming object point and historical viewpoint information of the object superpoints in the set and/or a comparison (e.g., s_diff) between the scale information of the incoming object point and historical scale information of the object superpoints in the set. That is, the comparison between the viewpoint information of the incoming object point and the historical viewpoint information of the object superpoints in the set and/or between the scale information of the incoming object point and the historical scale information of the object superpoints in the set can be used to estimate how much the camera pose has changed.
  • the incoming object point is of the category identified by the neural network with the probability of the category, and the score of the incoming object point indicates a change of the probability of the category to which the incoming object point belongs. It is desired that a large change in the camera pose leads to a high score for the incoming object point, since in this circumstance it would be better to assign a new label identified by the neural network to the incoming object point, and that a small change in the camera pose leads to a low score for the incoming object point, since in this circumstance it would be better to keep an already-detected label for the incoming object point.
  • the score of the incoming object point is computed based on a first weight and a second weight, in which the first weight is associated with a minimum angular difference between a viewpoint corresponding to the incoming object point and all the viewpoints corresponding to the object superpoints in the set, and the second weight is associated with a minimum scale difference between a scale corresponding to the incoming object point and all the scales corresponding to the object superpoints in the set.
  • the first weight and the second weight may be w_v and w_s, respectively.
  • the minimum angular difference v_diff is used to determine the first weight.
  • the minimum scale difference s_diff is used to determine the second weight.
  • the first weight is set to be a first number if the minimum angular difference is less than a first predetermined degree and the first weight is set to be a second number if the minimum angular difference is greater than a second predetermined degree, and the first number is less than the second number and the first predetermined degree is less than the second predetermined degree.
  • the first weight may be w_v.
  • the first weight w_v is set to 0 if the minimum angular difference v_diff is less than 45 degrees and is set to 1 if the minimum angular difference v_diff is greater than 90 degrees.
  • the second weight is proportional to the minimum scale difference if the minimum scale difference is within a predetermined range, and the second weight is set to a fixed number if the minimum scale difference exceeds the predetermined range.
  • the second weight may be w_s.
  • the second weight w_s is proportional to the minimum scale difference s_diff if s_diff is within 1/k_s, and w_s is set to 1 if s_diff exceeds 1/k_s.
  • the score of the incoming object point increases as the minimum angular difference and/or the minimum scale difference increases (e.g., increases as v_diff and/or s_diff increases); the score of the incoming object point decreases as the minimum angular difference and/or the minimum scale difference decreases (e.g., decreases as v_diff and/or s_diff decreases). An increase of the minimum angular difference and/or the minimum scale difference indicates an increased chance to use the probability of the category of the incoming object point obtained from the neural network; that is, it is desired that a large change in the camera pose leads to a high score for the incoming object point, since in this circumstance it would be better to assign a new label identified by the neural network to the incoming object point. A decrease of the minimum angular difference and/or the minimum scale difference indicates a decreased chance to use the probability of the category of the incoming object point obtained from the neural network; that is, it is desired that a small change in the camera pose leads to a low score for the incoming object point, since in this circumstance it would be better to keep an already-detected label for the incoming object point.
  • Step 302 updating the scores of the object superpoints in the set by utilizing the score of the incoming object point.
  • the scores of the object superpoints in the set are updated for the category of the object superpoints in the set that is identical to the category of the incoming object point identified by the neural network, by utilizing the score of the incoming object point. More specifically, a distance from the object superpoints in the set with the same category to the incoming object point is considered. For the object superpoints falling between a maximum scale and a minimum scale of the category of the incoming object point, their scores are updated by adding the score of the incoming object point to their original scores.
  • these object superpoints get an extra score (e.g., 1) if they fall within the minimum scale of the category to which the incoming object point belongs. This takes the times a certain label has been detected at a certain location into consideration.
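  • A sketch of this score update, assuming that only superpoints already carrying the incoming label are touched and that superpoints closer than the minimum scale receive the extra score instead of the incoming score (the text does not spell out either detail):

```python
import numpy as np

def update_scores(s_in, loc_in, label_in, incoming_score, min_scale, max_scale, extra=1.0):
    """Step 302 sketch: reinforce same-category superpoints near the incoming point."""
    loc_in = np.asarray(loc_in, dtype=float)
    for sp in s_in:
        if label_in not in sp.list_score:
            continue                                       # category differs: leave untouched
        d = float(np.linalg.norm(np.asarray(sp.loc) - loc_in))
        if min_scale <= d <= max_scale:
            sp.list_score[label_in] += incoming_score      # consistent re-detection nearby
        elif d < min_scale:
            sp.list_score[label_in] += extra               # re-detection at (almost) the same spot
```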
  • Step 304 updating the historical data of viewpoint information and/or scale information.
  • the historical data of the viewpoint information of any one of the object superpoints in the set is updated if a minimum angular difference (e.g., v_diff) between the viewpoint corresponding to the incoming object point and all the viewpoints corresponding to the object superpoints in the set is greater than a predetermined degree (e.g., v_diff ≥ 45°); and/or the historical data of the scale information of any one of the object superpoints in the set is updated if a minimum scale difference (e.g., s_diff) between the scale corresponding to the incoming object point and all the scales corresponding to the object superpoints in the set exceeds a predetermined value (e.g., s_diff ≥ 1).
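  • Step 304 could then be sketched as follows, where appending the incoming view direction and scale (rather than the superpoint’s own) and the s_diff threshold of 1 are assumptions drawn from the surrounding text:

```python
def update_history(s_in, view_in, scale_in, v_diff, s_diff,
                   angle_threshold_deg=45.0, scale_threshold=1.0):
    """Step 304 sketch: memorize the incoming view direction and scale only when they
    differ enough from what the superpoints have already seen."""
    for sp in s_in:
        if v_diff >= angle_threshold_deg:
            sp.list_view.append(view_in)     # a sufficiently new view direction is memorized
        if s_diff >= scale_threshold:
            sp.list_scale.append(scale_in)   # a sufficiently new detection scale is memorized
```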
  • Step 306 initializing the historical data of the incoming object point under certain circumstances.
  • all three lists list_score, list_view and list_scale are initialized with the corresponding current values, e.g., list_view = (v) and list_scale = (s). Then, the new superpoint (loc, list_score, list_view, list_scale) is added into the semantic map.
  • the historical data of the incoming object point is initialized if no object superpoint is within a minimum scale of the category to which the incoming object point belongs.
  • the initialization means that only current score, viewpoint information and scale information of the incoming object point are recorded on the semantic map for the incoming object point and previous or historical scores, viewpoint information and scale information are initialized as zero or deleted.
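  • A sketch of this initialization, reusing the Superpoint and ObjectObservation structures from the earlier sketches:

```python
import numpy as np

def maybe_insert_superpoint(semantic_map, s_in, obs, incoming_score, min_scale):
    """Step 306 sketch: if no superpoint in S_in lies within the minimum scale of the
    incoming category, add a fresh Superpoint whose histories hold only the current values."""
    loc_in = np.asarray(obs.loc, dtype=float)
    if any(np.linalg.norm(np.asarray(sp.loc) - loc_in) < min_scale for sp in s_in):
        return None                            # an existing superpoint already covers this spot
    sp = Superpoint(loc=loc_in,
                    list_score={obs.label: incoming_score},
                    list_view=[obs.view],
                    list_scale=[obs.scale])
    semantic_map.append(sp)
    return sp
```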
  • Step S208 modifying a probability of the category to which the incoming object point belongs and identified by the neural network based on the updated semantic map.
  • a minimum value of p_map is defined as 0.5 to make sure it does not decrease the output probability p_l from the DNN(s) dramatically. The final probability of the object is then computed from p_l and p_map.
  • the final output is a list of bounding boxes, each of which has the output (l, p, bbox), where the label and bounding box are the same as the output from the DNN(s).
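  • The exact combining formula is not reproduced in this extract; the sketch below therefore assumes a sigmoid of the score margin between the incoming category and the best other category, a floor of 0.5 on p_map, and a 2 * p_l * p_map combination that leaves p_l unchanged when p_map equals 0.5:

```python
import math

def modify_probability(p_l, s_in, label_in, p_map_floor=0.5):
    """Step S208 sketch: fold the semantic map into the DNN probability p_l."""
    same = max((sp.list_score.get(label_in, 0.0) for sp in s_in), default=0.0)
    other = max((score for sp in s_in
                 for lbl, score in sp.list_score.items() if lbl != label_in), default=0.0)
    p_map = 1.0 / (1.0 + math.exp(-(same - other)))   # sigmoid of the score margin
    p_map = max(p_map, p_map_floor)                   # floor of 0.5 so p_l is never cut sharply
    return min(1.0, 2.0 * p_l * p_map)                # assumed combination rule
```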
  • FIG. 4 is a block diagram illustrating an object detection system according to the present application. As illustrated in FIG. 4, an object detection system 40 is provided.
  • the object detection system 40 includes a creating module 400, a building module 402, a determining module 404, an updating module 406 and a modifying module 408.
  • the creating module 400 is configured to create object representation data based on outputs from a neural network and an augmented reality (AR) framework, wherein the object representation data includes object label information of an object identified by the neural network on an image and three-dimensional location of an object point and viewpoint information and scale information of the object point from the AR framework.
  • AR augmented reality
  • the determining module 404 is configured to determine a set of the object superpoints in the semantic map whose locations are within a certain distance of an incoming object point, wherein the certain distance is determined based on a scale of a category of the object identified by the neural network and the incoming object point belongs to the category.
  • the updating module 406 is configured to update the semantic map in response to the incoming object point, wherein the scores of the object superpoints in the determined set in the semantic map are updated based on information from the incoming object point.
  • the modifying module 408 is configured to modify a probability of the category to which the incoming object point belongs and identified by the neural network based on the updated semantic map.
  • FIG. 5 is a block diagram illustrating an updating module of an object detection system according to the present application.
  • the updating module 406 of the object detection system 40 includes a computing unit 500, a score updating unit 502, a data updating unit 504 and an initializing unit 506.
  • the computing unit 500 is configured to compute the score of the incoming object point in consideration of a comparison between the viewpoint information of the incoming object point and historical viewpoint information of the object superpoints in the set and/or a comparison between the scale information of the incoming object point and historical scale information of the object superpoints in the set, wherein the incoming object point is of the category identified by the neural network with the probability of the category, and the score of the incoming object point indicates a change of the probability of the category to which the incoming object point belongs.
  • the score updating unit 502 is configured to update the scores of the object superpoints in the set for the category of the object superpoints in the set that is identical to the category of the incoming object point identified by the neural network by utilizing the score of the incoming object point, wherein for the object superpoints in the set that are of the category identical to the category of the incoming object point, the object superpoints in the set get an extra score if the object superpoints in the set fall within a minimum scale of the category to which the incoming object point belongs.
  • the data updating unit 504 is configured to update the historical data of the viewpoint information of any one of the object superpoints in the set if a minimum angular difference between the viewpoint corresponding to the incoming object point and all the viewpoints corresponding to the object superpoints in the set is greater than a predetermined degree; and/or to update the historical data of the scale information of any one of the object superpoints in the set if a minimum scale difference between the scale corresponding to the incoming object point and all the scales corresponding to the object superpoints in the set exceeds a predetermined value.
  • the initializing unit 506 is configured to initialize the historical data of the incoming object point if no object superpoint is within a minimum scale of the category to which the incoming object point belongs.
  • All or part of the modules or units in the above-mentioned object detection system may be implemented by software, hardware, and a combination thereof.
  • the foregoing modules or units may be embedded in or independent from a processor of a computer equipment in the form of hardware, or may be stored in a memory of the computer equipment in the form of software, so that the processor can invoke and execute the operations corresponding to the foregoing modules or units.
  • the modules or units in the object detection system may be implemented by a computer program.
  • the computer program can be run on a terminal or a server.
  • the program module composed of the computer program can be stored in a memory of the terminal or the server.
  • Implementations also provide a non-transitory computer-readable storage medium.
  • One or more non-transitory computer-readable storage media contain computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the operations of the object detection method.
  • FIG. 6 is a block diagram illustrating an electronic device 600 according to an embodiment of the present application.
  • the electronic device 600 can be a mobile phone, a game controller, a tablet device, a medical equipment, an exercise equipment, or a personal digital assistant (PDA) .
  • the electronic device 600 may include one or a plurality of the following components: a housing 602, a processor 604, a storage 606, a circuit board 608, and a power circuit 610.
  • the circuit board 608 is disposed inside a space defined by the housing 602.
  • the processor 604 and the storage 606 are disposed on the circuit board 608.
  • the power circuit 610 is configured to supply power to each circuit or device of the electronic device 600.
  • the storage 606 is configured to store executable program codes. By reading the executable program codes stored in the storage 606, the processor 604 runs a program corresponding to the executable program codes to execute the object detection method of any one of the afore-mentioned embodiments.
  • the processor 604 typically controls overall operations of the electronic device 600, such as the operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processor 604 may include one or more processors to execute instructions to perform all or part of the steps in the above-described methods.
  • the processor 604 may include one or more modules which facilitate the interaction between the processor 604 and other components.
  • the processor 604 may include a multimedia module to facilitate the interaction between the multimedia component and the processor 604.
  • the storage 606 is configured to store various types of data to support the operation of the electronic device 600. Examples of such data include instructions for any application or method operated on the electronic device 600, contact data, Phonebook data, messages, pictures, video, etc.
  • the storage 606 may be implemented using any type of volatile or non-volatile memory devices, or a combination thereof, such as a static random access memory (SRAM) , an electrically erasable programmable read-only memory (EEPROM) , an erasable programmable read-only memory (EPROM) , a programmable read-only memory (PROM) , a read-only memory (ROM) , a magnetic memory, a flash memory, a magnetic or optical disk.
  • the power circuit 610 supplies power to various components of the electronic device 600.
  • the power circuit 610 may include a power management system, one or more power sources, and any other component associated with generation, management, and distribution of power for the electronic device 600.
  • the electronic device 600 may be implemented by one or more application specific integrated circuits (ASICs) , digital signal processors (DSPs) , digital signal processing devices (DSPDs) , programmable logic devices (PLDs) , field programmable gate arrays (FPGAs) , controllers, micro-controllers, microprocessors, or other electronic components, for performing the above-described methods.
  • a non-transitory computer-readable storage medium including instructions, such as those included in the storage 606, executable by the processor 604 of the electronic device 600 for performing the above-described methods, is also provided.
  • the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM) , a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device, and the like.
  • the modules described as separate components for explanation may or may not be physically separated.
  • the modules shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be used according to the purposes of the embodiments.
  • each of the functional modules in each of the embodiments can be integrated into one processing module, can exist as a physically independent module, or two or more modules can be integrated into one processing module.
  • if the software function module is realized and used or sold as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution proposed by the present application can thus be essentially, or in part, realized in the form of a software product.
  • the part of the technical solution that is beneficial over the conventional technology can be realized in the form of a software product.
  • the software product is stored in a storage medium and includes a plurality of instructions for a computing device (such as a personal computer, a server, or a network device) to run all or some of the steps disclosed by the embodiments of the present application.
  • the storage medium includes a USB disk, a mobile hard disk, a read-only memory (ROM) , a random access memory (RAM) , a floppy disk, or other kinds of media capable of storing program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an object detection method, a system and a storage medium. The method comprises creating object representation data on the basis of outputs from a neural network and an augmented reality (AR) framework, building a semantic map comprising object superpoints, determining a set of the object superpoints in the semantic map whose locations are within a certain distance of an incoming object point, updating the semantic map in response to the incoming object point, and modifying a probability of the category to which the incoming object point belongs and which is identified by the neural network on the basis of the updated semantic map. The semantic map consists of object superpoints with a list of scores corresponding to detected labels, a list of view directions and a list of scales. By modifying a probability of an object label or category from the neural network on the basis of the semantic map, object detection accuracy is improved.
PCT/CN2021/094720 2020-05-20 2021-05-19 Object detection method, system and computer-readable medium WO2021233357A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180030033.9A CN115428040A (zh) 2020-05-20 2021-05-19 物体检测方法、***和计算机可读介质

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063027798P 2020-05-20 2020-05-20
US63/027,798 2020-05-20

Publications (1)

Publication Number Publication Date
WO2021233357A1 true WO2021233357A1 (fr) 2021-11-25

Family

ID=78708105

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094720 WO2021233357A1 (fr) Object detection method, system and computer-readable medium

Country Status (2)

Country Link
CN (1) CN115428040A (fr)
WO (1) WO2021233357A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180012411A1 (en) * 2016-07-11 2018-01-11 Gravity Jack, Inc. Augmented Reality Methods and Devices
WO2019213459A1 (fr) * 2018-05-04 2019-11-07 Northeastern University System and method for generating image landmarks
CN110674696A (zh) * 2019-08-28 2020-01-10 Gree Electric Appliances, Inc. of Zhuhai Monitoring method, apparatus, system, monitoring device and readable storage medium
US20200082544A1 (en) * 2018-09-10 2020-03-12 Arm Limited Computer vision processing
CN110895826A (zh) * 2018-09-12 2020-03-20 Samsung Electronics Co., Ltd. Training data generation method for image processing, image processing method and apparatus therefor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180012411A1 (en) * 2016-07-11 2018-01-11 Gravity Jack, Inc. Augmented Reality Methods and Devices
WO2019213459A1 (fr) * 2018-05-04 2019-11-07 Northeastern University System and method for generating image landmarks
US20200082544A1 (en) * 2018-09-10 2020-03-12 Arm Limited Computer vision processing
CN110895826A (zh) * 2018-09-12 2020-03-20 Samsung Electronics Co., Ltd. Training data generation method for image processing, image processing method and apparatus therefor
CN110674696A (zh) * 2019-08-28 2020-01-10 Gree Electric Appliances, Inc. of Zhuhai Monitoring method, apparatus, system, monitoring device and readable storage medium

Also Published As

Publication number Publication date
CN115428040A (zh) 2022-12-02

Similar Documents

Publication Publication Date Title
US10796452B2 (en) Optimizations for structure mapping and up-sampling
US10733431B2 (en) Systems and methods for optimizing pose estimation
US10586350B2 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
KR20220004607A (ko) Object detection method, electronic device, roadside device and cloud control platform
CN114399629A (zh) Training method for a target detection model, and target detection method and apparatus
CN111670457A (zh) Optimization of dynamic object instance detection, segmentation and structure mapping
CN110597387B (zh) Artificial-intelligence-based picture display method and apparatus, computing device and storage medium
CN109685873B (zh) Face reconstruction method, apparatus, device and storage medium
CN111459269B (zh) Augmented reality display method, system and computer-readable storage medium
CN113343982A (zh) Entity relationship extraction method, apparatus and device based on multi-modal feature fusion
EP4365841A1 (fr) Object pose detection method and apparatus, computer device and storage medium
CN108628442B (zh) Information prompting method and apparatus, and electronic device
US20230067934A1 (en) Action Recognition Method, Apparatus and Device, Storage Medium and Computer Program Product
US20230100427A1 (en) Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product
US20230401799A1 (en) Augmented reality method and related device
CN114565916A (zh) Target detection model training method, target detection method and electronic device
CN113989376B (zh) Indoor depth information acquisition method and apparatus, and readable storage medium
CN116452631A (zh) Multi-target tracking method, terminal device and storage medium
CN114998433A (zh) Pose calculation method and apparatus, storage medium and electronic device
CN113284237A (zh) Three-dimensional reconstruction method, system, electronic device and storage medium
CN117132649A (zh) Ship video positioning method and apparatus fusing artificial intelligence with BeiDou satellite navigation
WO2021233357A1 (fr) Object detection method, system and computer-readable medium
US20210104096A1 (en) Surface geometry object model training and inference
CN114663980B (zh) Behavior recognition method, and deep learning model training method and apparatus
US11961249B2 (en) Generating stereo-based dense depth images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21807677

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21807677

Country of ref document: EP

Kind code of ref document: A1