CN113793378B

CN113793378B - Semantic SLAM object association and pose updating method and system based on hierarchical grouping

Info

Publication number: CN113793378B
Application number: CN202110685643.4A
Authority: CN
Inventors: 张剑华; 陈凯祺; 刘嘉玲; 孙波
Original assignee: Zidong Information Technology Suzhou Co ltd
Current assignee: Zidong Information Technology Suzhou Co ltd
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2023-08-11
Anticipated expiration: 2041-06-21
Also published as: CN113793378A

Abstract

The application relates to a semantic SLAM object association and pose updating method and system based on hierarchical grouping. The method comprises: obtaining a moving image of a dynamic object; constructing a key frame queue from the moving image, and selecting key frames from the key frame queue as key frame groups; performing an object association operation on the key frames in each key frame group to obtain object landmarks and their pose information; constructing a Gaussian mixture model from the object landmarks and their pose information, and using the Gaussian mixture model to judge whether the object landmarks in each key frame group are associated with the object landmarks in other key frame groups, thereby obtaining a judgment result; and updating the object landmarks and their pose information in the map according to the judgment result. The application can realize object association with high precision, avoids erroneous associations among multiple similar, closely spaced objects, and overcomes the prior-art defects of low object association accuracy, weak scene generalization capability and insufficient pose optimization when multiple identical objects are very close to each other.

Description

Semantic SLAM object association and pose updating method and system based on hierarchical grouping

Technical Field

The application relates to the technical field of robot vision, in particular to a semantic SLAM object association and pose updating method and system based on hierarchical grouping.

Background

Although pure vision SLAM has better robustness, it is very prone to tracking failures in dynamic scenes, fast motion, loss of texture, illumination changes, and other situations. Therefore, the combination of the traditional SLAM and the semantic information can improve the robustness of the system and is more in line with the understanding of human beings on exploring unknown environments. However, the conventional visual SLAM uses little semantic information in positioning and mapping, and thus is limited in some application scenarios.

Accurate object association and real-time updating of optimized object poses are vital components in semantic SLAM. To build an accurate three-dimensional semantic map, accurate object association and object pose are the preconditions. Accurate object association relies primarily on accurate object measurements, including the class and pose of the object. In practice, however, the information captured by the sensors of the robot is noisy, and it is not reliable to use the information from the sensors alone to estimate the motion of the robot. In addition, the semantic SLAM object association method in the prior art has the defects of low object association accuracy, weak scene generalization capability and insufficient object pose optimization under the condition that a plurality of identical objects are very close to each other, so that an accurate three-dimensional semantic map cannot be constructed.

Disclosure of Invention

Therefore, the application aims to solve the technical problems that the object association accuracy is not high, the scene generalization capability is weak and the object pose optimization is insufficient under the condition that a plurality of identical objects are very close to each other in the semantic SLAM object association method in the prior art.

In order to solve the technical problems, the application provides a semantic SLAM object association and pose updating method based on hierarchical grouping, which comprises the following steps:

acquiring a moving image of a dynamic object;

constructing a key frame queue according to the moving image, and selecting continuous key frames from the key frame queue as key frame groups, wherein adjacent key frame groups have overlapped key frames;

performing an object association operation on the key frames in each key frame group to obtain the object landmarks and pose information of each key frame group;

constructing a Gaussian mixture model according to the object landmarks of one key frame group and their pose information, and judging whether each object landmark in each key frame group is associated with each object landmark in other key frame groups by using the Gaussian mixture model to obtain a judgment result;

and updating the object landmarks and their pose information in the map according to the judgment result.

In one embodiment of the present application, acquiring a moving image of a dynamic object includes:

in the moving process of the dynamic object, the moving image of the dynamic object is captured by using the image capturing device, and the pose of the image capturing device and the position of the image capturing device on a map are calculated for each frame of image.

In one embodiment of the present application, the de-distortion process is performed before a moving image of a dynamic object is captured with an image capturing apparatus.

In one embodiment of the present application, constructing a key frame queue from the moving image includes:

and adding the first image frame as a key frame into a key frame queue, taking the key frame as a reference to select an image frame with obvious image information change as a new key frame, adding the new key frame into a key frame queue, taking the new key frame as a reference to select other key frames, and so on to obtain the key frame queue comprising a plurality of key frames.

In one embodiment of the present application, performing object association operations on key frames within each key frame group includes:

and detecting and obtaining object measurement in each key frame group by using an MOT algorithm, wherein the object measurement comprises pose information, and associating each object measurement with an object landmark to obtain the key frame group containing the object landmark and the pose information thereof.

In one embodiment of the present application, associating each object measurement to an object landmark, obtaining a keyframe group including the object landmark and pose information thereof includes:

first, taking an object landmark correlated with a first object measurement in a key frame group as a first object landmark; then judging whether a second object measurement in the key frame group is associated with the first object landmark, if so, associating the second object measurement with the first object landmark, and if not, generating a second object landmark associated with the second object measurement; and then respectively judging whether a third object measurement in the key frame group is associated with the first object landmark and the second object landmark, and analogizing to obtain the key frame group containing the object landmark and the pose information thereof.

In one embodiment of the present application, the number of key frames within each key frame group is 10 or less.

In one embodiment of the present application, constructing a gaussian mixture model according to object landmarks of one keyframe group and pose information thereof, and determining whether each object landmark in each keyframe group is associated with each object landmark in other keyframe groups using the gaussian mixture model includes:

S4.1: constructing a Gaussian mixture model according to the i-th object landmark in the n-th key frame group and its pose information;

S4.2: taking the j-th object landmark in the m-th key frame group and its pose information as an input value of the Gaussian mixture model, and calculating the maximum similarity probability between the j-th object landmark in the m-th key frame group and the i-th object landmark in the n-th key frame group;

S4.3: if the maximum similarity probability is greater than or equal to a preset threshold, judging that the two object landmarks are associated; if the maximum similarity probability is smaller than the preset threshold, judging that the two object landmarks are not associated, and repeating step S4.2 until an object landmark associated with the j-th object landmark in the m-th key frame group is found.

In one embodiment of the present application, after the j-th object landmark in the m-th key frame group is judged to be associated with the i-th object landmark in the n-th key frame group, all object measurements associated with the former are associated with the latter.

In addition, the application also provides a semantic SLAM object association and pose updating system based on hierarchical grouping, which comprises the following steps:

the acquisition module is used for acquiring a moving image of the dynamic object;

a key frame construction module, configured to construct a key frame queue according to the moving image, and select continuous key frames from the key frame queue as key frame groups, where adjacent key frame groups have overlapping key frames;

the first-level object association module is used for carrying out an object association operation on the key frames in each key frame group to obtain the object landmarks and pose information of each key frame group;

the second-level object association module is used for constructing a Gaussian mixture model according to the object landmarks of one key frame group and their pose information, and judging whether each object landmark in each key frame group is associated with each object landmark in other key frame groups by using the Gaussian mixture model to obtain a judgment result;

and the pose updating module is used for updating the object landmarks and their pose information in the map according to the judgment result.

Compared with the prior art, the technical scheme of the application has the following advantages:

According to the application, a hierarchical grouping model is introduced: object association is first performed on the key frames within each key frame group, and then on the object measurements between key frame groups, so that object association can be realized with high precision. The poses of the object landmarks are optimized on the basis of this association, which benefits the camera pose estimation of the semantic SLAM, and the optimized object poses in turn make object association more accurate. A more accurate semantic map is thus constructed, and erroneous associations among multiple similar, closely spaced objects are avoided. This overcomes the defects of low object association accuracy, weak scene generalization capability and insufficient object pose optimization that prior-art semantic SLAM object association methods exhibit when multiple identical objects are very close to each other.

Drawings

In order that the application may be more readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.

FIG. 1 is a flow diagram of a hierarchical grouping based semantic SLAM object association and pose update method of the present application.

FIG. 2 is a schematic block diagram of a hierarchical grouping based semantic SLAM object association and pose update method of the present application.

FIG. 3 is a partial schematic block diagram of a hierarchical grouping based semantic SLAM object association and pose update method of the present application.

FIG. 4 is a schematic diagram of a hierarchical grouping based semantic SLAM object association and pose update system of the present application.

Reference numerals: 10. an acquisition module; 20. a key frame construction module; 30. a first-level object association module; 40. a second-level object association module; 50. a pose updating module.

Detailed Description

The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.

Example 1

Referring to fig. 1 to 3, the present embodiment provides a semantic SLAM object association and pose updating method based on hierarchical grouping, which specifically includes the following steps:

s1: a moving image of a dynamic object is acquired.

For example, the moving image may be captured with an image capturing device. Preferably, the image capturing device is a camera: the moving image of the dynamic object is captured with the camera, and the pose of the camera and its position on the map are calculated for each image frame. During the movement, the motion equation of the camera is expressed as follows:

$$x_t = f(x_{t-1}, \mu_t) + \omega_t, \quad \omega_t \sim N(0, R_t)$$

where $f(x_{t-1}, \mu_t)$ represents the ideal pose change from time $t-1$ to time $t$, $\mu_t$ represents the motion measurement, and $\omega_t$ represents Gaussian noise with mean 0 and variance $R_t$. The observation equation of the camera is expressed as follows:

$$z_t = h(x_t, y_t) + v_t, \quad v_t \sim N(0, Q_t)$$

where $h(x_t, y_t)$ represents the ideal observation of the landmark $y_t$ from the pose $x_t$ at time $t$, and $v_t$ represents Gaussian noise with mean 0 and variance $Q_t$.

For example, the de-distortion process is performed before a moving image of a dynamic object is captured with a camera. The specific treatment is as follows: performing internal parameter calibration on the camera to obtain distortion parameters and an internal parameter matrix of the camera as follows:

where $[x, y]$ represents the coordinates of a point on the normalized plane, $[x_{distorted}, y_{distorted}]$ represents the distorted coordinates, $k_1, k_2, k_3, p_1, p_2$ represent the distortion terms, $r$ represents the distance between a point on the plane and the origin of the coordinate system, $P$ represents the camera intrinsic matrix, $f$ represents the camera focal length, and $[O_x, O_y]$ represents the principal point.
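The symbols defined above (three radial terms $k_1, k_2, k_3$, two tangential terms $p_1, p_2$, and an intrinsic matrix built from $f$, $O_x$, $O_y$) match the widely used radial-tangential distortion model. The following is a minimal sketch under the assumption that this standard model is the one intended; the function names and the fixed-point inversion in `undistort` are illustrative choices, not taken from the application.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Map a normalized-plane point (x, y) to its distorted coordinates.

    Assumes the standard radial-tangential model (an assumption, see lead-in).
    """
    r2 = x * x + y * y  # r^2: squared distance from the point to the origin
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def to_pixel(x, y, f, o_x, o_y):
    """Project a normalized-plane point with P = [[f, 0, O_x], [0, f, O_y], [0, 0, 1]]."""
    return f * x + o_x, f * y + o_y


def undistort(x_d, y_d, k1, k2, k3, p1, p2, iterations=20):
    """Invert distort() by fixed-point iteration (hypothetical helper).

    Converges for the mild distortion typical of calibrated cameras.
    """
    x, y = x_d, y_d
    for _ in range(iterations):
        ex, ey = distort(x, y, k1, k2, k3, p1, p2)
        x, y = x + (x_d - ex), y + (y_d - ey)
    return x, y
```

In practice the coefficients would come from the intrinsic calibration step described above, and every image point would be undistorted before further processing.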

S2: and constructing a key frame queue according to the moving image, and selecting continuous key frames from the key frame queue as key frame groups, wherein adjacent key frame groups have overlapped key frames.

For example, the first image frame is added to the key frame queue as a key frame; taking this key frame as a reference, an image frame with an obvious change in image information is selected as a new key frame and added to the key frame queue; the new key frame is then taken as the reference for selecting further key frames, and so on, yielding a key frame queue comprising a plurality of key frames.
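The selection rule above can be sketched as follows. The application does not specify how an "obvious change in image information" is measured, so the mean absolute difference between flattened frames and the `threshold` parameter are assumptions made purely for illustration.

```python
def mean_abs_diff(a, b):
    """Hypothetical change measure: mean absolute difference of two frames,
    each given as a flat sequence of numbers of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def build_keyframe_queue(frames, threshold):
    """Return the indices of the frames selected as key frames.

    The first frame is always a key frame; every later frame is compared
    against the most recent key frame and is selected when the change
    exceeds `threshold` (an assumed tuning parameter).
    """
    queue = []
    for idx, frame in enumerate(frames):
        if not queue or mean_abs_diff(frames[queue[-1]], frame) > threshold:
            queue.append(idx)
    return queue
```

A real system would more likely compare tracked features or camera motion between frames; the structure of the loop stays the same.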

S3: performing an object association operation on the key frames in each key frame group to obtain the object landmarks of each key frame group and their pose information.

Illustratively, performing the object association operation on the key frames within each key frame group includes: detecting the object measurements in each key frame group by using an MOT algorithm, wherein each object measurement comprises pose information, and associating each object measurement with an object landmark to obtain a key frame group containing the object landmarks and their pose information. Associating each object measurement with an object landmark proceeds as follows: first, the object landmark associated with the first object measurement in the key frame group is taken as the first object landmark; then it is judged whether the second object measurement in the key frame group is associated with the first object landmark; if so, the second object measurement is associated with the first object landmark, and if not, a second object landmark associated with the second object measurement is generated; next, it is judged whether the third object measurement in the key frame group is associated with the first object landmark or the second object landmark, and so on, until the key frame group containing the object landmarks and their pose information is obtained.

S4: constructing a Gaussian mixture model according to the object landmarks of one key frame group and their pose information, and judging whether each object landmark in each key frame group is associated with each object landmark in other key frame groups by using the Gaussian mixture model, to obtain a judgment result.

S5: updating the object landmarks and their pose information in the map according to the judgment result.

In S2 above, the first image frame is first added to the key frame queue as the key frame $F_1$; then, taking the previous key frame as a reference, an image frame with an obvious change in image information is selected as the new key frame $F_2$ and added to the key frame queue, and so on. Suppose that at time $t$ the key frame queue already contains $D$ key frames, i.e. $F_{1:D} = \{F_1, \dots, F_D\}$. Then, starting at time zero from the first key frame $F_1$ in the key frame queue, $M$ consecutive key frames are selected as one key frame group. For example, at time $t$ the key frame queue may be divided into the key frame groups $G_t = \{g_1, \dots, g_N\}$, where $g_n$ ($1 \le n \le N$) represents the $n$-th key frame group and $N$ represents the number of key frame groups. There are thus a plurality of key frame groups, and adjacent key frame groups have overlapping key frames.
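The grouping step can be sketched as a sliding window over the key frame queue. The application states only that each group holds consecutive key frames and that adjacent groups overlap; the parameter `overlap` (how many key frames two adjacent groups share) and the handling of a short final group are assumptions of this sketch.

```python
def group_keyframes(queue, group_size, overlap):
    """Split a key frame queue into consecutive, overlapping key frame groups.

    queue      -- ordered list of key frames (any objects or indices)
    group_size -- M, the number of key frames per group
    overlap    -- assumed number of key frames shared by adjacent groups
    """
    if not 0 <= overlap < group_size:
        raise ValueError("overlap must satisfy 0 <= overlap < group_size")
    step = group_size - overlap
    groups = []
    start = 0
    while start < len(queue):
        groups.append(queue[start:start + group_size])
        if start + group_size >= len(queue):
            break  # the last group reached the end of the queue
        start += step
    return groups
```

For instance, seven key frames with `group_size=3` and `overlap=1` yield three groups in which each adjacent pair shares exactly one key frame. The final group may be shorter than `group_size` when the queue length does not fit the window step.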

In S3 above, within each key frame group the object measurements are associated by the MOT algorithm. Since the number of key frames in each key frame group is 10 or less, the application can detect and track the object measurements belonging to the same object landmark with high accuracy. The specific contents are as follows:

a) For each key frame $F_m$ ($1 \le m \le M$) in the key frame group, a set of object measurements $L_m = \{L_{m,1}, \dots, L_{m,K_m}\}$ is detected with the MOT algorithm, where $K_m$ represents the number of object measurements detected in the $m$-th key frame of the group. Each object measurement $L_{m,k}$ ($1 \le k \le K_m$) contains several parameters, expressed as follows:

where the parameters of $L_{m,k}$ respectively represent the identification ID of the detected object, the ID of the key frame in which the detected object is located, the x-axis coordinate of the center point of the detected object, the y-axis coordinate of the center point of the detected object, and the width and height of the bounding box of the detected object.

b) By using the MOT algorithm to obtain a tracking ID for each object measurement, the object measurements in each key frame can be accurately associated with an object landmark, and multiple object instances can be tracked simultaneously across multiple key frames. For example, in the n-th key frame group, the association of the k-th object measurement in the m-th key frame with the i-th object landmark is recorded, where the superscript G in the notation indicates that the association is made within a key frame group. If a detected object measurement cannot be associated with any previous object landmark, a new object landmark is generated within the key frame group; that is, a key frame group contains a set of object landmarks, where I denotes the number of object landmarks in the key frame group. It should be noted that the object landmarks in a key frame group are not the final object landmarks used in the map; the final object landmarks are those associated at the global level, i.e. the hierarchical strategy designed by the application performs object association at two levels.
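Steps a) and b) can be sketched as follows. The field names of `ObjectMeasurement` mirror the six parameters listed in a); the rule that two measurements belong to the same group-level landmark when they share an MOT tracking ID is an assumption of this sketch, since the application does not spell out the association test. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMeasurement:
    track_id: int      # identification ID assigned by the MOT algorithm
    keyframe_id: int   # ID of the key frame containing the detection
    cx: float          # x coordinate of the bounding-box center
    cy: float          # y coordinate of the bounding-box center
    width: float       # bounding-box width
    height: float      # bounding-box height


@dataclass
class GroupLandmark:
    index: int                                   # i: landmark index within the group
    measurements: list = field(default_factory=list)


def associate_within_group(measurements):
    """Associate each measurement with a group-level object landmark.

    Measurements are visited in order; a measurement whose tracking ID was
    already seen joins that landmark, otherwise a new landmark is generated.
    """
    landmarks = []
    by_track = {}
    for m in measurements:
        landmark = by_track.get(m.track_id)
        if landmark is None:
            landmark = GroupLandmark(index=len(landmarks))
            landmarks.append(landmark)
            by_track[m.track_id] = landmark
        landmark.measurements.append(m)
    return landmarks
```

The landmarks returned here are only group-level; they still have to be merged across groups in S4 before they enter the map.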

In the above S4, the following are specifically included:

S4.1: a Gaussian mixture model is constructed according to the i-th object landmark in the n-th key frame group and its pose information. For example, the object landmarks in the 1st key frame group and their pose information can be used as the initialization values of the Gaussian mixture model (GMM) and as the earliest object landmarks in the map, while the object landmarks in the other key frame groups are input into the GMM as detection values for calculation. Assuming that the object landmark set $O$ in the map is modeled and that the input detection value is an object landmark associated with $Z$ object measurements, the model can be expressed as follows:

where $\lambda_z$ represents the weight coefficient and $\phi(O \mid \theta_z)$ represents the Gaussian density with mean $\mu_z$ and its corresponding variance; at the same time:

An association judgment is then made between the object landmark set $O$ and the $Z$ object measurements, where each object measurement has six free variables, namely its 3D world position and three orientation components.
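For reference, a Gaussian mixture over $Z$ components written with the symbols defined above conventionally takes the following form. This is the textbook expression, given here under the assumption that the application uses the standard model; it is not copied from the original formulas.

```latex
P(O \mid \theta) = \sum_{z=1}^{Z} \lambda_z \, \phi(O \mid \theta_z),
\qquad
\sum_{z=1}^{Z} \lambda_z = 1, \quad \lambda_z \ge 0
```

Each component density $\phi(O \mid \theta_z)$ is a Gaussian parameterized by $\theta_z$, i.e. by the mean $\mu_z$ and the corresponding variance.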

S4.2: the j-th object landmark in the m-th key frame group and its pose information are used as an input value of the Gaussian mixture model, and the maximum similarity probability between the j-th object landmark in the m-th key frame group and the i-th object landmark in the n-th key frame group is calculated. For example, the 3rd object landmark in the 2nd key frame group and its pose information can be used as the input value of the Gaussian mixture model, and the maximum similarity probability between the 3rd object landmark in the 2nd key frame group and the 2nd object landmark in the 1st key frame group is calculated.

S4.3: if the maximum similarity probability is greater than or equal to the preset threshold, the j-th object landmark in the m-th key frame group is judged to be associated with the i-th object landmark in the n-th key frame group, and all object measurements associated with the former are associated with the latter. If the maximum similarity probability is less than the preset threshold, the two object landmarks are judged not to be associated, and step S4.2 is repeated until an object landmark associated with the j-th object landmark in the m-th key frame group is found. If no associated object landmark is found after traversing all object landmarks in the current map, the j-th object landmark in the m-th key frame group is added to the map as a new object landmark, the related information of the object landmarks in the map is updated, and its pose information is used to extend the existing GMM.
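The decision logic of S4.2 and S4.3 can be sketched as below. Several details are assumptions made for illustration only: each map landmark is represented by one Gaussian component with a diagonal variance over the six pose variables, the similarity score is the unnormalized Gaussian kernel $\exp(-d^2/2)$ (chosen so that the threshold lies in $(0, 1]$), and `DEFAULT_VAR` is a hypothetical initial variance for newly added landmarks.

```python
import math

DEFAULT_VAR = 1.0  # hypothetical initial variance of a new map landmark


def similarity(pose, mean, var):
    """Score in (0, 1]: exp(-d^2 / 2), with d the variance-scaled distance
    between a 6-variable pose and one Gaussian component."""
    d2 = sum((p - m) ** 2 / v for p, m, v in zip(pose, mean, var))
    return math.exp(-0.5 * d2)


def associate_to_map(pose, map_landmarks, threshold):
    """Return the index of the map landmark associated with `pose`.

    map_landmarks is a list of dicts with keys "mean" and "var". The
    candidate is compared with every landmark; if the best score reaches
    `threshold` that landmark is returned, otherwise the candidate is
    appended as a new landmark, extending the mixture by one component.
    """
    best_idx, best_score = None, 0.0
    for idx, landmark in enumerate(map_landmarks):
        score = similarity(pose, landmark["mean"], landmark["var"])
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx is not None and best_score >= threshold:
        return best_idx
    map_landmarks.append({"mean": list(pose), "var": [DEFAULT_VAR] * len(pose)})
    return len(map_landmarks) - 1
```

After a successful association, the measurements of the group-level landmark would be merged into the returned map landmark and its component parameters re-estimated; that update is omitted here.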

Likewise, each object landmark in a key frame group is assigned a unique object landmark in the global map, with the relationship $q_p$ as follows:

where $O_p$ represents an object landmark in the map, and $P$ represents the number of object landmarks in the map at time $t$.

In S5 above, updating the pose of the object landmark in the map includes:

Each object landmark $O_p$ comprises a plurality of object measurements. For the k-th object measurement, the average angle difference $\theta_k$ and the average distance difference $\phi_k$ with respect to the other object measurements associated with the same object landmark are calculated. Since distances and angles differ in scale, they need to be normalized: a maximum angle difference $A$ and a maximum distance difference $B$ are set, and the normalized average angle difference and the normalized average distance difference are calculated as follows:

Then, the pose difference of the k-th object measurement $L_k$ associated with the object landmark $O_p$ is calculated as:

where $\alpha$ and $\beta$ are the weights of the angle difference and the position difference, respectively, and are set to 0.4 and 0.6 in all experiments. All pose differences are then sorted, and the pose of the object landmark is set to the pose of the object measurement with the smallest pose difference.
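The pose selection described above can be sketched as follows. The weights 0.4 and 0.6 come from the text; the normalization by division ($\theta_k / A$ and $\phi_k / B$), the weighted sum as the pose difference, and the representation of each measurement as a `(position, angles)` pair with angles in radians are assumptions of this sketch.

```python
import math


def wrap(angle):
    """Wrap an angle difference into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def angle_diff(a, b):
    """Assumed angle difference: norm of the wrapped per-axis differences."""
    return math.sqrt(sum(wrap(p - q) ** 2 for p, q in zip(a, b)))


def select_landmark_pose(measurements, max_angle, max_dist, alpha=0.4, beta=0.6):
    """Pick the measurement whose pose differs least from all the others.

    measurements -- list of (position, angles) pairs, each a 3-tuple
    max_angle    -- A, the maximum angle difference used for normalization
    max_dist     -- B, the maximum distance difference used for normalization
    """
    if not measurements:
        raise ValueError("a landmark needs at least one measurement")
    if len(measurements) == 1:
        return measurements[0]
    best, best_diff = None, math.inf
    for k, (pos_k, ang_k) in enumerate(measurements):
        others = [m for i, m in enumerate(measurements) if i != k]
        theta = sum(angle_diff(ang_k, a) for _, a in others) / len(others)
        phi = sum(math.dist(pos_k, p) for p, _ in others) / len(others)
        diff = alpha * (theta / max_angle) + beta * (phi / max_dist)
        if diff < best_diff:
            best, best_diff = measurements[k], diff
    return best
```

Choosing the measurement with the smallest weighted difference makes the landmark pose robust to a single outlying detection, since an outlier has a large average difference to all the others.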

If a key frame loop closure is detected, loop correction is performed, and the poses of the relevant object landmarks, the positions of the map points and the camera poses are updated.

As the object association and optimization operations are executed, the map containing the object landmarks with three-dimensional poses and the camera motion trajectory is updated and drawn in real time.

Example 2

The following describes the semantic SLAM object association and pose updating system based on hierarchical grouping disclosed in Example 2 of the present application. The system described below and the semantic SLAM object association and pose updating method based on hierarchical grouping described above may be referred to in correspondence with each other.

Referring to fig. 4, an embodiment of the present application provides a semantic SLAM object association and pose update system based on hierarchical grouping, including:

an acquisition module 10, wherein the acquisition module 10 is used for acquiring a moving image of a dynamic object;

a key frame construction module 20, where the key frame construction module 20 is configured to construct a key frame queue according to the moving image, and select continuous key frames from the key frame queue as key frame groups, where adjacent key frame groups have overlapping key frames;

the first-level object association module 30, where the first-level object association module 30 is configured to perform object association operation on the key frames in each key frame group, so as to obtain an object landmark of each key frame group and pose information thereof;

the second-level object association module 40, the second-level object association module 40 is configured to construct a gaussian mixture model according to object landmarks of one keyframe group and pose information thereof, and determine whether each object landmark in each keyframe group is associated with each object landmark in other keyframe groups by using the gaussian mixture model, so as to obtain a determination result;

the pose updating module 50 is configured to update the object landmark and pose information of the object landmark in the map according to the determination result.

The semantic SLAM object association and pose updating system based on hierarchical grouping of this embodiment is used to implement the foregoing semantic SLAM object association and pose updating method based on hierarchical grouping. For its specific implementation, reference may be made to the description of the corresponding parts of the method embodiment, which is not repeated here.

In addition, since the system of this embodiment is used to implement the foregoing method, its effects correspond to those of the method and are not described again here.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks.

It is apparent that the above examples are given by way of illustration only and do not limit the embodiments. Other variations and modifications will be apparent to those of ordinary skill in the art in light of the foregoing description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom still fall within the scope of the present application.

Claims

1. A semantic SLAM object association and pose updating method based on hierarchical grouping is characterized by comprising the following steps:

acquiring a moving image of a dynamic object;

performing an object association operation on the key frames in each key frame group to obtain the object landmarks of each key frame group and their pose information;

updating the object landmarks and their pose information in the map according to the judgment result;

performing object association operations on key frames within each key frame group includes:

detecting and obtaining object measurement in each key frame group by using an MOT algorithm, wherein the object measurement comprises pose information, and associating each object measurement to an object landmark to obtain the key frame group containing the object landmark and the pose information thereof;

wherein, associating each object measurement to an object landmark, obtaining a key frame group containing the object landmark and pose information thereof comprises:

first, taking the object landmark associated with the first object measurement in a key frame group as a first object landmark; then judging whether a second object measurement in the key frame group is associated with the first object landmark, if so, associating the second object measurement with the first object landmark, and if not, generating a second object landmark associated with the second object measurement; and then judging whether a third object measurement in the key frame group is associated with the first object landmark or the second object landmark, and so on, to obtain the key frame group containing the object landmarks and their pose information.

2. The hierarchical grouping based semantic SLAM object association and pose update method of claim 1, wherein: acquiring a moving image of a dynamic object includes:

3. The hierarchical grouping based semantic SLAM object association and pose update method of claim 2, wherein: the de-distortion process is performed before a moving image of a dynamic object is captured by an image capturing device.

4. The hierarchical grouping based semantic SLAM object association and pose update method of claim 1, wherein: constructing a key frame queue from the moving image includes:

adding the first image frame to the key frame queue as a key frame, taking this key frame as a reference to select an image frame with an obvious change in image information as a new key frame, adding the new key frame to the key frame queue, taking the new key frame as the reference to select further key frames, and so on, to obtain a key frame queue comprising a plurality of key frames.

5. The hierarchical grouping based semantic SLAM object association and pose update method of claim 1, wherein: the number of key frames within each key frame group is 10 or less.

6. The hierarchical grouping based semantic SLAM object association and pose update method of claim 1, wherein: constructing a Gaussian mixture model according to the object landmarks of one key frame group and their pose information, and judging whether each object landmark in each key frame group is associated with each object landmark in other key frame groups by using the Gaussian mixture model comprises the following steps:

7. The hierarchical grouping based semantic SLAM object association and pose update method of claim 6, wherein: after the j-th object landmark in the m-th key frame group is judged to be associated with the i-th object landmark in the n-th key frame group, all object measurements associated with the former are associated with the latter.

8. A hierarchical grouping based semantic SLAM object association and pose update system, comprising:

the first-level object association module is used for carrying out an object association operation on the key frames in each key frame group to obtain the object landmarks of each key frame group and their pose information;

the second-level object association module is used for constructing a Gaussian mixture model according to the object landmarks of one key frame group and their pose information, judging whether each object landmark in each key frame group is associated with each object landmark in other key frame groups by using the Gaussian mixture model, and obtaining a judgment result;

the pose updating module is used for updating the object landmarks and their pose information in the map according to the judgment result;