CN113192109B - Method and device for identifying motion state of object in continuous frames - Google Patents


Info

Publication number
CN113192109B
Authority
CN
China
Prior art keywords
recognized
static
objects
frames
motion state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110609317.5A
Other languages
Chinese (zh)
Other versions
CN113192109A (en)
Inventor
刘杰辰
陈佃文
黄宇凯
曹琼
郝玉峰
李科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Speechocean Technology Co ltd
Original Assignee
Beijing Speechocean Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Speechocean Technology Co ltd filed Critical Beijing Speechocean Technology Co ltd
Priority to CN202110609317.5A priority Critical patent/CN113192109B/en
Publication of CN113192109A publication Critical patent/CN113192109A/en
Application granted granted Critical
Publication of CN113192109B publication Critical patent/CN113192109B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/277 - Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 - Determining position or orientation of objects or cameras using feature-based methods involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30204 - Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and apparatus for recognizing the motion state of an object in consecutive frames, an electronic device, and a computer-readable storage medium. The method of identifying the motion state of an object in consecutive frames comprises: obtaining labeling results for multiple objects in the consecutive frames, the labeling results being object types and regions; determining multiple reference objects of the current frame according to the labeling results, the reference objects being objects that are stationary relative to the ground; acquiring the moving distances of an object to be identified relative to the multiple reference objects; and determining the motion state of the object to be identified in the current frame according to the moving distances. The motion state of the object is labeled directly by an algorithm, which reduces the manual labeling workload and greatly improves labeling efficiency; automatically selecting multiple reference objects allows the displacement of the object to be identified to be calculated accurately, improving recognition accuracy.

Description

Method and device for identifying motion state of object in continuous frames
Technical Field
The present disclosure relates to the field of image processing, and more particularly, to a method and apparatus for identifying a motion state of an object in consecutive frames, an electronic device, and a computer-readable storage medium.
Background
Moving target detection is the basis of moving target tracking, behavior recognition, scene description and other technologies, and the detection result directly influences the accuracy of subsequent algorithms. How to improve the accuracy and robustness of target detection has therefore become one of the main research directions in computer vision. In driver assistance, however, image data spanning many frames is collected, and labeling every object that appears across many continuous frames and judging its motion state is very tedious work.
At present, many automatic driving technologies collect 3D point cloud data for analysis. A point cloud is a data set of points in a certain coordinate system; each point carries rich information, including three-dimensional coordinates X, Y, Z, color, classification value, intensity value, time, and the like. Point cloud data is mainly acquired with a three-dimensional laser scanner; it can also be obtained through three-dimensional reconstruction from two-dimensional images, during which the point cloud is generated, or computed from a three-dimensional model. In general, 3D point cloud data is acquired using LiDAR (Light Detection and Ranging), a laser detection and measurement technology, and the point cloud data is processed and applied as it is acquired. LiDAR data acquisition falls into three major categories: satellite-borne, airborne, and ground-based; most point cloud data used for automatic driving is acquired by vehicle-mounted ground systems. Unlike RGB images, LiDAR point clouds are 3D and unstructured, and given the real-time requirements of driver assistance tasks, quickly and accurately determining which targets are moving during labeling is very difficult. At present, the common detection method is manual labeling: in the actual labeling process, an annotator first confirms the type of an object from the selected region and then determines its attributes, of which the motion state is one, and the motion state must be decided by the annotator's own judgment. Judging the motion state of an object is tedious: for a given object, the annotator needs to compare its relative positions in the preceding and following frames and select whether it is 'stationary' or 'moving' in a given frame. In practice this is time-consuming, labor-intensive and error-prone.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a method and apparatus for identifying a motion state of an object in consecutive frames, an electronic device, and a computer-readable storage medium.
According to a first aspect of embodiments of the present disclosure, there is provided a method of identifying an object motion state in consecutive frames, the method comprising: acquiring the labeling results of multiple objects in continuous frames, wherein the labeling results are object types and areas; determining a plurality of reference objects of the current frame according to the labeling result, wherein the reference objects are static objects relative to the ground; acquiring the moving distance of an object to be identified relative to a plurality of reference objects; and determining the motion state of the object to be identified in the current frame according to the moving distance.
In one embodiment, the successive frames are point cloud data acquired using a 3D radar.
In one embodiment, determining a plurality of references of the current frame according to the labeling result includes: according to the labeling result, numbering a plurality of objects in the continuous frames, wherein the same objects in different frames correspond to the same and unique numbers; dividing a static object and a non-static object according to the class of the object, wherein the static object is an object which is static relative to the ground; and taking the same static object in the current frame and the previous frame as a reference object of the current frame.
In an embodiment, the method further comprises: after determining a plurality of reference objects in continuous frames, classifying the reference objects according to the identification accuracy and the movable degree; and dividing the static weight of the reference object according to the classification result.
In one embodiment, the acquiring the moving distances of the object to be recognized relative to the plurality of reference objects includes: if m identical reference objects are contained in two adjacent frames, and m is larger than or equal to 1, determining a static coordinate system based on the m reference objects to obtain coordinate values of the object to be identified in the adjacent frames in the static coordinate system; and obtaining the moving distance of the object to be recognized according to the coordinate values.
In one embodiment, the obtaining of the moving distances of the object to be recognized relative to the plurality of reference objects includes, if m identical reference objects are included in two adjacent frames and m is greater than or equal to 1, determining coordinate values of the object to be recognized in the adjacent frames by taking the ith reference object as a coordinate origin (i = 1, …, m); and obtaining the moving distance of the object to be recognized relative to the ith reference object according to the coordinate values.
In an embodiment, the obtaining of the moving distances of the object to be recognized relative to the plurality of reference objects includes determining the motion state of the object to be recognized according to the labeling result of the object to be recognized if the two adjacent frames do not include the same reference object.
In one embodiment, determining the motion state of the object to be recognized in the current frame according to the moving distance includes: and judging the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weight values of the m reference objects.
In one embodiment, determining the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weights of the m reference objects includes: acquiring a weighted motion value D of an object to be identified:
D = (ω_1·d_1 + ω_2·d_2 + … + ω_m·d_m) / m
where m is the number of identical reference objects in two adjacent frames; d_i represents the movement value of the object to be recognized relative to the ith reference object, with d_i = 0 if the moving distance of the object to be recognized relative to the ith reference object is less than a preset distance threshold and d_i = 1 if the moving distance is greater than the preset distance threshold; and ω_i represents the static weight of the ith reference object. If the weighted motion value is larger than a set motion threshold, the object to be identified is marked as a moving object; if the weighted motion value is less than or equal to the set motion threshold, the object to be identified is marked as a static object.
In an embodiment, the method further includes marking that the object to be recognized in the previous frame is a moving object if the determination results of the motion states of the object to be recognized are different for two consecutive times.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for identifying a motion state of an object in consecutive frames, the apparatus comprising: the labeling unit is used for acquiring labeling results of multiple objects in continuous frames, and the labeling results are object types and areas; the reference object determining unit is used for determining a plurality of reference objects of the current frame according to the labeling result, wherein the reference objects are static objects relative to the ground; the distance measurement unit is used for acquiring the moving distance of the object to be identified relative to a plurality of reference objects; and the state identification unit is used for determining the motion state of the object to be identified in the current frame according to the moving distance.
In one embodiment, the continuous frames are continuous frames of point cloud data acquired using a 3D radar.
In one embodiment, the reference object determining unit includes: according to the labeling result, numbering a plurality of objects in the continuous frames, wherein the same objects in different frames correspond to the same and unique numbers; dividing a static object and a non-static object according to the class of the object, wherein the static object is an object which is static relative to the ground; and taking the same static object in the current frame and the previous frame as a reference object of the current frame.
In an embodiment, the apparatus further comprises: the classification unit is used for classifying the reference object according to the identification accuracy and the movable degree; and dividing the static weight of the reference object according to the classification result.
In one embodiment, the ranging unit further comprises: if m identical reference objects are contained in two adjacent frames, and m is larger than or equal to 1, determining a static coordinate system based on the m reference objects to obtain coordinate values of the object to be identified in the adjacent frames in the static coordinate system; and obtaining the moving distance of the object to be recognized according to the coordinate values.
In one embodiment, the ranging unit is configured to, when m identical reference objects are contained in two adjacent frames, with m greater than or equal to 1, determine the coordinate values of the object to be recognized in the adjacent frames by taking the ith reference object as the origin of coordinates (i = 1, …, m); and obtain the moving distance of the object to be recognized relative to the ith reference object according to the coordinate values.
In an embodiment, the distance measuring unit further includes, when two adjacent frames do not include the same reference object, determining the motion state of the object to be recognized according to the labeling result of the object to be recognized.
In one embodiment, the state recognition unit includes: and judging the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weight values of the m reference objects.
In one embodiment, the state identification unit further includes: a weighted motion value D of the object to be identified is obtained,
D = (ω_1·d_1 + ω_2·d_2 + … + ω_m·d_m) / m
where m is the number of identical reference objects in two adjacent frames; d_i represents the movement value of the object to be recognized relative to the ith reference object, with d_i = 0 if the moving distance of the object to be recognized relative to the ith reference object is less than a preset distance threshold and d_i = 1 if the moving distance is greater than the preset distance threshold; and ω_i represents the static weight of the ith reference object. If the weighted motion value is larger than a set motion threshold, the object to be identified is marked as a moving object; if the weighted motion value is less than or equal to the set motion threshold, the object to be identified is marked as a static object.
In an embodiment, the device further comprises a correction unit, configured to mark the object to be recognized in the previous frame as a moving object when the determination result that the motion state of the object to be recognized is different for two consecutive times.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a memory to store instructions; and a processor for invoking the memory-stored instructions to perform the method of the first aspect of identifying a motion state of an object in successive frames.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by a processor, perform the method of the first aspect of identifying a motion state of an object in consecutive frames.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects: the method for identifying the motion state of an object in continuous frames labels the motion state directly by an algorithm, which reduces the manual labeling workload and greatly improves labeling efficiency; at the same time, reference objects can be selected automatically and multiple reference objects are introduced to vote on the object's motion state, which is more reliable than manual single-target referencing and better suited to identifying object motion states while a vehicle is moving; and calculating the displacement of the object to be recognized by means of reference objects is more reliable than naked-eye observation, improving both recognition efficiency and recognition accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram illustrating a method for identifying a motion state of an object in successive frames in accordance with an exemplary embodiment;
FIG. 2 is a diagram of point cloud data shown in accordance with an exemplary embodiment;
FIG. 3 is another point cloud data diagram shown in accordance with an exemplary embodiment;
FIG. 4 is an actual image of the in-vehicle movement measurement system shown in accordance with an exemplary embodiment;
FIG. 5 is a schematic illustration showing post-annotation results of collected point cloud data by an in-vehicle mobile measurement system, in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a process for determining multiple references based on annotation results according to an exemplary embodiment;
FIG. 7 is a schematic flow chart diagram illustrating another method for identifying a motion state of an object in successive frames in accordance with an exemplary embodiment;
FIG. 8 is a schematic flow chart illustrating a process for determining a motion state of an object to be recognized in a current frame according to a moving distance according to an exemplary embodiment;
FIG. 9 is a schematic flow chart diagram illustrating another method for identifying an object motion state in successive frames in accordance with an exemplary embodiment;
FIG. 10 is a schematic block diagram illustrating an apparatus for identifying a motion state of an object in successive frames in accordance with an illustrative embodiment;
FIG. 11 is a schematic block diagram illustrating another apparatus for identifying a motion state of an object in successive frames in accordance with an illustrative embodiment;
FIG. 12 is a schematic block diagram illustrating another apparatus for identifying a motion state of an object in successive frames in accordance with an illustrative embodiment;
FIG. 13 is a schematic block diagram illustrating an apparatus in accordance with an exemplary embodiment;
FIG. 14 is a schematic block diagram illustrating an electronic device in accordance with an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In moving object detection, common approaches acquire continuous frame images with a camera and apply an inter-frame difference method, a background subtraction method or an optical flow method. The optical flow method detects targets from image luminance information; it is rarely used because of its high computational complexity and weak resistance to interference. The inter-frame difference method extracts moving targets by differencing consecutive video frames; it adapts well to background changes, but holes appear in the detected targets and slowly moving targets may be missed. The background subtraction method first builds a background model and then extracts moving targets by subtracting the background model from the current frame. When these methods are applied while a vehicle is driving, the camera and the objects to be recognized are in relative motion and the background changes continuously, so optical flow and frame difference methods misjudge the motion state of the acquired images at a high rate.
The currently common detection method is manual labeling, in which judgments are made from experience after repeatedly inspecting the data. In the actual labeling process, the annotator first confirms the type of the object from the selected region and then determines its other attributes, including the motion state. Judging the motion state of an object is tedious: for a given object, the annotator needs to compare its relative positions in the preceding and following frames and select whether it is 'stationary' or 'moving' in a given frame. In practice this is time-consuming, labor-intensive and error-prone.
In order to solve the above problem, the present disclosure provides a method for identifying a motion state of an object in consecutive frames, which includes steps S11-S14, as shown in fig. 1, and is described in detail as follows:
step S11, obtaining labeling results of multiple objects in consecutive frames, where the labeling results are object types and areas.
Firstly, objects in each frame are labeled and classified manually or by a detection algorithm. The continuous frames may be image data acquired by a camera or binocular camera, point cloud data acquired by a 3D radar, and so on. The labeling results may be regions of interest and object types for multiple targets, where the object types include: pedestrians, cars, ridden non-motor vehicles, ridden motorcycles, express delivery vehicles, street lamps, trees, signs, buildings, trash cans, and the like. Existing target detection and identification methods can be applied during the labeling process of the present disclosure, for example identifying each object with a pre-trained recognition model, or directly predicting pixel classes with a multi-individual learning method. Judging the motion region after the object category has been identified adds prior knowledge: trees, traffic signs and the like are generally fixed targets, while vehicles and pedestrians are moving targets that deserve close attention. Adding this prior classification information to the judgment of the targets' motion reduces misjudgment.
In one embodiment, the successive frames are point cloud data acquired using a 3D radar.
Many current autopilot technologies collect 3D point cloud data for analysis, where the point cloud is a data set of points in a coordinate system, as shown in fig. 2. The points contain rich information including three-dimensional coordinates X, Y, Z, color, classification values, intensity values, time, etc., as shown in fig. 3. Point cloud data is mainly acquired with a three-dimensional laser scanner; it can also be obtained through three-dimensional reconstruction from two-dimensional images, during which the point cloud is generated, or computed from a three-dimensional model. In general, 3D point cloud data is acquired using LiDAR (Light Detection and Ranging), a laser detection and measurement technology, and the point cloud data is processed and applied as it is acquired. LiDAR data acquisition falls into three major categories: satellite-borne, airborne, and ground-based; the point cloud data used for automatic driving is mostly acquired from vehicle-mounted ground systems, and fig. 4 is an actual image of the vehicle-mounted MMS mobile measurement system. Image 3D object annotation means: using the laser point cloud data obtained by 3D laser scanning at the same moment, marking a three-dimensional bounding box in 3D space for the target objects in the image, such as vehicles, bicycles, pedestrians and guideboard indicators, and projecting the three-dimensional bounding box onto the image. When the continuous frames are point cloud data acquired with a 3D radar, obtaining the labeling results of the multiple objects in the continuous frames means obtaining the three-dimensional bounding box and the object type.
FIG. 5 shows the labeled result of point cloud data acquired by a vehicle-mounted MMS; the center of the circle in the middle is the location of the LiDAR acquisition device, from which the detection laser is emitted outward. When an object is labeled, a series of parameters must be selected after the object bounding box is adjusted; for an automobile, the motion state, whether a door is open, the vehicle type and so on are common 3D object labeling tasks. In the actual labeling process, the region obtained by the annotator is the selected region shown in fig. 5; after the object type is confirmed, the annotator needs to label the attributes of the objects one by one, the motion state of the object being one of these attributes.
The method is particularly suitable for point cloud data acquired by a 3D radar. Firstly, in current driver assistance most point cloud data is acquired by a 3D radar and then labeled, so the method matches that scenario; secondly, point cloud data carries richer information than a two-dimensional plane camera, including three-dimensional coordinates X, Y, Z, color, classification values, intensity values, time and the like, which gives higher precision in classification, and because the point cloud contains three-dimensional coordinate data, measuring and calculating point positions is more convenient and faster.
And step S12, determining a plurality of reference objects of the current frame according to the labeling result, wherein the reference objects are static objects relative to the ground.
Current algorithms for judging moving objects detect them on the assumption that the camera is fixed, so a single reference object or the background is often used for motion detection; common methods include the frame difference method, the optical flow method and the background subtraction method. The frame difference method obtains the moving regions of an image from the difference between two or more adjacent frames, yielding the video foreground information; it is fast and simple, but large errors occur for fast-moving foreground objects. The optical flow method computes an optical flow field and estimates the movement of foreground objects from the spatio-temporal gradients of the image to extract the foreground; its drawbacks are a heavy computational load and high sensitivity to noise. The background subtraction method pre-constructs a background image from image information and then differences the current frame against it, separating motion regions from the background and extracting the video foreground; it places high demands on the quality of the constructed background, the threshold selection is critical, and a threshold that is too high or too low prevents moving foreground objects from being fully extracted, so it is difficult to operate. Therefore, when existing motion detection methods are applied to driver assistance tasks, the high driving speed and frequent lane changes and turns mean the camera and the objects to be identified are in relative motion and the background changes continuously, so motion detection using a single reference object or the background performs poorly. While the vehicle is moving, other moving objects can instead be judged by relying on reference objects along the road, and using multiple reference objects further improves the accuracy of motion detection, avoiding the detection anomalies and misjudgments that a single reference object easily causes.
In one embodiment, as shown in fig. 6, step S12 includes: step S121, numbering the multiple objects in the continuous frames according to the labeling results, where the same object in different frames corresponds to the same unique number. During labeling, each object has a unique tracking id, and the same object in different frames uses the same id. The same objects can be assigned the same ID by Kalman filter prediction or other identification algorithms, which facilitates labeling management and judging whether the reference objects in continuous frames are the same.
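By way of illustration only, the per-frame annotation records produced by such labeling could be organized as in the following Python sketch; the class and field names are assumptions made for this example and are not prescribed by the present disclosure.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class LabeledObject:
        frame_index: int   # index of the frame within the continuous sequence
        track_id: int      # unique tracking id; the same object keeps the same id across frames
        category: str      # labeled object type, e.g. "car", "pedestrian", "traffic_sign"
        center: Tuple[float, float, float]  # X, Y, Z center of the labeled 3D bounding box
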
Step S122, dividing a static object and a non-static object according to the type of the object, wherein the static object is an object which is static relative to the ground; step S123, the same stationary object in the current frame and the previous frame is used as the reference object of the current frame.
While the vehicle is moving forward, other moving objects can be judged with the help of reference objects along the road. For the requirements of a driver assistance task, targets that are moving relative to the ground are the ones most worth detecting. Therefore, objects that are stationary relative to the ground, such as trees, buildings, traffic lights and bus stop boards, are selected as reference objects; using stationary objects as references accords with the common sense of motion detection. The reference objects can be selected automatically by the algorithm, avoiding manual screening, while effectively improving the confidence of motion detection and detecting targets that move relative to the ground in time.
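A minimal sketch of such automatic reference selection is given below, reusing the LabeledObject record sketched above; the set of static classes is an illustrative assumption.

    # Classes treated as stationary relative to the ground (illustrative assumption).
    STATIC_CLASSES = {"tree", "building", "traffic_light", "traffic_sign",
                      "street_lamp", "bus_stop"}

    def select_references(prev_frame, curr_frame):
        """Return the current-frame objects that belong to a static class and that
        also appear (same track_id) in the previous frame, i.e. the reference objects."""
        prev_static_ids = {o.track_id for o in prev_frame if o.category in STATIC_CLASSES}
        return [o for o in curr_frame
                if o.category in STATIC_CLASSES and o.track_id in prev_static_ids]
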
In one embodiment, as shown in fig. 7, the method further comprises: in step S15, after a plurality of reference objects in consecutive frames are determined, the reference objects are classified according to the recognition accuracy and the degree of movement.
Optionally, stationary objects, i.e., reference objects that are unlikely to move, are classified among the annotation targets into three levels:
grade one: the recognition accuracy is high, and movement is completely impossible, for example: traffic signal lamps, traffic signs, street lamps.
Grade two: the recognition accuracy is high, and the object is generally unlikely to be moved, for example: traffic cones, trash bins, parked bicycles.
Grade three: the recognition accuracy is average, and the object is completely unlikely to be moved, with a weight of 0.5, for example: trees and benches.
The recognition accuracy is judged according to how distinctive the features of the object's point cloud data are; the criteria are that the features are very obvious, not easily confused, and not easily missed during labeling. The judgment may be based on expert knowledge and experience, or the recognition model may be evaluated on a held-out test set, with the recognition rate and recall rate used as the basis for judging recognition accuracy; whether an object can be moved is judged from common sense. When the reference objects are screened, they are classified into grades according to recognition accuracy and mobility, which fuses recognition accuracy and prior knowledge into the judgment process; grading the reference objects increases the confidence of motion detection and further reduces the probability of misjudgment.
And step S16, dividing the static weight of the reference object according to the classification result.
In an embodiment of the present disclosure, the weight of grade one is set to 1, the weight of grade two to 0.8, and the weight of grade three to 0.5. An object whose recognition accuracy is high and which is completely unmovable gives a higher confidence as a reference object than one whose recognition accuracy is high but which is only generally unlikely to be moved. This weight division is also the combination found to give the highest accuracy after repeated tests. Giving a higher weight to more reliable reference objects builds a more reasonable motion state evaluation system and improves the accuracy of motion detection.
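The grading and weighting described above could be expressed as a simple lookup, for example as in the sketch below; the category-to-grade mapping is an illustrative assumption, while the weights 1, 0.8 and 0.5 follow the embodiment described above.

    # Grade of each static class (assumed mapping) and the weight of each grade.
    GRADE_OF = {
        "traffic_light": 1, "traffic_sign": 1, "street_lamp": 1,
        "cone": 2, "trash_bin": 2, "parked_bicycle": 2,
        "tree": 3, "bench": 3,
    }
    WEIGHT_OF_GRADE = {1: 1.0, 2: 0.8, 3: 0.5}

    def static_weight(category: str) -> float:
        # Unlisted static classes fall back to the lowest grade in this sketch.
        return WEIGHT_OF_GRADE[GRADE_OF.get(category, 3)]
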
In step S13, the moving distances of the object to be recognized with respect to the plurality of reference objects are acquired.
When judging the motion state, calculating the moving distance is the fastest and most convenient approach: by calculating the moving distance relative to a reference object, whether the object to be detected has moved can be obtained quickly. Calculating the displacement distance is also more reliable than naked-eye observation.
In an embodiment, step S13 further includes, if two adjacent frames include m identical reference objects, where m is greater than or equal to 1, determining a stationary coordinate system based on the m reference objects, and obtaining coordinate values of the object to be recognized in the stationary coordinate system in the adjacent frames; and obtaining the moving distance of the object to be recognized according to the coordinate values.
In the above embodiment, the origin of the stationary coordinate system may be determined based on the m reference objects. Optionally, if a single stable stationary reference object exists, such as a street lamp or a traffic sign, that sign may be used as the fixed reference object: its center of gravity is taken as the origin of the stationary coordinate system, the X forward direction in the point cloud data as the X forward direction of the constructed coordinate axes, the Y forward direction as the Y forward direction, and the Z forward direction as the Z forward direction, and the coordinate values of the object in the stationary coordinate system are calculated to obtain the moving distance of the object to be recognized between adjacent frames and thereby determine its motion state. In another embodiment, if there are multiple reference objects, they may be filtered; for example, the 3 stationary objects with the largest distance are selected to determine the stationary coordinate system. Optionally, a center of gravity of those 3 stationary objects may be determined as the origin of the stationary coordinate system. Alternatively, the reference objects may be ranked by stability and the top-ranked ones selected to determine the stationary coordinate system. After the distance is obtained, if the result is less than 0.1 (an empirical value allowing for error tolerance), A is considered to remain stationary with respect to the stationary coordinate system; otherwise, A is considered to move with respect to the stationary coordinate system. By establishing the stationary coordinate system, a coordinate system that is stationary with respect to the ground can be found, and the motion state of the object can be determined more objectively and accurately.
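One possible reading of this stationary-coordinate-system variant is sketched below: the centroid of the shared reference objects serves as the origin in each frame, the point-cloud X and Y directions are kept, and the object's displacement between the two frames is measured in that coordinate system. The 0.1 tolerance follows the description above; the function name, data layout and centroid choice are assumptions of this sketch.

    import math

    def displacement_in_static_frame(obj_prev, obj_curr, refs_prev, refs_curr):
        """Displacement of the object between two adjacent frames, measured in a
        coordinate system whose origin is the centroid of the shared references."""
        ox_prev = sum(r.center[0] for r in refs_prev) / len(refs_prev)
        oy_prev = sum(r.center[1] for r in refs_prev) / len(refs_prev)
        ox_curr = sum(r.center[0] for r in refs_curr) / len(refs_curr)
        oy_curr = sum(r.center[1] for r in refs_curr) / len(refs_curr)
        dx = (obj_curr.center[0] - ox_curr) - (obj_prev.center[0] - ox_prev)
        dy = (obj_curr.center[1] - oy_curr) - (obj_prev.center[1] - oy_prev)
        return math.hypot(dx, dy)

    # The object would be treated as stationary when the returned value is below
    # the empirical tolerance of 0.1 mentioned above.
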
In one embodiment, step S13 includes: if two adjacent frames contain m identical reference objects, with m greater than or equal to 1, determining the coordinate values of the object to be recognized in the adjacent frames with the ith reference object as the origin of coordinates (i = 1, …, m); and obtaining the moving distance of the object to be recognized relative to the ith reference object from those coordinate values. If the data are two-dimensional images, the X and Y axes are based on the X and Y axes of the camera imaging plane coordinate system; if the data are point clouds, the X forward direction in the point cloud data is taken as the X forward direction of the constructed coordinate axes and the Y forward direction as the Y forward direction. While the vehicle is driving, the X and Y directions in the point cloud data are related to the vehicle's travel and heading, so judging whether an object moves in this way better meets the motion detection requirements of the driver assistance task.
Operating on each frame, suppose the motion state is now being determined for object A in the nth frame. First, the coordinates <X_{n,i}, Y_{n,i}> of A in the coordinate system of each reference object (assuming there are m shared stationary objects, i = 1, …, m) in the nth frame are calculated. The coordinates are obtained as follows: the current reference object is taken as the coordinate origin, the X forward direction in the point cloud data as the X forward direction of the constructed coordinate axes, and the Y forward direction as the Y forward direction, and the coordinates of object A in those axes are calculated. The point cloud data also includes Z-direction data, and in the present disclosure the Z coordinate may likewise be obtained and used. In practical monitoring, however, planar movement on the ground, such as a pedestrian crossing a road or a vehicle merging into a lane, is more worth monitoring than movement along the Z axis. Therefore, only the X and Y coordinates are used, so that the movement that needs to be detected during driving can be calculated more quickly.
For two adjacent frames n-1 and n, based on the m stationary objects common to both frames (j = 1, …, m), the difference between the coordinates of target object A in the two frames is calculated as follows:
dist_j = sqrt((X_{n,j} - X_{n-1,j})^2 + (Y_{n,j} - Y_{n-1,j})^2)
Optionally, the subsequent calculation may use the distance value directly, or may be performed after normalization. If the calculation result is less than 0.1 (an empirical value allowing for error tolerance), A is considered stationary with respect to that stationary object and is recorded as 0; otherwise, A is considered to be moving with respect to that stationary object and is recorded as 1. The detection results of the moving distances with respect to the multiple reference objects may be normalized taking the allowable errors into account.
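A sketch of this per-reference computation follows: for each reference object present in both frames, the object's X/Y displacement relative to that reference is computed and binarized into the movement value d_i. The 0.1 threshold comes from the text; the function name and data layout are assumptions.

    import math

    DIST_THRESHOLD = 0.1  # empirical error tolerance from the description above

    def movement_values(obj_prev, obj_curr, refs_prev_by_id, refs_curr_by_id):
        """For each reference present in both frames (keyed by track_id), return
        (track_id, d_i) where d_i is 0 if the object stayed within the tolerance
        relative to that reference and 1 otherwise."""
        results = []
        for rid, ref_prev in refs_prev_by_id.items():
            ref_curr = refs_curr_by_id.get(rid)
            if ref_curr is None:
                continue
            # Object coordinates with the i-th reference as the origin, per frame.
            x_prev = obj_prev.center[0] - ref_prev.center[0]
            y_prev = obj_prev.center[1] - ref_prev.center[1]
            x_curr = obj_curr.center[0] - ref_curr.center[0]
            y_curr = obj_curr.center[1] - ref_curr.center[1]
            dist = math.hypot(x_curr - x_prev, y_curr - y_prev)
            results.append((rid, 0 if dist < DIST_THRESHOLD else 1))
        return results
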
In an embodiment, the step S13 further includes, if two adjacent frames do not include the same reference object, determining the motion state of the object to be recognized according to the labeling result of the object to be recognized.
In the detection process, if no identical reference object is shared between the two frames, the objects are sorted according to the labeling results, static and non-static objects are screened out, and the motion state is calibrated directly: normally static objects are labeled as stationary, and movable objects such as automobiles, pedestrians and pets are labeled as moving. When no common reference object exists, labeling pedestrians, vehicles and the like as moving rather than using a default value allows the warning level to be raised according to the motion state, meeting the safety requirements of the driver assistance task.
Optionally, for the motion state in frame 1, since there is no preceding frame as a reference, the following method is used: for an object that appears in both frame 1 and frame 2, its state in frame 1 directly copies its state in frame 2 (or in its first subsequent occurrence); if no reference stationary objects appear subsequently, or the reference stationary objects are completely different, the state is calibrated according to common knowledge so as to complete the acquired data.
And step S14, determining the motion state of the object to be recognized in the current frame according to the moving distance.
In one embodiment, determining the motion state of the object to be recognized in the current frame according to the moving distance includes: and judging the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weight values of the m reference objects.
The stability of each reference object is different, the importance degree is different when judging the motion state, the moving distances of the multiple reference objects are unified, different weights are assigned, the confidence coefficient of the multiple reference object comprehensive analysis is improved through weighting operation, and the accuracy of motion detection is improved.
In a specific embodiment, if the coordinates of the object to be identified are obtained using a stationary coordinate system, the multiple reference objects may, for example, be filtered according to their static weights. Optionally, the reference objects may be sorted by static weight, the top-ranked reference objects selected, and their center of gravity determined as the origin of the stationary coordinate system, from which the moving distance of the object to be recognized is obtained and the motion state determined.
In one embodiment, as shown in fig. 8, the step S14 of determining the motion state of the object to be recognized according to the moving distances of the object to be recognized relative to the m reference objects and the static weights of the m reference objects includes: acquiring a weighted motion value D of an object to be identified:
D = (ω_1·d_1 + ω_2·d_2 + … + ω_m·d_m) / m
where m is the number of identical reference objects in two adjacent frames; d_i represents the movement value of the object to be recognized relative to the ith reference object, with d_i = 0 if the moving distance of the object to be recognized relative to the ith reference object is less than a preset distance threshold and d_i = 1 if the moving distance is greater than the preset distance threshold; and ω_i represents the static weight of the ith reference object. If the weighted motion value is larger than a set motion threshold, the object to be identified is marked as a moving object; if the weighted motion value is less than or equal to the set motion threshold, the object to be identified is marked as a static object. In a specific embodiment, the motion threshold may take a value of 0.5.
For example, if m reference objects exist in both frame n and frame n-1, the calculation results based on the m stationary objects are multiplied by the weights set in step S16, summed, and then divided by the total number m of stationary objects; if the result is greater than the given threshold of 0.5, the object is considered to be moving in frames n-1 and n, otherwise it is considered stationary. Through this operation, the motion state of the object is labeled directly by the algorithm, reducing the manual labeling workload and greatly improving labeling efficiency.
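The weighted vote can be written compactly as in the sketch below, using the per-reference movement values d_i and static weights ω_i defined above; the 0.5 motion threshold follows the example, and the handling of the no-reference case is only indicated.

    MOTION_THRESHOLD = 0.5  # motion threshold used in the example above

    def classify_motion(d_values, weights):
        """d_values and weights are the per-reference movement values d_i and static
        weights w_i for the m references shared by the two frames, in matching order."""
        m = len(d_values)
        if m == 0:
            return None  # no shared reference: fall back to the class-based rule above
        weighted_motion = sum(w * d for d, w in zip(d_values, weights)) / m
        return "moving" if weighted_motion > MOTION_THRESHOLD else "stationary"
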
In an embodiment, as shown in fig. 9, the method further includes, in step S17, if the determination result of the motion state of the object to be recognized is different for two consecutive times, marking the object to be recognized in the previous frame as a moving object.
For example, suppose the two judgments for target object A diverge: through the above steps, A is judged to be stationary between frames n-2 and n-1, but is judged to be moving between frames n-1 and n; the motion state of A in frame n-1 is then marked as moving. In that case, A is stationary in frame n-2, moving in frame n-1, and moving in frame n. This situation may occur at the instant the target object starts to move, or may be an occasional misjudgment during continuous movement; marking the state as moving effectively avoids misjudgment, accurately captures the moment the object starts to move, and allows the warning level to be raised in advance, improving the overall safety of the driver assistance task.
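The correction rule can be sketched as a single pass over an object's per-frame judgments; this assumes the judgments are already stored as a simple list, which is not specified in the text.

    def apply_correction(states):
        """states[n] is the motion judgment for the object in frame n ("moving" or
        "stationary"). Whenever two consecutive judgments disagree, the earlier of
        the two frames is marked as moving, as described above."""
        corrected = list(states)
        for n in range(1, len(corrected)):
            if corrected[n] != corrected[n - 1]:
                corrected[n - 1] = "moving"
        return corrected
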
According to a second aspect of the embodiments of the present disclosure, as shown in fig. 10, there is provided an apparatus 100 for identifying a motion state of an object in consecutive frames, the apparatus 100 comprising: a labeling unit 110, configured to obtain labeling results of multiple objects in consecutive frames, where the labeling results are object types and areas; a reference object determining unit 120, configured to determine, according to the labeling result, a plurality of reference objects of the current frame, where the reference objects are objects that are stationary with respect to the ground; a distance measuring unit 130 for acquiring moving distances of the object to be recognized with respect to a plurality of reference objects; and the state identification unit 140 is used for determining the motion state of the object to be identified in the current frame according to the moving distance.
In one embodiment, the continuous frames are continuous frames of point cloud data acquired using a 3D radar.
In one embodiment, the reference object determining unit 120 includes: according to the labeling result, numbering a plurality of objects in the continuous frames, wherein the same objects in different frames correspond to the same and unique numbers; dividing a static object and a non-static object according to the class of the object, wherein the static object is an object which is static relative to the ground; and taking the same static object in the current frame and the previous frame as a reference object of the current frame.
In one embodiment, as shown in fig. 11, the apparatus 100 further comprises: a classification unit 150 for classifying the reference object according to the recognition accuracy and the movable degree; and dividing the static weight of the reference object according to the classification result.
In one embodiment, the ranging unit 130 includes: if m identical reference objects are contained in two adjacent frames, and m is larger than or equal to 1, determining a static coordinate system based on the m reference objects to obtain coordinate values of the object to be identified in the adjacent frames in the static coordinate system; and obtaining the moving distance of the object to be recognized according to the coordinate values.
In one embodiment, the distance measuring unit 130 is configured to, when m identical reference objects are contained in two adjacent frames, determine the coordinate values of the object to be recognized in the adjacent frames by taking the ith reference object as the origin of coordinates (i = 1, …, m); and obtain the moving distance of the object to be recognized relative to the ith reference object according to the coordinate values.
In an embodiment, the distance measuring unit 130 further includes, when two adjacent frames do not include the same reference object, determining a motion state of the object to be recognized according to the labeling result of the object to be recognized.
In one embodiment, the state recognition unit 140 includes: and judging the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weight values of the m reference objects.
In one embodiment, the state identification unit 140 further includes: a weighted motion value D of the object to be identified is obtained,
D = (ω_1·d_1 + ω_2·d_2 + … + ω_m·d_m) / m
where m is the number of identical reference objects in two adjacent frames; d_i represents the movement value of the object to be recognized relative to the ith reference object, with d_i = 0 if the moving distance of the object to be recognized relative to the ith reference object is less than a preset distance threshold and d_i = 1 if the moving distance is greater than the preset distance threshold; and ω_i represents the static weight of the ith reference object. If the weighted motion value is larger than a set motion threshold, the object to be identified is marked as a moving object; if the weighted motion value is less than or equal to the set motion threshold, the object to be identified is marked as a static object.
In one embodiment, as shown in fig. 12, the apparatus 100 further includes a correction unit 160 for marking the object to be recognized in the previous frame as a moving object when the determination results that the motion states of the object to be recognized are continuously different twice.
Referring to fig. 13, the apparatus 200 may include one or more of the following components: a processing component 202, a memory 204, a power component 206, a multimedia component 208, an audio component 210, an input/output (I/O) interface 212, a sensor component 214, and a communication component 216.
The processing component 202 generally controls overall operation of the device 200, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 202 may include one or more processors 220 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 202 can include one or more modules that facilitate interaction between the processing component 202 and other components. For example, the processing component 202 can include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.
The memory 204 is configured to store various types of data to support operations at the apparatus 200. Examples of such data include instructions for any application or method operating on the device 200, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 204 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 206 provides power to the various components of the device 200. The power components 206 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 200.
The multimedia component 208 includes a screen that provides an output interface between the device 200 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 208 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 200 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 210 is configured to output and/or input audio signals. For example, audio component 210 includes a Microphone (MIC) configured to receive external audio signals when apparatus 200 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 204 or transmitted via the communication component 216. In some embodiments, audio component 210 also includes a speaker for outputting audio signals.
The I/O interface 212 provides an interface between the processing component 202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 214 includes one or more sensors for providing various aspects of status assessment for the device 200. For example, the sensor assembly 214 may detect an open/closed state of the device 200 and the relative positioning of components, such as a display and keypad of the device 200. The sensor assembly 214 may also detect a change in the position of the device 200 or of a component of the device 200, the presence or absence of user contact with the device 200, the orientation or acceleration/deceleration of the device 200, and a change in the temperature of the device 200. The sensor assembly 214 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 216 is configured to facilitate wired or wireless communication between the apparatus 200 and other devices. The device 200 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 216 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 216 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as memory 204 comprising instructions, executable by processor 220 of apparatus 200 to perform the above-described method is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 14 is a block diagram illustrating an electronic device 300 according to an example embodiment. For example, the apparatus 300 may be provided as a server. Referring to FIG. 14, apparatus 300 includes a processing component 322 that further includes one or more processors and memory resources, represented by memory 342, for storing instructions, such as application programs, that are executable by processing component 322. The application programs stored in memory 342 may include one or more modules that each correspond to a set of instructions. Further, the processing component 322 is configured to execute instructions to perform the above-described methods.
The apparatus 300 may also include a power component 326 configured to perform power management of the apparatus 300, a wired or wireless network interface 350 configured to connect the apparatus 300 to a network, and an input/output (I/O) interface 358. The apparatus 300 may operate based on an operating system stored in the memory 342, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (20)

1. A method of identifying a state of motion of an object in successive frames, the method comprising:
acquiring the labeling results of multiple objects in continuous frames, wherein the labeling results are object types and areas;
determining a plurality of reference objects of the current frame according to the labeling result, wherein the reference objects are static objects relative to the ground;
classifying the reference objects according to identification accuracy and degree of mobility;
assigning static weights to the reference objects according to the classification result;
acquiring the moving distances of the object to be recognized relative to the plurality of reference objects;
and determining the motion state of the object to be recognized in the current frame according to the moving distance.
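By way of a non-limiting illustration of the data the claimed method consumes, the sketch below shows one possible shape for the per-frame labeling results, the set of classes treated as static relative to the ground, and a static-weight table; the class names, weights, and coordinate values are assumptions for the example only.

```python
# Assumed labeling result per frame: object id -> object class and a 3D center
# standing in for the labeled region, for two consecutive frames of a toy scene.
frame_prev = {
    "obj_1": {"cls": "traffic_sign", "center": (10.0, 2.0, 1.5)},
    "obj_2": {"cls": "pole",         "center": (12.0, -1.0, 2.0)},
    "obj_3": {"cls": "car",          "center": (5.0, 0.0, 0.8)},
}
frame_curr = {
    "obj_1": {"cls": "traffic_sign", "center": (10.0, 2.0, 1.5)},
    "obj_2": {"cls": "pole",         "center": (12.0, -1.0, 2.0)},
    "obj_3": {"cls": "car",          "center": (6.2, 0.1, 0.8)},   # moved between frames
}

# Classes assumed to be static relative to the ground, and an assumed static
# weight per class reflecting recognition accuracy and degree of mobility.
STATIC_CLASSES = {"traffic_sign", "pole", "building"}
STATIC_WEIGHTS = {"traffic_sign": 1.0, "pole": 0.9, "building": 0.8}
```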
2. The method of claim 1, wherein the continuous frames are point cloud data acquired using 3D radar.
3. The method according to claim 1, wherein the determining a plurality of reference objects of the current frame according to the labeling result comprises:
according to the labeling result, numbering the multiple objects in the continuous frames, wherein the same objects in different frames correspond to the same and unique numbers;
dividing the objects into static objects and non-static objects according to the object class, wherein a static object is an object that is static relative to the ground;
and taking the same static object in the current frame and the previous frame as a reference object of the current frame.
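A minimal sketch of the reference-object selection of claim 3 is given below, assuming an annotation format in which each object id maps to its class and a representative center point; the static class set and the function name are placeholders for illustration.

```python
STATIC_CLASSES = {"traffic_sign", "pole", "building"}   # illustrative static classes

def select_references(prev_frame, curr_frame, exclude_id=None):
    """Ids of objects that carry a static class label and appear with the same
    unique id in both the previous and the current frame."""
    refs = []
    for obj_id, ann in curr_frame.items():
        if obj_id == exclude_id:                 # skip the object to be recognized
            continue
        if ann["cls"] in STATIC_CLASSES and obj_id in prev_frame:
            refs.append(obj_id)
    return refs
```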
4. The method for identifying the motion state of the object in the continuous frames according to claim 1, wherein the obtaining the moving distance of the object to be identified relative to the plurality of reference objects comprises:
if two adjacent frames contain m identical reference objects, where m ≥ 1,
determining a static coordinate system based on the m reference objects to obtain coordinate values of the object to be identified in the adjacent frames in the static coordinate system;
and obtaining the moving distance of the object to be recognized according to the coordinate values.
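Claim 4 leaves the construction of the static coordinate system open; one simple realization, shown below purely as an assumption, anchors each frame to the centroid of the m shared reference objects and measures the object's displacement in that translated frame.

```python
import math

def displacement_in_static_frame(prev_frame, curr_frame, obj_id, ref_ids):
    """Moving distance of obj_id between two frames, expressed in a coordinate
    system whose origin is the centroid of the shared reference objects."""
    def centroid(frame):
        pts = [frame[r]["center"] for r in ref_ids]
        return tuple(sum(c) / len(pts) for c in zip(*pts))
    origin_prev, origin_curr = centroid(prev_frame), centroid(curr_frame)
    p = [a - b for a, b in zip(prev_frame[obj_id]["center"], origin_prev)]
    q = [a - b for a, b in zip(curr_frame[obj_id]["center"], origin_curr)]
    return math.dist(p, q)
```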
5. The method for identifying the motion state of the object in the continuous frames according to claim 1, wherein the obtaining the moving distance of the object to be identified relative to the plurality of reference objects comprises:
if two adjacent frames contain m identical reference objects, where m ≥ 1,
determining the coordinate values of the object to be recognized in the adjacent frames by taking the i-th reference object as the origin of coordinates (i = 1, …, m);
and obtaining the moving distance of the object to be recognized relative to the i-th reference object according to the coordinate values.
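For the per-reference variant of claim 5, the i-th reference object serves as the coordinate origin in both frames, giving one moving distance per reference object. The sketch below again assumes the illustrative id-to-{class, center} annotation format used above.

```python
import math

def displacements_per_reference(prev_frame, curr_frame, obj_id, ref_ids):
    """For each reference object, the movement of obj_id between two frames
    when that reference object is taken as the coordinate origin."""
    distances = []
    for r in ref_ids:
        p = [a - b for a, b in zip(prev_frame[obj_id]["center"], prev_frame[r]["center"])]
        q = [a - b for a, b in zip(curr_frame[obj_id]["center"], curr_frame[r]["center"])]
        distances.append(math.dist(p, q))
    return distances   # one moving distance per reference object
```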
6. The method for identifying the motion state of the object in the continuous frames according to claim 5, wherein the obtaining the moving distance of the object to be identified relative to the plurality of reference objects comprises:
if two adjacent frames do not contain the same reference object,
and determining the motion state of the object to be recognized according to the labeling result of the object to be recognized.
7. The method of claim 1, wherein the determining the motion state of the object to be identified in the current frame according to the moving distance comprises:
and judging the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weight values of the m reference objects.
8. The method according to claim 7, wherein the determining the motion state of the object to be recognized according to the moving distances of the object to be recognized relative to the m reference objects and the static weights of the m reference objects comprises:
obtaining a weighted motion value D of the object to be identified:
D = Σ_{i=1}^{m} ω_i·d_i
where m is the number of identical reference objects in two adjacent frames; d_i represents the movement value of the object to be recognized relative to the i-th reference object, with d_i = 0 if the moving distance of the object to be recognized relative to the i-th reference object is smaller than a preset distance threshold, and d_i = 1 if the moving distance of the object to be recognized relative to the i-th reference object is greater than the preset distance threshold; and ω_i represents the static weight of the i-th reference object;
if the weighted motion value is larger than a set motion threshold value, marking the object to be identified as a moving object;
and if the weighted motion value is less than or equal to a set motion threshold value, marking the object to be identified as a static object.
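The decision of claim 8 can be sketched as follows. The sketch assumes that the weighted motion value is the plain weighted sum of the binarized per-reference distances; the distance threshold and motion threshold values are placeholders, not values from the disclosure.

```python
def weighted_motion_state(distances, weights, dist_thresh=0.2, motion_thresh=0.5):
    """Decide the motion state from per-reference moving distances.

    distances[i]: moving distance of the object relative to the i-th reference object.
    weights[i]:   static weight of the i-th reference object.
    """
    d = [1.0 if dist > dist_thresh else 0.0 for dist in distances]   # binarized d_i
    D = sum(w * di for w, di in zip(weights, d))                     # weighted motion value
    return "moving" if D > motion_thresh else "static"

# Three reference objects; only a low-weight reference reports a large distance,
# so the weighted value 0.4 stays below the motion threshold and the object is static.
print(weighted_motion_state([0.05, 0.31, 0.02], [1.0, 0.4, 0.9]))   # -> static
```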
9. The method of identifying the motion state of an object in successive frames according to claim 1, further comprising,
and if the determination results of the motion state of the object to be recognized in two consecutive determinations are different, marking the object to be recognized in the previous frame as a moving object.
10. An apparatus for identifying a motion state of an object in successive frames, the apparatus comprising:
the labeling unit is used for acquiring labeling results of multiple objects in continuous frames, wherein the labeling results are object types and areas;
the reference object determining unit is used for determining a plurality of reference objects of the current frame according to the labeling result, wherein the reference objects are static objects relative to the ground;
the classification unit is used for classifying the reference objects according to identification accuracy and degree of mobility, and for assigning static weights to the reference objects according to the classification result;
the distance measurement unit is used for acquiring the moving distances of the object to be identified relative to the plurality of reference objects;
and the state identification unit is used for determining the motion state of the object to be identified in the current frame according to the moving distance.
11. The apparatus of claim 10, wherein the continuous frames are continuous frames of point cloud data acquired using a 3D radar.
12. The apparatus of claim 11, wherein the reference object determining unit is used for:
according to the labeling result, numbering the multiple objects in the continuous frames, wherein the same objects in different frames correspond to the same and unique numbers;
dividing the objects into static objects and non-static objects according to the object class, wherein a static object is an object that is static relative to the ground;
and taking the same static object in the current frame and the previous frame as a reference object of the current frame.
13. The apparatus for identifying the motion state of an object in consecutive frames according to claim 10, wherein the distance measurement unit is used for:
when two adjacent frames contain m identical reference objects, where m ≥ 1,
determining a static coordinate system based on the m reference objects to obtain coordinate values of the object to be identified in the adjacent frames in the static coordinate system;
and obtaining the moving distance of the object to be recognized according to the coordinate values.
14. The apparatus of claim 11, wherein the distance measurement unit is used for:
when two adjacent frames contain m identical reference objects, where m ≥ 1,
determining the coordinate values of the object to be recognized in the adjacent frames by taking the i-th reference object as the coordinate origin (i = 1, …, m);
and obtaining the moving distance of the object to be recognized relative to the i-th reference object according to the coordinate values.
15. The apparatus for identifying the motion state of an object in consecutive frames as recited in claim 14, wherein said distance measurement unit is further used for:
when two adjacent frames do not contain the same reference object,
and determining the motion state of the object to be recognized according to the labeling result of the object to be recognized.
16. The apparatus for identifying the motion state of an object in consecutive frames according to claim 10, wherein said state identification unit is used for:
and judging the motion state of the object to be recognized according to the moving distance of the object to be recognized relative to the m reference objects and the static weight values of the m reference objects.
17. The apparatus for identifying the motion state of an object in consecutive frames according to claim 16, wherein said state identification unit is further used for:
obtaining a weighted motion value D of the object to be identified:
D = Σ_{i=1}^{m} ω_i·d_i
where m is the number of identical reference objects in two adjacent frames; d_i represents the movement value of the object to be identified relative to the i-th reference object, with d_i = 0 if the moving distance of the object to be identified relative to the i-th reference object is smaller than a preset distance threshold, and d_i = 1 if the moving distance of the object to be identified relative to the i-th reference object is greater than the preset distance threshold; and ω_i represents the static weight of the i-th reference object;
if the weighted motion value is larger than a set motion threshold value, marking the object to be identified as a moving object;
and if the weighted motion value is less than or equal to a set motion threshold value, marking the object to be identified as a static object.
18. The apparatus for identifying the motion state of an object in successive frames according to claim 10, further comprising:
a correction unit for marking the object to be identified in the previous frame as a moving object when the determination results of the motion state of the object to be identified in two consecutive determinations are different.
19. An electronic device, comprising:
a memory to store instructions; and
a processor for invoking the instructions stored in the memory to perform the method of identifying the motion state of an object in successive frames according to any one of claims 1 to 9.
20. A computer-readable storage medium having stored thereon instructions which, when executed by a processor, perform a method of identifying a state of motion of an object in successive frames according to any one of claims 1 to 9.
CN202110609317.5A 2021-06-01 2021-06-01 Method and device for identifying motion state of object in continuous frames Active CN113192109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110609317.5A CN113192109B (en) 2021-06-01 2021-06-01 Method and device for identifying motion state of object in continuous frames

Publications (2)

Publication Number Publication Date
CN113192109A CN113192109A (en) 2021-07-30
CN113192109B true CN113192109B (en) 2022-01-11

Family

ID=76986184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110609317.5A Active CN113192109B (en) 2021-06-01 2021-06-01 Method and device for identifying motion state of object in continuous frames

Country Status (1)

Country Link
CN (1) CN113192109B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920232A (en) * 2021-10-29 2022-01-11 上海商汤智能科技有限公司 Animation generation method and device, computer equipment and storage medium
CN114529858B (en) * 2022-04-21 2022-09-06 浙江大华技术股份有限公司 Vehicle state recognition method, electronic device, and computer-readable storage medium
CN115861366B (en) * 2022-11-07 2024-05-24 成都融达昌腾信息技术有限公司 Multi-source perception information fusion method and system for target detection
CN116699590B (en) * 2023-02-15 2024-05-10 深圳觅感科技有限公司 FMCW multi-target ranging method and system based on 5.8G microwave radar

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503615A (en) * 2016-09-20 2017-03-15 北京工业大学 Indoor human body detecting and tracking and identification system based on multisensor

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103745485A (en) * 2013-12-31 2014-04-23 深圳泰山在线科技有限公司 Method and system for judging object stillness or movement
CN107992793A (en) * 2017-10-20 2018-05-04 深圳华侨城卡乐技术有限公司 A kind of indoor orientation method, device and storage medium
CA3028659C (en) * 2017-12-11 2021-10-12 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for identifying and positioning objects around a vehicle
CN109919008A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Moving target detecting method, device, computer equipment and storage medium
CN109977833B (en) * 2019-03-19 2021-08-13 网易(杭州)网络有限公司 Object tracking method, object tracking device, storage medium, and electronic apparatus
CN112013877B (en) * 2020-08-31 2021-09-17 广州景骐科技有限公司 Detection method and related device for millimeter wave radar and inertial measurement unit

Also Published As

Publication number Publication date
CN113192109A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN113192109B (en) Method and device for identifying motion state of object in continuous frames
CN111554088B (en) Multifunctional V2X intelligent roadside base station system
CN109284674B (en) Method and device for determining lane line
CN113064135B (en) Method and device for detecting obstacle in 3D radar point cloud continuous frame data
US20210089794A1 (en) Vehicle system and method for detecting objects and object distance
JP6144656B2 (en) System and method for warning a driver that visual recognition of a pedestrian may be difficult
US9384401B2 (en) Method for fog detection
US11371851B2 (en) Method and system for determining landmarks in an environment of a vehicle
CN109664820A (en) Driving reminding method, device, equipment and storage medium based on automobile data recorder
CN101286239A (en) Aerial shooting traffic video frequency vehicle rapid checking method
KR101735557B1 (en) System and Method for Collecting Traffic Information Using Real time Object Detection
KR20200115705A (en) Class labeling apparatus for autonomous driving
CN107909012B (en) Real-time vehicle tracking detection method and device based on disparity map
CN108645375B (en) Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN110751012A (en) Target detection evaluation method and device, electronic equipment and storage medium
CN109508725B (en) Cover plate opening and closing detection method and device of transport vehicle and terminal
Hsu et al. Design and implementation of an intelligent road detection system with multisensor integration
KR102076316B1 (en) Road Surface Condition Management System
US8948449B2 (en) Selecting visible regions in nighttime images for performing clear path detection
CN110741425A (en) Map updating device, map updating system, map updating method, and program
CN112669615A (en) Parking space detection method and system based on camera
JP2008299787A (en) Vehicle detector
KR101593676B1 (en) Method and device for perceiving driving situation
CN115471804A (en) Marked data quality inspection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant