CN112132864A - Robot following method based on vision and following robot - Google Patents

Robot following method based on vision and following robot

Info

Publication number
CN112132864A
CN112132864A
Authority
CN
China
Prior art keywords
input image
following
image
target
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010993247.3A
Other languages
Chinese (zh)
Other versions
CN112132864B (en)
Inventor
Yu Feng (于峰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Aoyou Intelligent Technology Co ltd
Original Assignee
Dalian Aoyou Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Aoyou Intelligent Technology Co ltd filed Critical Dalian Aoyou Intelligent Technology Co ltd
Priority to CN202010993247.3A priority Critical patent/CN112132864B/en
Publication of CN112132864A publication Critical patent/CN112132864A/en
Application granted granted Critical
Publication of CN112132864B publication Critical patent/CN112132864B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a vision-based robot following method and a following robot, and relates to the field of computer vision. The vision-based robot following method comprises the steps of: receiving target object information; acquiring a view input image with a vision camera, preprocessing the view input image to generate a detection input image, and inputting the detection input image into a pedestrian detection neural network model for detection, the detection input image being formed by splicing a plurality of images of different resolutions derived from the view input image; acquiring the pedestrian detection result and determining the following target among the detected pedestrians according to the target object information; and having the robot follow the following target. On the basis of guaranteeing real-time, accurate tracking of both near and far targets, the invention reduces the computing power that target tracking requires of the device and lowers power consumption.

Description

Robot following method based on vision and following robot
Technical Field
The invention relates to the technical field of computer vision.
Background
Intelligent mobile robots that follow a moving target are widely used in home service, assistance for the elderly and disabled, scene monitoring, intelligent vehicles and similar fields, and have broad application prospects. Following a target object with a mobile robot involves computer vision, motion control, pattern recognition and related fields. For robot vision, the aim is to mimic the human visual mechanism, estimate the importance of information in the visual scene, and extract salient features of interest or target object features from the image.
The following process of a vision-based following robot typically includes image acquisition, target detection and target tracking. With the rapid development of artificial intelligence and deep learning, target detection methods based on convolutional neural network (CNN) algorithms are widely used. Compared with traditional machine vision methods, a convolutional neural network learns useful features from large amounts of training data and offers advantages such as high speed, high accuracy and low cost. Convolutional neural network algorithms have also been applied to pedestrian detection for following robots; for example, Chinese patent application CN2020101552071 discloses a vision-based autonomous pedestrian following method for a quadruped robot whose pedestrian detection model is based on a convolutional neural network algorithm.
At present, when a following robot is used to visually track a target, the robot is generally expected to track well not only near targets but also distant ones, i.e. to have good tracking capability for both far and near targets. Although convolutional neural network algorithms improve the real-time accuracy of tracking, CNN-based target detection algorithms usually involve a large number of computation-intensive operations and therefore place high demands on real-time detection compute power and bandwidth. In particular, to detect objects at different distances, the common current approach is to scale the original image to multiple scales to generate a multi-scale pyramid image group and then run detection separately on the inputs at each scale: near objects are detected on the reduced images, while distant objects are detected on the large, high-resolution images. Because a neural network must be designed and trained for each image scale, this places higher demands on the computing power and bandwidth of the device. How to reduce the computing power that target tracking requires of the device, while ensuring real-time, accurate tracking of both far and near targets, is a technical problem urgently awaiting a solution.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a vision-based robot following method and a following robot. According to the invention, the detection input image is formed by splicing a plurality of images of different resolutions derived from the view input image so as to match the size requirement of the pedestrian detection neural network model: the large-resolution part is suited to detecting near targets and the small-resolution part to detecting far targets, and detection does not need to be run separately on input images of different scales. This reduces the computing power that target tracking requires of the device and reduces power consumption, while ensuring real-time, accurate tracking of both far and near targets.
In order to achieve the above object, the present invention provides the following technical solutions:
a vision-based robot following method comprising the steps of:
receiving target object information;
a vision camera acquires a view input image, the view input image is preprocessed to generate a detection input image, and the detection input image is input into a pedestrian detection neural network model for detection; the detection input image is formed by splicing a plurality of images of different resolutions derived from the view input image;
acquiring a pedestrian detection result, and determining the following target among the detected pedestrians according to the target object information;
and carrying out robot following on the following target.
In another aspect, the step of determining the following target according to the target object information comprises:
acquiring all pedestrian information from the pedestrian detection result;
selecting the pedestrians that match the target object information from all the detected pedestrians, and mapping the selection result onto the view input image for output and display;
when only one pedestrian is selected, taking that pedestrian as the following target; when a plurality of pedestrians are selected, marking the selected pedestrians in the view input image with candidate frames, collecting the user's selection information on the candidate frames, and taking the pedestrian in the candidate frame selected by the user as the following target.
In another aspect, the user's selection information on the candidate frames is collected in one of the following ways:
acquiring selection information of a user on a candidate frame through a display screen and an operation button on the robot, outputting a focus area on the display screen, and adjusting the position of the focus area through the operation button to select the candidate frame;
or outputting a candidate frame through a touch display screen on the robot, and acquiring a selection instruction of a user on the candidate frame through the touch display screen;
or sending the view input image containing the candidate frame to a remote terminal where the associated user is located, and acquiring a selection instruction of the associated user on the remote terminal for the candidate frame.
In another aspect, the target object information comprises face feature information and first following distance information of the target object; a visual tracker is constructed with the face features as recognition features, and the first following distance is kept from the following target during target following.
In another aspect, during following, an image of the following target is acquired; clothing feature information, dress feature information, carried-article feature information and/or gait feature information of the following target is identified as target additional information; the target additional information is sent to the visual tracker to update the following target information, and the tracking direction and tracking distance are adjusted.
In another aspect, during following, the following target is kept in the central area of the field of view;
when the following target deviates from it, the deviation is compensated by controlling the robot to rotate, or by controlling the vision camera mounted on the robot to rotate.
In another aspect, the resolutions of the spliced images forming the detection input image are all different;
or, among the spliced images forming the detection input image, some spliced images have the same resolution.
In another aspect, preprocessing the view input image to generate the detection input image includes:
taking the view input image as the original-resolution image and compressing it at two compression ratios to obtain two global maps of different resolutions; the size of the small-resolution global map is smaller than the required size of the detection input image, and the size of the large-resolution global map is larger than the required size of the detection input image;
selecting the small-resolution global map as the first spliced map of the detection input image, and subtracting the size of the first spliced map from the size of the detection input image to obtain the size of the remaining region;
and setting one or more capture frames according to the size of the remaining region, acquiring high-resolution edge local images from the edge area of the large-resolution global map through the capture frames, and filling the edge local images into the remaining region to splice together the detection input image.
In another aspect, preprocessing the view input image to generate the detection input image includes:
taking the view input image as the original-resolution image and, when the size of the original-resolution image is judged to be larger than the required size of the detection input image, compressing the original-resolution image at a compression ratio to obtain a small-resolution global map whose size is smaller than the required size of the detection input image;
selecting the small-resolution global map as the first spliced map of the detection input image, and subtracting the size of the first spliced map from the size of the detection input image to obtain the size of the remaining region;
and setting one or more capture frames according to the size of the remaining region, acquiring high-resolution edge local images from the edge area of the original-resolution image through the capture frames, and filling the edge local images into the remaining region to splice together the detection input image.
The invention also provides a visual following robot, which comprises the following structure:
a vision camera for taking an image as a visual field input image;
a processor comprising a pedestrian detection module and a target following module;
the pedestrian detection module is used for preprocessing the view input image to generate a detection input image, and inputting the detection input image into the pedestrian detection neural network model for detection; the detection input image is formed by splicing a plurality of images which have different resolutions and are related to the view field input image;
and the target following module is used for acquiring a pedestrian detection result, determining a following target in the pedestrian according to the target object information and performing robot following on the following target.
By adopting the above technical solution, the invention has the following advantages and positive effects compared with the prior art: the detection input image is formed by splicing a plurality of images of different resolutions derived from the view input image so as to match the size requirement of the pedestrian detection neural network model; the large-resolution part is suited to detecting near targets and the small-resolution part to detecting far targets, and detection does not need to be run separately on input images of different scales.
Drawings
Fig. 1 is a flowchart of a vision-based robot following method according to an embodiment of the present invention.
Fig. 2 is a diagram illustrating several shapes of the capture frame according to an embodiment of the present invention, where 2a illustrates a rectangular frame, 2b an L-shaped frame, 2c a concave (U-shaped) frame whose opening may face up, down, left or right, and 2d a □-shaped frame.
Fig. 3 is an information transmission diagram of a pedestrian detection process according to an embodiment of the present invention.
Detailed Description
The vision-based robot following method and the following robot disclosed in the present invention are described below with reference to the accompanying drawings and specific embodiments. It should be noted that the technical features, or combinations of technical features, described in the following embodiments should not be regarded as isolated; they may be combined with one another to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in different drawings denote the same features or components and may be applied across embodiments; thus, once an item is defined in one drawing, it need not be discussed again in subsequent drawings. The drawings are for illustration and description only and are not intended to limit the scope of the invention, which is determined by the claims; all changes that fall within the metes and bounds of the claims, or equivalents of such metes and bounds, are intended to be embraced by the claims.
Examples
Referring to fig. 1, a vision-based robot following method provided by the present invention includes the steps of:
step 1, receiving target object information.
Step 2, the vision camera acquires a view input image, the view input image is preprocessed to generate a detection input image, and the detection input image is input into a pedestrian detection neural network model for detection; the detection input image is formed by splicing a plurality of images of different resolutions derived from the view input image.
Step 3, acquiring the pedestrian detection result, and determining the following target among the detected pedestrians according to the target object information.
Step 4, performing robot following on the following target.
In this embodiment, preferably, the target object information includes face feature information and first following distance information of the target object; a visual tracker is constructed with the face features as identification features, and the first following distance is kept from the following target during target following. In that case, when tracking the target, the robot may first confirm the following target based on profile features of the face and/or the facial features, and, after confirming the following target, acquire other features of the target, such as gait features and clothing features, to facilitate following from behind.
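As an illustration only, the following Python sketch shows one way such a visual tracker could be initialised from the target object information; the embedding-based matching, the distance threshold and all names are the editor's assumptions, not the patent's implementation.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class FollowTarget:
    """Target object information used to initialise the visual tracker."""
    face_embedding: np.ndarray           # face features used as identification features
    following_distance_m: float          # the "first following distance"
    extra_features: dict = field(default_factory=dict)   # gait, clothing, ... added later


def match_target(target: FollowTarget, detections: list, threshold: float = 0.6):
    """Return the detected pedestrian whose face embedding is closest to the target.

    `detections` is assumed to be a list of dicts with a 'face_embedding' key;
    returns None when no detection is closer than `threshold`.
    """
    best, best_dist = None, threshold
    for det in detections:
        dist = float(np.linalg.norm(det["face_embedding"] - target.face_embedding))
        if dist < best_dist:
            best, best_dist = det, dist
    return best
```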
Specifically, during following, after the target has been confirmed through the face features, an image of the following target can be acquired; clothing feature information, dress feature information, carried-article feature information and/or gait feature information of the following target is identified as target additional information, the target additional information is sent to the visual tracker to update the following target information, and the tracking direction and tracking distance are adjusted. Preferably, the tracking distance is adjusted to a second following distance, which is greater than the first following distance.
During following, the following target is preferably kept in the central region of the field of view. When the following target deviates from it, the deviation is compensated by controlling the robot to rotate, or by controlling the vision camera mounted on the robot to rotate.
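A hedged sketch of this deviation compensation (the field of view, gain and control interface are assumptions chosen for illustration): the horizontal offset of the target from the image centre is mapped to a small rotation of the robot base or of the camera.

```python
def compute_rotation(target_cx: float, image_width: int,
                     horizontal_fov_deg: float = 60.0, gain: float = 0.5) -> float:
    """Yaw correction (in degrees) that re-centres the followed target.

    target_cx is the x-coordinate of the target's bounding-box centre in the
    view input image. A positive value means "rotate right"; the proportional
    gain keeps the correction smooth instead of snapping in a single step.
    """
    offset_px = target_cx - image_width / 2.0                   # pixels off centre
    offset_deg = offset_px / image_width * horizontal_fov_deg   # rough pixel-to-angle map
    return gain * offset_deg


# Example: a 1000-pixel-wide view image with the target centred at x = 700
# yields a right turn of about 6 degrees. The same correction could instead
# be applied to a pan mechanism of the camera, as the description allows.
print(compute_rotation(700, 1000))   # 6.0
```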
With this technical solution, the robot can determine the following target based on face features, which are distinctive and make it easy to confirm the target's identity, and then follow from behind using the target's other features, so that the following behaviour remains unobtrusive.
In this embodiment, preferably, the step in step 3 of determining the following target according to the target object information specifically includes:
Step 31, acquiring all pedestrian information from the pedestrian detection result.
Step 32, selecting the pedestrians that match the target object information from all the detected pedestrians, and mapping the selection result onto the view input image for output and display.
Step 331, when only one pedestrian is selected, taking that pedestrian as the following target.
Step 332, when a plurality of pedestrians are selected, marking the selected pedestrians in the view input image with candidate frames, collecting the user's selection information on the candidate frames, and taking the pedestrian in the candidate frame selected by the user as the following target.
In this embodiment, preferably, the user's selection information on the candidate frames is collected in one of the following ways.
Mode 1: the user's selection information on the candidate frames is collected through a display screen and operation buttons on the robot; a focus area is shown on the display screen, and its position is adjusted with the operation buttons to select a candidate frame.
For example, without limitation, 2 candidate frames are shown on the display screen of the robot, and 5 operation buttons are provided on one side of the display screen, namely up, down, left and right buttons and a confirmation button in the middle. The focus area lies by default on the candidate frame closest to the centre of the field of view, and the user can adjust its position by pressing the up, down, left and right buttons. When the focus area lies on the candidate frame containing the tracking target, the user can confirm that candidate frame with the confirmation button as the selection operation. Alternatively, the dwell time of the focus area on a candidate frame is monitored, and if the user does not move the focus area within a preset time range, the corresponding candidate frame is determined to be the candidate frame selected by the user.
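As a rough illustration of Mode 1 (button names, dwell time and the box representation are hypothetical), the focus area can be modelled as an index into the list of candidate frames, moved with the arrow buttons and confirmed either by the confirmation button or by dwelling on a frame:

```python
import time


class CandidateSelector:
    """Focus-area navigation over candidate frames, as sketched in the text."""

    def __init__(self, boxes, dwell_seconds=3.0):
        # Each box is assumed to carry its offset ('dx', 'dy') from the view centre;
        # the focus defaults to the candidate frame closest to the centre.
        self.boxes = boxes
        self.index = min(range(len(boxes)),
                         key=lambda i: abs(boxes[i]["dx"]) + abs(boxes[i]["dy"]))
        self.dwell_seconds = dwell_seconds
        self.last_move = time.monotonic()

    def press(self, button):
        """`button` is one of 'up', 'down', 'left', 'right', 'ok' (assumed names)."""
        if button == "ok":
            return self.boxes[self.index]               # explicit confirmation
        if button in ("right", "down"):
            self.index = (self.index + 1) % len(self.boxes)
        elif button in ("left", "up"):
            self.index = (self.index - 1) % len(self.boxes)
        self.last_move = time.monotonic()
        return None

    def auto_select(self):
        """If the focus area has not moved for the preset time, select it."""
        if time.monotonic() - self.last_move >= self.dwell_seconds:
            return self.boxes[self.index]
        return None
```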
Mode 2: the candidate frames are shown on a touch display screen on the robot, and the user's selection instruction on a candidate frame is acquired through the touch display screen.
Mode 3: the view input image containing the candidate frames is sent to a remote terminal where an associated user is located, and the associated user's selection instruction on a candidate frame is acquired from the remote terminal.
The remote terminal is preferably a mobile phone, a tablet computer or a wearable intelligent terminal such as smart glasses or a smart watch. In this way, a remote user can assist the robot in target tracking.
Preferably, the robot can start a video-recording function during following, store the recorded video data in an associated memory or a cloud server, and periodically send the video data to the user's terminal. Further, the user can also send a real-time viewing instruction to the robot through the terminal, and the robot sends the current real-time video data or a screenshot to the user's terminal according to that instruction.
In this embodiment, the resolutions of the spliced images composing the detection input image may all be different. By way of example and not limitation, the detection input image may include 3 spliced images whose resolutions all differ from one another.
Alternatively, among the spliced images composing the detection input image, some spliced images may have the same resolution. By way of example and not limitation, the detection input image may include 3 spliced images, of which 2 are taken from images of the same resolution and therefore share a resolution, while the remaining 1 has a different resolution.
In a preferred embodiment, the steps of preprocessing the view input image to generate the detection input image may be as follows:
The view input image is taken as the original-resolution image and compressed at two compression ratios to obtain two global maps of different resolutions; the size of the small-resolution global map is smaller than the required size of the detection input image, and the size of the large-resolution global map is larger than the required size of the detection input image.
The small-resolution global map is selected as the first spliced map of the detection input image, and the size of the first spliced map is subtracted from the size of the detection input image to obtain the size of the remaining region.
One or more capture frames are set according to the size of the remaining region, high-resolution edge local images are acquired from the edge area of the large-resolution global map through the capture frames, and the edge local images are filled into the remaining region to splice together the detection input image.
The capture frame is fixed: for each image frame, the high-resolution edge local image is taken only from the fixed edge area of the large-resolution global map. The shape and size of the capture frame are matched to the size of the remaining region, and the size of the capture frame is larger than the minimum detection size of the pedestrian detection neural network model. Specifically, the capture frame can be shaped as a rectangular frame, an L-shaped frame, a concave (U-shaped) frame whose opening may face up, down, left or right, or a □-shaped frame, see Fig. 2.
Preferably, the capture frames are rectangular; several rectangular frames can be arranged according to the shape of the remaining region, such that the rectangular capture frames, spliced edge to edge, make up the shape of the remaining region.
The fixed edge region may be the left, right, upper and/or lower edge region, preferably the right edge region and/or the upper edge region. When the camera captures an image, a small, distant object is more likely to lie in the edge region of the image than in the middle region (the central area of the field of view and the area extending outward from it); in other words, a small, distant object is more likely to be detected in the edge region of the image, while a large, near object is more easily detected in the middle region. Therefore, with large near targets detected on the small-resolution global map and the high-resolution capture frames covering the edge region, the detection rate of small, distant targets is improved.
The detection input image has a fixed input size, and the size of the detection input image fed to the pedestrian detection neural network model must match that fixed input size. According to the size of the remaining region of the detection input image, one or more capture frames can be set to obtain local images from the edge region of the large-resolution global map (the image inside a capture frame is the captured local image). Using a fixed-size detection input image markedly simplifies the training and design of the pedestrian detection neural network model.
By way of example and not limitation, referring to Fig. 3, suppose the width and height of the view input image are 1000 × 1000 pixels, i.e. the original-resolution image is 1000 × 1000 pixels, and the required input size of the detection input image is 540 × 360 pixels. The original-resolution image is compressed at two compression ratios to obtain two global maps of different resolutions, 300 × 300 pixels (compression ratio 0.3) and 600 × 600 pixels (compression ratio 0.6); the former is smaller than the required size of the detection input image and the latter is larger.
The 300 × 300-pixel global map is taken as the first spliced map, and then, according to the 540 × 360-pixel size of the detection input image, the remaining region along the edges of the first spliced map is splice-filled with large-resolution local images. The splice-filling rule can be the system default or set by the user; for example, it can be: fill against the right edge of the first spliced map in preference to the left edge, and against the lower edge in preference to the upper edge. For instance, two rectangular capture frames of 240 × 360 pixels and 300 × 60 pixels (width × height) are set; the image captured in the 240 × 360 frame is spliced to the right of the first spliced map, meeting the 540-pixel width requirement of the detection input image (300 + 240 = 540), and the image captured in the 300 × 60 frame is spliced below the first spliced map, meeting the 360-pixel height requirement (300 + 60 = 360), so that a spliced image meeting the size requirement of the detection input image is constructed.
It should be noted that, depending on the shape and size of the remaining region and on the detection requirements, more rectangular capture frames may be provided, as long as the rectangular capture frames, spliced edge to edge, can make up the shape of the remaining region. When setting rectangular capture frames, however, their number is preferably chosen under the rule of using as few rectangular frames in the splicing as possible.
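The worked example above can be condensed into a short Python sketch. It is an illustration under the stated numbers (1000×1000 source, 540×360 detection input, compression ratios 0.3 and 0.6, capture frames of 240×360 and 300×60), not the patented implementation; OpenCV's resize is used for the compression, and the choice of which edges of the large global map the crops come from follows the preference stated earlier (right and upper edge regions).

```python
import numpy as np
import cv2


def build_detection_input(frame: np.ndarray) -> np.ndarray:
    """Splice a 540x360 detection input image from a 1000x1000 view input image."""
    det_w, det_h = 540, 360                      # required size of the detection input

    # Two global maps at different compression ratios.
    small = cv2.resize(frame, (300, 300))        # ratio 0.3: smaller than 540x360
    large = cv2.resize(frame, (600, 600))        # ratio 0.6: larger than 540x360

    canvas = np.zeros((det_h, det_w, 3), dtype=frame.dtype)
    canvas[0:300, 0:300] = small                 # first spliced map, top-left

    # Remaining region filled with high-resolution edge crops of the large map.
    right_crop = large[0:360, 360:600]           # 240x360 capture frame, right edge region
    canvas[0:360, 300:540] = right_crop          # width: 300 + 240 = 540

    top_crop = large[0:60, 0:300]                # 300x60 capture frame, upper edge region
    canvas[300:360, 0:300] = top_crop            # height: 300 + 60 = 360

    return canvas


if __name__ == "__main__":
    dummy = np.random.randint(0, 255, (1000, 1000, 3), dtype=np.uint8)
    print(build_detection_input(dummy).shape)    # (360, 540, 3)
```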
In another preferred embodiment, the steps of preprocessing the view input image to generate the detection input image are as follows:
The view input image is taken as the original-resolution image and, when the size of the original-resolution image is judged to be larger than the required size of the detection input image, the original-resolution image is compressed at a compression ratio to obtain a small-resolution global map whose size is smaller than the required size of the detection input image.
The small-resolution global map is selected as the first spliced map of the detection input image, and the size of the first spliced map is subtracted from the size of the detection input image to obtain the size of the remaining region.
One or more capture frames are set according to the size of the remaining region, high-resolution edge local images are acquired from the edge area of the original-resolution image through the capture frames, and the edge local images are filled into the remaining region to splice together the detection input image.
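A correspondingly hedged sketch of this second variant (sizes reused from the earlier example; names are illustrative): only one compressed global map is produced, and the high-resolution edge crops are taken directly from the original-resolution image.

```python
import numpy as np
import cv2


def build_detection_input_v2(frame: np.ndarray, det_size=(540, 360), ratio=0.3) -> np.ndarray:
    """Variant where the edge crops come straight from the original-resolution image."""
    det_w, det_h = det_size
    h, w = frame.shape[:2]
    assert w > det_w and h > det_h, "original image must exceed the detection input size"

    small = cv2.resize(frame, (int(w * ratio), int(h * ratio)))   # small global map
    sh, sw = small.shape[:2]

    canvas = np.zeros((det_h, det_w, 3), dtype=frame.dtype)
    canvas[0:sh, 0:sw] = small                                    # first spliced map

    # Fill the remaining region with edge crops of the *original* image.
    canvas[0:det_h, sw:det_w] = frame[0:det_h, w - (det_w - sw):w]   # right edge crop
    canvas[sh:det_h, 0:sw] = frame[0:det_h - sh, 0:sw]               # upper edge crop
    return canvas
```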
In another implementation of this embodiment, instead of being fixed, the capture frame may be a sliding frame that moves in a regular pattern. Specifically, the sliding frame can move to different positions on the designated image according to a preset movement rule: for example, the full map may be scanned at a constant rate from the top-left corner of the designated image, left to right and top to bottom; or scanned in an order set by the user; or scanned according to a random movement rule. In this way, complete detection of the large-resolution image can be achieved over time.
In this case, the steps of preprocessing the view input image to generate the detection input image may be as follows:
The view input image is taken as the original-resolution image and compressed at two compression ratios to obtain two global maps of different resolutions; the size of the small-resolution global map is smaller than the required size of the detection input image, and the size of the large-resolution global map is larger than the required size of the detection input image.
The small-resolution global map is selected as the first spliced map of the detection input image, and the size of the first spliced map is subtracted from the size of the detection input image to obtain the size of the remaining region.
One or more sliding frames are set according to the size of the remaining region; the sliding frames can move to different positions on the large-resolution global map according to a preset movement rule, local images of the large-resolution global map are acquired through the sliding frames, and the local images are filled into the remaining region to splice together the detection input image.
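A minimal sketch of the sliding-frame idea under the same example sizes (step sizes and names are the editor's choice): the capture frame occupies a different position of the large global map on each successive camera frame, so the full high-resolution map is covered over time.

```python
from itertools import cycle


def sliding_positions(img_w, img_h, box_w, box_h, step_x, step_y):
    """Yield top-left corners scanning the image left to right, top to bottom, forever."""
    positions = [(x, y)
                 for y in range(0, img_h - box_h + 1, step_y)
                 for x in range(0, img_w - box_w + 1, step_x)]
    return cycle(positions)


# Usage: on every new camera frame, advance the scanner and crop that window
# from the large global map before splicing it into the detection input image.
scanner = sliding_positions(600, 600, 240, 360, step_x=120, step_y=120)
x, y = next(scanner)          # sliding-frame position for the current frame
```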
The invention also provides a visual following robot based on the detection of the far and near targets.
The vision following robot comprises the following structures:
and a vision camera for capturing an image as a visual field input image.
And the processor comprises a pedestrian detection module and a target following module.
The pedestrian detection module is used for preprocessing the view input image to generate a detection input image, and inputting the detection input image into the pedestrian detection neural network model for detection; the detection input image is formed by splicing a plurality of images which have different resolutions and are related to the view field input image.
And the target following module is used for acquiring a pedestrian detection result, determining a following target in the pedestrian according to the target object information and performing robot following on the following target.
The target following module is configured to: acquire all pedestrian information from the pedestrian detection result; select the pedestrians that match the target object information from all the detected pedestrians, and map the selection result onto the view input image for output and display; when only one pedestrian is selected, take that pedestrian as the following target; and when a plurality of pedestrians are selected, mark the selected pedestrians in the view input image with candidate frames, collect the user's selection information on the candidate frames, and take the pedestrian in the candidate frame selected by the user as the following target.
The processor also comprises an initialization setting module which is used for collecting target object information set by a user. Preferably, the target object information includes face feature information and first following distance information of the target object.
In this case, the target following module is configured to: construct a visual tracker with the face features as recognition features and keep the first following distance from the following target while following it; and, during following, acquire an image of the following target, identify clothing feature information, dress feature information, carried-article feature information and/or gait feature information of the following target as target additional information, send the target additional information to the visual tracker to update the following target information, and adjust the tracking direction and tracking distance.
During following, the target following module also keeps the following target in the central area of the field of view; when the following target deviates from it, the deviation is compensated by controlling the robot to rotate, or by controlling the vision camera mounted on the robot to rotate.
Other technical features are described in the foregoing embodiments; the processor or its modules may be configured to perform the information transmission and information processing functions described there, which are not repeated here.
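To tie the modules together, here is a hedged skeleton of the processor structure described above. The module boundaries follow the description, but every class, method and interface name is the editor's illustration, and it reuses the helper sketches (`build_detection_input`, `match_target`, `compute_rotation`) introduced earlier.

```python
class PedestrianDetectionModule:
    """Preprocesses the view input image and runs the pedestrian detection network."""

    def __init__(self, detector, preprocess=build_detection_input):
        self.detector = detector          # pedestrian detection neural network model
        self.preprocess = preprocess      # view input image -> spliced detection input

    def detect(self, view_image):
        det_input = self.preprocess(view_image)
        return self.detector(det_input)   # assumed to return a list of pedestrian dicts


class TargetFollowingModule:
    """Selects the following target and issues following commands."""

    def __init__(self, target_info, controller):
        self.target_info = target_info    # FollowTarget: face features, first distance, ...
        self.controller = controller      # assumed motion / camera-rotation interface

    def follow(self, view_image, detections):
        target = match_target(self.target_info, detections)
        if target is None:
            return                        # target lost in this frame
        # Keep the target centred and at the configured following distance.
        # 'cx' (bounding-box centre x) is an assumed field of each detection dict.
        yaw = compute_rotation(target["cx"], view_image.shape[1])
        self.controller.rotate(yaw)
        self.controller.keep_distance(self.target_info.following_distance_m)
```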
In the description above, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, terms like "comprising," "including," and "having" should be interpreted as inclusive or open-ended, rather than exclusive or closed-ended, by default, unless explicitly defined to the contrary. While exemplary aspects of the present disclosure have been described for illustrative purposes, those skilled in the art will appreciate that the foregoing description is by way of description of the preferred embodiments of the present disclosure only, and is not intended to limit the scope of the present disclosure in any way, which includes additional implementations in which functions may be performed out of the order of presentation or discussion. Any changes and modifications of the present invention based on the above disclosure will be within the scope of the appended claims.

Claims (10)

1. A vision-based robot following method, characterized by comprising the steps of:
receiving target object information;
the method comprises the steps that a vision camera acquires a view input image, the view input image is preprocessed to generate a detection input image, and the detection input image is input into a pedestrian detection neural network model for detection; the detection input image is formed by splicing a plurality of images of different resolutions derived from the view input image;
acquiring a pedestrian detection result, and determining the following target among the detected pedestrians according to the target object information;
and carrying out robot following on the following target.
2. The robot following method according to claim 1, wherein the step of determining the following target according to the aforementioned target object information comprises:
acquiring all pedestrian information of a pedestrian detection result;
selecting the pedestrians matched with the target object information from all the pedestrians, and mapping the pedestrian selection result to the view input image for output and display;
when only one pedestrian is selected, taking that pedestrian as the following target; when a plurality of pedestrians are selected, marking the selected pedestrians in the view input image with candidate frames, collecting the user's selection information on the candidate frames, and taking the pedestrian in the candidate frame selected by the user as the following target.
3. The robot following method according to claim 2, wherein the user's selection information on the candidate frames is collected in one of the following ways:
acquiring selection information of a user on a candidate frame through a display screen and an operation button on the robot, outputting a focus area on the display screen, and adjusting the position of the focus area through the operation button to select the candidate frame;
or outputting a candidate frame through a touch display screen on the robot, and acquiring a selection instruction of a user on the candidate frame through the touch display screen;
or sending the view input image containing the candidate frame to a remote terminal where the associated user is located, and acquiring a selection instruction of the associated user on the remote terminal for the candidate frame.
4. The robot following method according to claim 1, wherein: the target object information comprises face feature information and first following distance information of the target object, a visual tracker is constructed with the face features as recognition features, and the first following distance is kept from the following target during target following.
5. The robot following method according to claim 4, wherein: in the following process, an image of the following target is obtained, clothing characteristic information, dressing characteristic information, carried article characteristic information and/or gait characteristic information of the following target are identified as target additional information, the target additional information is sent to a visual tracker to update the following target information, and the tracking direction and the tracking distance are adjusted.
6. The robot following method according to claim 5, wherein: in the following process, keeping the following target in the central area of the visual field;
when the following target deviates, the deviation amount is compensated by controlling the robot to rotate, or the deviation amount is compensated by controlling the vision camera installed on the robot to rotate.
7. The robot following method according to claim 1, wherein: the resolutions of the spliced images forming the detection input image are all different;
or, among the spliced images forming the detection input image, some spliced images have the same resolution.
8. The robot following method according to claim 1, wherein: the preprocessing the view input image to generate the detection input image includes:
taking a view input image as an original resolution image, and compressing the original resolution image according to two compression ratios to obtain two global mapping images with different resolutions; the size of the global mapping image with small resolution is smaller than the required size of the detection input image, and the size of the global mapping image with large resolution is larger than the required size of the detection input image;
selecting a global map with low resolution as a first splicing map of the detection input image, and subtracting the size of the first splicing map from the size of the detection input image to obtain the size of the residual region;
and setting one or more intercepting frames according to the size of the residual area, acquiring a high-resolution edge local image in the edge area of the global map with high resolution through the intercepting frames, filling the edge local image into the residual area, and splicing to form the detection input image.
9. The robot following method according to claim 1, wherein: the preprocessing the view input image to generate the detection input image includes:
taking a view input image as an original resolution image, and when the size of the original resolution image is judged to be larger than the required size of the detection input image, compressing the original resolution image according to a compression ratio to obtain a small-resolution global mapping image, wherein the size of the small-resolution global mapping image is smaller than the required size of the detection input image;
selecting a global mapping chart with small resolution as a first splicing chart of a detection input image, and subtracting the size of the first splicing chart from the size of the detection input image to obtain the size of a residual region;
one or more intercepting frames are set according to the size of the residual area, the edge local image with high resolution is obtained in the edge area of the original resolution image through the intercepting frames, and the edge local image is filled in the residual area for splicing to form the detection input image.
10. A vision following robot, characterized by comprising:
a vision camera for taking an image as a visual field input image;
a processor comprising a pedestrian detection module and a target following module;
the pedestrian detection module is used for preprocessing the view input image to generate a detection input image, and inputting the detection input image into the pedestrian detection neural network model for detection; the detection input image is formed by splicing a plurality of images of different resolutions derived from the view input image;
and the target following module is used for acquiring a pedestrian detection result, determining the following target among the detected pedestrians according to the target object information, and performing robot following on the following target.
CN202010993247.3A 2020-09-21 2020-09-21 Vision-based robot following method and following robot Active CN112132864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010993247.3A CN112132864B (en) 2020-09-21 2020-09-21 Vision-based robot following method and following robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010993247.3A CN112132864B (en) 2020-09-21 2020-09-21 Vision-based robot following method and following robot

Publications (2)

Publication Number Publication Date
CN112132864A (en) 2020-12-25
CN112132864B CN112132864B (en) 2024-04-09

Family

ID=73841689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010993247.3A Active CN112132864B (en) 2020-09-21 2020-09-21 Vision-based robot following method and following robot

Country Status (1)

Country Link
CN (1) CN112132864B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104168648A (en) * 2014-01-20 2014-11-26 中国人民解放军海军航空工程学院 Sensor network multi-target distributed consistency tracking device
CN105894538A (en) * 2016-04-01 2016-08-24 海信集团有限公司 Target tracking method and target tracking device
CN109727271A (en) * 2017-10-27 2019-05-07 三星电子株式会社 Method and apparatus for tracking object
CN108673501A (en) * 2018-05-17 2018-10-19 中国科学院深圳先进技术研究院 A kind of the target follower method and device of robot
WO2019238113A1 (en) * 2018-06-15 2019-12-19 清华-伯克利深圳学院筹备办公室 Imaging method and apparatus, and terminal and storage medium
CN111127401A (en) * 2019-11-29 2020-05-08 西安工程大学 Robot stereoscopic vision mechanical part detection method based on deep learning
CN111126256A (en) * 2019-12-23 2020-05-08 武汉大学 Hyperspectral image classification method based on self-adaptive space-spectrum multi-scale network
CN111127458A (en) * 2019-12-27 2020-05-08 深圳力维智联技术有限公司 Target detection method and device based on image pyramid and storage medium
CN111308993A (en) * 2020-02-13 2020-06-19 青岛联合创智科技有限公司 Human body target following method based on monocular vision
CN111368755A (en) * 2020-03-09 2020-07-03 山东大学 Vision-based pedestrian autonomous following method for quadruped robot

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JISOO JEONG et al.: "Enhancement of SSD by concatenating feature maps for object detection", arXiv:1705.09587, 26 May 2017 (2017-05-26) *
YANG YUANFEI: "Research and Implementation of Relevant Image Processing Technologies in Video Surveillance ***", China Master's Theses Full-text Database (Information Science and Technology), no. 2012, 15 January 2012 (2012-01-15), pages 138-583 *
XU QIANQIAN: "Detection of Multiple Moving Ground Targets against Complex Backgrounds", China Master's Theses Full-text Database (Information Science and Technology), no. 2019, 15 February 2019 (2019-02-15), pages 138-2207 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470068A (en) * 2021-06-07 2021-10-01 北京深睿博联科技有限责任公司 Following navigation method and system in complex scene

Also Published As

Publication number Publication date
CN112132864B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
US10827133B2 (en) Communication terminal, image management apparatus, image processing system, method for controlling display, and computer program product
CN112207821B (en) Target searching method of visual robot and robot
US10489912B1 (en) Automated rectification of stereo cameras
JP6560480B2 (en) Image processing system, image processing method, and program
WO2021139484A1 (en) Target tracking method and apparatus, electronic device, and storage medium
US8855369B2 (en) Self learning face recognition using depth based tracking for database generation and update
CN101406390B (en) Method and apparatus for detecting part of human body and human, and method and apparatus for detecting objects
EP3499414B1 (en) Lightweight 3d vision camera with intelligent segmentation engine for machine vision and auto identification
US11159717B2 (en) Systems and methods for real time screen display coordinate and shape detection
KR101916093B1 (en) Method for tracking object
CN112132864A (en) Robot following method based on vision and following robot
Zhou et al. Information-efficient 3-D visual SLAM for unstructured domains
CN113065506A (en) Human body posture recognition method and system
CN107538485B (en) Robot guiding method and system
CN112288876A (en) Long-distance AR identification server and system
US20230224576A1 (en) System for generating a three-dimensional scene of a physical environment
CN116659518A (en) Autonomous navigation method, device, terminal and medium for intelligent wheelchair
KR101996907B1 (en) Apparatus for tracking object
Tsuji et al. Memorizing and representing route scenes
CN112052827B (en) Screen hiding method based on artificial intelligence technology
US11941171B1 (en) Eye gaze tracking method, apparatus and system
CN116524217B (en) Human body posture image matching method and device, electronic equipment and storage medium
Tanaka et al. Dynamically visual learning for people identification with sparsely distributed cameras
CN117152807A (en) Human head positioning method, device and storage medium
CN118037763A (en) Human body action posture tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant