CN112560592A - Image processing method and device, and terminal control method and device

Image processing method and device, and terminal control method and device

Info

Publication number
CN112560592A
CN112560592A (application CN202011377063.0A)
Authority
CN
China
Prior art keywords
object detection
detection frame
image
camera
external expansion
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN202011377063.0A
Other languages
Chinese (zh)
Inventor
黄耿石
滕家宁
邵婧
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202011377063.0A priority Critical patent/CN112560592A/en
Publication of CN112560592A publication Critical patent/CN112560592A/en
Priority to PCT/CN2021/121457 priority patent/WO2022111044A1/en
Pending legal-status Critical Current

Classifications

    • G06V40/45 Detection of the body part being alive (under G06V40/40 Spoof detection, e.g. liveness detection)
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements (under G06V40/16 Human faces)
    • G06V40/172 Classification, e.g. identification (under G06V40/16 Human faces)
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • G06V2201/07 Target detection (under G06V2201/00 Indexing scheme relating to image or video recognition or understanding)

Abstract

The present disclosure provides an image processing method and apparatus, and a terminal control method and apparatus. The image processing method includes: acquiring images to be detected obtained by shooting a target object with a binocular camera, wherein the images to be detected comprise an image acquired by each camera of the binocular camera; performing target object detection on the images to be detected to obtain an object detection frame of the target object in each image to be detected; performing external expansion processing on the object detection frames, and performing translation processing on at least one object detection frame after the external expansion processing to obtain processed object detection frames; and determining a recognition result of the target object based on the processed object detection frames. The embodiments of the disclosure are highly universal: different modules can share one set of pseudo baselines to simulate the parallax of human eyes, which improves the generalization capability across modules and reduces the time cost of subsequent applications such as target object recognition.

Description

Image processing method and device, and terminal control method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, and a terminal control method and apparatus.
Background
With the continuous development of computer vision technology and the wide application of binocular cameras, image processing technology based on binocular cameras is widely applied in fields such as living body detection and intelligent transportation. Taking living body detection as an example, detection can be performed on a set of images collected by a binocular camera; for example, a living body detection model can be applied to a set of images collected by each module (corresponding to one binocular camera). In general, a different living body detection model is trained for each module, which consumes a large amount of model-training resources.
Disclosure of Invention
The embodiment of the disclosure at least provides an image processing method and device and a terminal control method and device.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring images to be detected obtained by shooting a target object with a binocular camera, wherein the images to be detected comprise an image acquired by each camera of the binocular camera;
performing target object detection on the images to be detected to obtain an object detection frame of the target object in each image to be detected;
performing external expansion processing on the object detection frames, and performing translation processing on at least one object detection frame after the external expansion processing to obtain processed object detection frames;
and determining a recognition result of the target object based on the processed object detection frames.
By adopting this image processing method, once two images to be detected acquired by the binocular camera are obtained, target object detection may first be performed on them to obtain an object detection frame of the target object in each image to be detected. After the two object detection frames are subjected to the external expansion processing, one or both of the externally expanded object detection frames may be subjected to translation processing, so that the recognition result of the target object is determined based on the processed object detection frames.
The image processing method can focus on the target object through target object detection, which preliminarily reduces the influence of the baseline. Considering that, in the process of recognizing the target object from the two images to be detected collected by the binocular camera, the depth information of the target object needs to be determined with reference to the parallax formed by one module (corresponding to one binocular camera), the disclosed embodiments further simulate the parallax of human eyes by performing the matched external-expansion and translation operations on the object detection frames, obtaining processed object detection frames for target object recognition. The embodiments of the disclosure are highly universal: different modules can share one set of pseudo baselines to simulate the parallax of human eyes, improving the generalization capability across modules and reducing the time cost of subsequent applications such as target object recognition.
In one possible embodiment, the binocular camera comprises a first camera and a second camera;
under the condition that translation processing is performed on one object detection frame after the external expansion processing, the step of performing translation processing on at least one object detection frame after the external expansion processing to obtain a processed object detection frame includes:
selecting a frame to be translated from the object detection frames subjected to external expansion processing, wherein the frame to be translated is positioned in the image acquired by the first camera;
and translating the detection frame to be translated along the direction deviating from the second camera to obtain the processed object detection frame.
Here, once the frame to be translated is selected, its translation direction may be determined based on the relative positional relationship between the two cameras of the binocular camera. When the frame to be translated corresponds to the first camera, it may be translated in the direction away from the second camera; a frame moved in this direction meets the parallax requirement of the pseudo baseline.
In a possible implementation manner, the translating the detection frame to be translated in a direction away from the second camera to obtain a processed object detection frame includes:
determining a translation distance based on the size information of the object detection frame after the external expansion processing;
and moving the frame to be translated by the translation distance according to the direction departing from the second camera to obtain the object detection frame after translation processing.
In a possible embodiment, the determining a translation distance based on the size information of the object detection frame after the external expansion processing includes:
and determining the translation distance based on the width value in the size information of the object detection frame after the external expansion processing and a preset translation coefficient.
Considering that the parallax determined by a binocular camera is related to the distance between the target object and the camera, and that the imaging size is likewise related to that distance, in the embodiment of the present disclosure the translation distance used to simulate parallax can be determined based on the size information of the imaged object detection frame, and the recognition effect of binocular parallax is then achieved based on the pseudo baseline constructed from this translation distance.
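Expressed as a formula (a plausible reading of this embodiment; the symbol names are assumed, not taken from the disclosure): the translation distance d may be computed as d = k · w, where w is the width value in the size information of the externally expanded object detection frame and k is the preset translation coefficient.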
In a possible embodiment, in the case that translation processing is performed on the two object detection frames after the external expansion processing, the performing translation processing on at least one externally expanded object detection frame to obtain processed object detection frames includes:
and respectively translating each object detection frame in the two object detection frames after the external expansion processing towards a direction away from the other object detection frame to obtain the processed object detection frames.
In a possible implementation manner, the performing the outward expansion processing on the object detection frame includes:
determining the position coordinates of the corner points of the object detection frame in the image to be detected;
and carrying out external expansion processing on the object detection frame based on the determined angular point position coordinates and a preset external expansion ratio to obtain the object detection frame after the external expansion processing.
Here, besides the target object framed by the object detection frame, other image areas (for example, the background) may also have a certain influence on the recognition result; in particular, for applications such as living body detection, the background information can to a certain extent determine the recognition result. Therefore, in the process of constructing the pseudo baseline, the object detection frame may be subjected to external expansion processing based on its corner position coordinates and a preset external expansion ratio. The object detection frame obtained in this way not only contributes to constructing the pseudo baseline but can also improve the accuracy of subsequent recognition.
In one possible embodiment, the target object is a target face; the determining, based on the processed object detection box, a recognition result of the target object includes:
and carrying out target face recognition on the processed object detection frame by using the trained living body detection model, and determining whether the target face corresponding to the object detection frame is a real face.
In a second aspect, an embodiment of the present disclosure further provides a terminal control method, where the terminal is provided with a binocular camera, and the method includes:
acquiring a group of face images shot by the binocular camera, wherein the group of face images comprise a first face image shot by a first camera in the binocular camera and a second face image shot by a second camera in the binocular camera;
obtaining a recognition result of a person corresponding to the group of face images by using the image processing method according to the first aspect and any one of the various embodiments thereof, wherein the recognition result includes whether the person is a real face;
and in response to the recognition result of the person indicating that the person is a real face, and the person passing identity authentication, controlling the terminal to execute a specified operation.
In a third aspect, an embodiment of the present disclosure further provides an image processing apparatus, including:
the acquisition module is used for acquiring images to be detected obtained by shooting a target object with a binocular camera, wherein the images to be detected comprise an image acquired by each camera of the binocular camera;
the detection module is used for carrying out target object detection on the image to be detected to obtain an object detection frame of the target object in the image to be detected;
the external expansion module is used for carrying out external expansion processing on the object detection frames and carrying out translation processing on at least one object detection frame after the external expansion processing to obtain a processed object detection frame;
and the determining module is used for determining the identification result of the target object based on the processed object detection frame.
In a fourth aspect, an embodiment of the present disclosure further provides a terminal control device, including:
the acquisition module is used for acquiring a group of face images shot by the binocular camera, and the group of face images comprise a first face image shot by a first camera in the binocular camera and a second face image shot by a second camera in the binocular camera;
a determining module, configured to obtain, by the image processing method according to the first aspect and any one of the various embodiments thereof, a recognition result of a person corresponding to the group of face images, where the recognition result includes whether the person is a real face;
and the control module is used for, in response to the recognition result of the person including that the person is a real face and the person passing identity authentication, controlling the terminal to execute a specified operation.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the image processing method according to the first aspect and any of its various embodiments or the steps of the terminal control method according to the second aspect.
In a sixth aspect, the disclosed embodiments further provide a computer-readable storage medium, where a computer program is stored, and the computer program is executed by an electronic device, where the electronic device executes the steps of the image processing method according to the first aspect and any of its various embodiments or the steps of the terminal control method according to the second aspect.
For the description of the effects of the above apparatus, electronic device, and computer-readable storage medium, reference is made to the description of the above method, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art can derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of an image processing method provided in an embodiment of the present disclosure;
fig. 2(a) is a schematic diagram illustrating an application of an image processing method according to a first embodiment of the present disclosure;
fig. 2(b) is a schematic diagram illustrating an application of an image processing method according to a first embodiment of the present disclosure;
fig. 2(c) is a schematic diagram illustrating an application of an image processing method according to a first embodiment of the present disclosure;
fig. 2(d) is a schematic diagram illustrating an application of an image processing method according to a first embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a terminal control method according to a first embodiment of the disclosure;
fig. 4 shows a schematic diagram of an image processing apparatus provided in a second embodiment of the disclosure;
fig. 5 is a schematic diagram illustrating a terminal control device according to a second embodiment of the disclosure;
fig. 6 shows a schematic diagram of an electronic device provided in a third embodiment of the present disclosure;
fig. 7 shows a schematic diagram of another electronic device provided in the third embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research shows that, at present, living body detection is usually realized by applying a living body detection model to a group of images collected by each module (corresponding to one binocular camera). In general, a different living body detection model is trained for each module, which consumes a large amount of model-training resources.
However, the baselines of different modules differ, i.e., the relative distance between the two cameras differs between binocular cameras; as a result, a model that performs well on one module may show poor accuracy on another module with a different baseline.
Therefore, a living body detection model adapts weakly to different modules, and in the related art different living body detection models often need to be trained for different modules, which greatly increases the time cost of model training.
Based on the research, the present disclosure provides an image processing method and apparatus, and a terminal control method and apparatus, so as to improve the generalization capability of a module and reduce the time cost for subsequent applications such as target object identification.
The above-mentioned drawbacks were discovered by the inventors after practical and careful study; therefore, the discovery process of the above problems and the solutions proposed below for them should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, an image processing method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the image processing method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capability, and the electronic device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the image processing method may be implemented by a processor calling computer readable instructions stored in a memory.
The following describes an image processing method provided by the embodiment of the present disclosure.
Example one
Referring to fig. 1, which is a flowchart of an image processing method provided in the embodiment of the present disclosure, the method includes steps S101 to S104, where:
s101, acquiring an image to be detected obtained by shooting a target object through binocular cameras, wherein the image to be detected comprises images acquired through each camera in the binocular cameras respectively;
s102, carrying out target object detection on an image to be detected to obtain an object detection frame of a target object in the image to be detected;
s103, carrying out external expansion processing on the object detection frames, and carrying out translation processing on at least one object detection frame subjected to external expansion processing to obtain a processed object detection frame;
and S104, determining the identification result of the target object based on the processed object detection frame.
Here, in order to facilitate understanding of the image processing method provided by the embodiment of the present disclosure, its application scenarios may first be briefly described. The image processing method applies mainly to target recognition based on a binocular camera: for example, living body detection can be performed on faces captured by a binocular camera, license plate recognition can be performed on vehicles captured by a binocular camera, and so on; no specific limitation is made here.
In the embodiment of the disclosure, a target object in the same scene can be shot with a binocular camera to obtain two images; a disparity map can be obtained using a stereo matching algorithm, and a depth map can then be derived to realize target recognition. However, when the relative distances between the two cameras of different modules (each corresponding to one binocular camera) differ, even the same recognition method applied to the same target may produce different recognition results because of the differing baselines. This matters especially when target recognition uses a target detection model: a large number of image samples must be fed into training, and if the training samples come from modules with different baselines, model accuracy drops sharply. In the related art, different target detection models can be trained for different modules to preserve accuracy, but this greatly increases the training cost.
To solve this problem, the embodiment of the present disclosure provides an image processing method that supplies a universal pseudo baseline for different modules and then performs target recognition based on the images they acquire. The method first uses target object detection to reduce the influence of the baseline of the current module, and then constructs a pseudo baseline through the cooperation of external expansion and translation, ensuring that target recognition is performed against a uniform pseudo baseline while the influence of the original baseline is eliminated.
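For illustration only, the following is a minimal sketch of this pipeline under stated assumptions. The helper names (detect_object, expand_box, translate_box, recognize) are placeholders for the detector, the external-expansion step, the translation step, and the recognition model described below, not APIs from the disclosure; images are assumed to be numpy-style arrays and boxes (x1, y1, x2, y2) pixel tuples.

    def process_binocular_pair(left_img, right_img, detect_object,
                               expand_box, translate_box, recognize):
        # S102: detect the target object in each image of the binocular pair.
        left_box = detect_object(left_img)
        right_box = detect_object(right_img)

        # S103: externally expand both boxes, then translate at least one of
        # them so that different modules share one set of pseudo baselines.
        left_box = expand_box(left_box, left_img.shape)
        right_box = expand_box(right_box, right_img.shape)
        right_box = translate_box(right_box, away_from_other_camera=+1)

        # S104: determine the recognition result from the processed boxes.
        return recognize(left_img, left_box, right_img, right_box)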
The two images to be detected collected by the binocular camera in the embodiment of the present disclosure may be determined based on an application scene where the binocular camera is located, for example, in a face recognition application, the two images to be detected collected here may be images including a face; as another example, in an intelligent transportation application, the two images to be detected collected here may be images containing vehicles.
Considering that the relative distance (corresponding to the baseline) between the two cameras of a binocular camera causes a directional difference (i.e., parallax) when the two cameras capture the same target object, here, in order to construct a uniform pseudo baseline, the influence of the original baseline caused by parallax may first be reduced based on the object detection frames obtained by performing target object detection on the images to be detected.
In the embodiment of the disclosure, the object detection frame where the target object is located can be detected from the image to be detected based on a traditional target object detection method. The target object detection method here may be a frame difference method, a background subtraction method, an optical flow method, or the like.
The embodiment of the disclosure can adopt the above traditional methods for object detection, or perform object detection based on a trained target detection model. The target detection model can be obtained by training on image samples annotated with object detection frames, learning the correspondence between an input image sample and the output object detection frame, so that when an image to be detected is input into the trained model, the object detection frame in that image can be determined.
Having eliminated the influence of the original baseline, and considering the key role that baseline-induced parallax plays in target recognition, the embodiment of the disclosure can create a common pseudo baseline guided by a translation operation.
Before the object detection frame is translated, the target object framed by it is complete; for example, a target face includes a person's facial features, hair, and neck. If the object detection frame were translated directly, the framed target face would become incomplete, which is not conducive to subsequent recognition.
Based on this, the embodiment of the present disclosure may perform the flaring operation before performing the translation operation to be able to construct a common pseudo-baseline for different modules.
The external expansion operation enlarges the image area framed by the object detection frame to a certain extent. Since a larger image area contains more information, this can improve the accuracy of subsequent target recognition and also provides room for the subsequent translation operation.
It should be noted that the translation operation in the embodiment of the present disclosure may be a translation operation performed on one of the two object detection frames after the external expansion processing, where a camera corresponding to the other object detection frame that is not translated may be used as a reference for translation, or a translation operation may be performed on both the two object detection frames, and here, the central positions of the two cameras may be used as a reference for translation.
After the translation processing in the image processing method provided by the embodiment of the present disclosure, the recognition result of the target object may be determined based on the translated object detection frame.
In specific application, the trained target recognition model can be used for carrying out target recognition on the object detection frame after the translation processing so as to determine the recognition result of the target object.
The target recognition model in the embodiment of the present disclosure may be a living body detection model related to face recognition, and the object detection frame after the translation processing is input into the trained living body detection model, so as to determine whether the target face corresponding to the object detection frame is a real face.
The target recognition model may be a vehicle detection model related to vehicle recognition, and the type information of the target vehicle corresponding to the target detection frame may be determined by inputting the object detection frame after the translation process to the trained vehicle detection model.
In order to facilitate further understanding of the above target recognition process, the following may take face recognition as an example, and the above process will be described in detail with reference to fig. 2(a) to 2 (d).
As shown in fig. 2(a), two images to be detected are acquired by a binocular camera deployed for face living body detection: the left image is human body image 1 acquired by the left-eye camera (an RGB camera) of the binocular camera, and the right image is human body image 2 acquired by the right-eye camera (a near-infrared camera). After target face detection is performed on the two images, object detection frames for the target face in both images can be generated, as shown in fig. 2(b).
For the two object detection frames shown in fig. 2(b), the embodiment of the present disclosure may perform external expansion processing, as shown in fig. 2(c). Here, the object detection frame in the right image of fig. 2(c) may be subjected to translation processing to obtain the translated object detection frame shown in the right image of fig. 2(d); the left image of fig. 2(d) is the same as the left image of fig. 2(c) and contains the externally expanded object detection frame without translation.
In the embodiment of the present disclosure, two images shown in fig. 2(d) may be cut based on the object detection frame to obtain corresponding face images, and when the two face images are input to the living body detection model, it may be determined whether the target face is a real face.
Here, considering that the right image in fig. 2(d) is captured by a near-infrared camera, when non-living elements such as paper or a screen are determined to be present in the object detection frame, it can be directly concluded that the target face is not a real face, and the corresponding living body detection score is 0.
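A minimal sketch of this final step, cropping each image by its processed detection frame and scoring the pair with a liveness model. The model interface (model.predict returning a real-face probability) and the threshold are assumptions for illustration, not part of the disclosure; images are assumed to be numpy-style arrays indexed [row, column].

    def liveness_check(rgb_img, rgb_box, nir_img, nir_box, model, threshold=0.5):
        """Crop both views by their processed boxes and run liveness detection."""
        def crop(img, box):
            x1, y1, x2, y2 = [int(round(v)) for v in box]
            return img[y1:y2, x1:x2]

        rgb_face = crop(rgb_img, rgb_box)   # left / RGB camera view
        nir_face = crop(nir_img, nir_box)   # right / near-infrared view
        score = model.predict(rgb_face, nir_face)  # assumed model interface
        # Non-living elements (paper, screens) in the NIR view drive the score to 0.
        return score >= threshold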
It should be noted that, beyond the above example, the embodiment of the present disclosure may perform translation processing on the image acquired by the right-eye camera, and may equally perform translation processing on the image acquired by the left-eye camera instead.
Considering the key influence of the extension process on the target recognition, the relevant contents of the detection box extension process will be described in detail below. In the embodiment of the present disclosure, the external expansion process may specifically be performed through the following steps:
step one, determining the position coordinates of the corner points of an object detection frame in an image to be detected;
and secondly, carrying out external expansion processing on the object detection frame based on the determined angular point position coordinates and a preset external expansion ratio to obtain the object detection frame after the external expansion processing.
Here, the corner position coordinates of the object detection frame in the image to be detected may first be determined; these correspond to the image coordinates of the four corners of the object detection frame. Once the corner position coordinates are determined, the object detection frame may be externally expanded based on them and a preset external expansion ratio to obtain the processed object detection frame.
The preset external expansion ratio may be determined based on the actual size of the target object, mainly because target objects of different actual sizes produce different imaging sizes. If a face and a vehicle at the same position are captured by the binocular camera in the same scene, the vehicle detection frame is far larger than the face detection frame. Here, a larger external expansion ratio may be set for a larger-size target object so as to cover the peripheral region information of its larger imaging size, while a smaller ratio may be set for a smaller-size target object, which is sufficient to cover the peripheral region of its smaller imaging size.
The image processing method provided by the embodiment of the disclosure may perform the external expansion directly based on the corner position coordinates of the four corners of the object detection frame and the preset external expansion ratio set for the frame; for example, the object detection frame may be expanded outward by 0.5 times. The external expansion may also be realized by dragging the four corners of the object detection frame outward. Alternatively, the lengths of the four borders (the upper, lower, left, and right borders) may be determined based on the corner position coordinates, and each border may then be expanded outward according to a preset external expansion ratio set for it; in specific applications this may be realized by dragging the borders outward.
It should be noted that, for the four borders formed by the four corners of the object detection frame, different external expansion ratios may be set to suit the key parts of the target object. For a face object frame, the four borders may each be extended outward by a given ratio; for example, the left, right, upper, and lower borders may be extended by 0.4 times, 0.4 times, 0.8 times, and 0.4 times respectively, so that all four borders of the face object frame change position after expansion. The upper border is expanded by a larger multiple mainly because the eyes lie in the upper part of the facial features captured by the face detection frame, and this region also contains liveness elements such as the forehead; appropriately expanding the upper border by a larger multiple therefore helps improve the accuracy of subsequent living body detection.
It should be noted that, in the embodiment of the present disclosure, the external expansion processing may also be performed on only one of the four borders, or simultaneously on a pair of borders (for example, the left and right borders), or in other expansion manners, which are not described again here.
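As an illustration, a minimal sketch of the per-border external expansion described above, assuming boxes are (x1, y1, x2, y2) pixel corner coordinates and using the 0.4/0.4/0.8/0.4 ratios from the face example; the helper name, the ratio ordering, and the clamping to image bounds are assumptions, not part of the disclosure.

    def expand_box(box, img_shape, ratios=(0.4, 0.4, 0.8, 0.4)):
        """Externally expand a detection box border by border.

        box:    (x1, y1, x2, y2) corner coordinates in the image.
        ratios: (left, right, top, bottom) external expansion ratios; left/right
                are fractions of the box width, top/bottom of the box height.
        """
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        left_r, right_r, top_r, bottom_r = ratios
        # Push each border outward by its own ratio; the top border gets a
        # larger ratio so forehead/hair context is kept for liveness cues.
        x1 -= left_r * w
        x2 += right_r * w
        y1 -= top_r * h
        y2 += bottom_r * h
        # Clamp to the image so the expanded box stays croppable.
        img_h, img_w = img_shape[:2]
        x1, y1 = max(0, int(round(x1))), max(0, int(round(y1)))
        x2, y2 = min(img_w, int(round(x2))), min(img_h, int(round(y2)))
        return (x1, y1, x2, y2)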
In view of the key role of the translation process for the pseudo-baseline construction, the relevant contents of the detection frame translation process will be described in detail below.
In the case of performing translation processing on one object detection frame after the external expansion processing, in the embodiment of the present disclosure, the translation processing may be specifically performed through the following steps:
step one, selecting a frame to be translated from object detection frames subjected to external expansion processing, wherein the frame to be translated is located in an image acquired by a first camera;
and step two, translating the detection frame to be translated along the direction departing from the second camera to obtain the processed object detection frame.
In the embodiment of the present disclosure, one frame to be translated may be selected from the object detection frames after the external expansion processing. If the object detection frame corresponding to the right-eye camera (corresponding to the first camera) of the binocular camera, i.e., the right detection frame, is selected as the frame to be translated, its translation direction may be determined based on the relative positional relationship between the right-eye camera and the left-eye camera: since the right-eye camera is relatively to the right, the translation direction of the right detection frame may be determined to be rightward (i.e., the direction away from the left-eye camera). Similarly, if the object detection frame corresponding to the left-eye camera (the left detection frame) is selected as the frame to be translated, its translation direction may be determined to be leftward (i.e., the direction away from the right-eye camera).
In the embodiment of the present disclosure, in addition to performing translation according to the translation direction, translation may also be performed in combination with a translation distance, and in the embodiment of the present disclosure, the translation distance may be determined based on size information of the object detection frame after the external expansion processing.
Target objects at different distances from the camera differ in imaging size. When the size of the object detection frame is larger, this indicates to a certain extent that the target object is closer to the camera, and a closer target object produces a larger parallax; when the size is smaller, the target object is farther from the camera, and a farther target object produces a smaller parallax.
In order to construct a pseudo baseline applicable to both large-size target objects and small-size target objects, here, the translation distance may be determined based on a preset translation coefficient and a width value in the size information of the object detection frame, where the width value of the object detection frame corresponds to the size of the lateral edge of the object detection frame.
The translation distance is proportional to the width: a wider object detection frame yields a larger translation distance, matching the larger parallax of a large-size target object, while a narrower frame yields a smaller translation distance, matching the smaller parallax of a small-size target object. In this way, a sharable pseudo baseline suitable for various target objects can be constructed.
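A minimal sketch of this single-frame translation, using the same (x1, y1, x2, y2) box convention as above; the coefficient value is illustrative only, and the sign convention (positive x pointing right in image coordinates) is an assumption.

    def translate_box(box, away_from_other_camera, coef=0.1):
        """Translate an expanded box to build the pseudo baseline.

        away_from_other_camera: +1 to shift right (frame from the right/first
        camera, moving away from the left/second camera), -1 to shift left.
        coef: preset translation coefficient; distance = coef * box width.
        """
        x1, y1, x2, y2 = box
        distance = coef * (x2 - x1)              # proportional to imaged width
        shift = away_from_other_camera * distance
        return (x1 + shift, y1, x2 + shift, y2)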
In the case of performing the translation processing on the two object detection frames after the external expansion processing, in the embodiment of the present disclosure, each object detection frame of the two object detection frames after the external expansion processing may be translated in a direction away from the other object detection frame, so as to obtain the processed object detection frames.
Here, the translation directions of the left and right detection frames may be determined respectively based on the relative positional relationship between the right-eye and left-eye cameras of the binocular camera: the right-eye camera is relatively to the right, so the translation direction of the right detection frame may be determined to be rightward (i.e., away from the left detection frame); the left-eye camera is relatively to the left, so the translation direction of the left detection frame may be determined to be leftward (i.e., away from the right detection frame).
Here, the translation processing may also be implemented in combination with the translation distance, and specific reference may be made to the above description, which is not described herein again.
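For the two-frame case, the same helper can be applied symmetrically, for example with half the single-frame coefficient on each side so the total offset is comparable (a design assumption, not specified in the disclosure):

    # Translate both expanded boxes away from each other, with the camera
    # centers as the implicit reference for the pseudo baseline.
    left_box = translate_box(left_box, away_from_other_camera=-1, coef=0.05)
    right_box = translate_box(right_box, away_from_other_camera=+1, coef=0.05)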
The image processing method provided by the embodiment of the disclosure can overcome the problem of inaccurate identification result caused by different base lines of different binocular cameras, has strong robustness, and can be widely applied to various technical fields.
First, the image processing method may be applied to a terminal control application, as shown in fig. 3, and may be specifically implemented according to the following steps:
s301, acquiring a group of face images shot by a binocular camera, wherein the group of face images comprises a first face image shot by a first camera in the binocular camera and a second face image shot by a second camera in the binocular camera;
s302, obtaining a group of recognition results of people corresponding to the face images through the image processing method, wherein the recognition results comprise whether the people are real faces or not;
s303, responding to the recognition result of the person, wherein the person is a real face and passes the identity authentication, and controlling the terminal to execute the specified operation.
Here, when the person corresponding to the group of face images is determined to be a real face based on the image processing method, the terminal may be controlled to execute the specified operation in combination with a successful identity authentication result; otherwise, the terminal may refuse to execute the specified operation and issue an alarm prompt.
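A sketch of the control logic in steps S301 to S303; the helper and method names (liveness_pipeline, verify_identity, the terminal interface) are all assumed for illustration.

    def handle_face_images(face_pair, liveness_pipeline, verify_identity, terminal):
        """Unlock-style terminal control based on the binocular recognition result."""
        is_real = liveness_pipeline(face_pair)          # image processing method above
        if is_real and verify_identity(face_pair):
            terminal.execute_specified_operation()      # e.g. unlock, open gate
        else:
            terminal.refuse()
            terminal.alert("liveness or identity verification failed")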
In the embodiment of the present disclosure, in the case where this terminal control method is applied to different terminals, the corresponding executed designated operations may also be different. The terminal here may be a user terminal, a gate device terminal, a payment terminal, or the like.
For example, when the method is applied to an unlocking scenario: if the person corresponding to the group of face images shot by the binocular camera of the user terminal is determined to be a real face and the person's identity is legitimate, unlocking succeeds; if a non-real face or an illegitimate identity is recognized, unlocking fails, and prompt information about the failure can be returned to the user terminal to notify the user.
For another example, when the method is applied to a gate passage-verification scenario, the passage switch connected to the gate device terminal can be controlled to open when a real face is determined from the recognition result and the identity is legitimate, realizing automatic passage through the gate; if a non-real face or an illegitimate identity is recognized, passage is denied.
It should be noted that the image processing method provided in the present disclosure may be applied not only to the unlocking application and the gate verification application, but also to other scenarios, for example, pedestrian detection in a video monitoring scenario may be performed, and may also be embedded into a financial device to perform live body detection of a financial service, which is not described herein again.
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, an image processing apparatus corresponding to the image processing method and a terminal control apparatus corresponding to the terminal control method are also provided in the embodiments of the present disclosure.
Example two
Referring to fig. 4, a schematic diagram of an image processing apparatus provided in an embodiment of the present disclosure is shown, where the apparatus includes: an acquisition module 401, a detection module 402, an external expansion module 403 and a determination module 404; wherein:
the acquiring module 401 is configured to acquire an image to be detected, which is obtained by shooting a target object through binocular cameras, where the image to be detected includes images acquired through each of the binocular cameras respectively;
the detection module 402 is configured to perform target object detection on an image to be detected to obtain an object detection frame of the target object in the image to be detected;
an external expansion module 403, configured to perform external expansion processing on the object detection frames, and perform translation processing on at least one object detection frame after the external expansion processing, to obtain a processed object detection frame;
a determining module 404, configured to determine, based on the processed object detection box, a recognition result of the target object.
The embodiment of the disclosure can focus on the target object through target object detection, which preliminarily reduces the influence of the baseline. Considering that, in the process of recognizing the target object from the two images to be detected collected by the binocular camera, the depth information of the target object needs to be determined with reference to the parallax formed by one module (corresponding to one binocular camera), the embodiment further simulates the parallax of human eyes by performing the matched external-expansion and translation operations on the object detection frames, obtaining processed object detection frames for target object recognition. The embodiments of the disclosure are highly universal: different modules can share one set of pseudo baselines to simulate the parallax of human eyes, improving the generalization capability across modules and reducing the time cost of subsequent applications such as target object recognition.
In one possible embodiment, the binocular camera comprises a first camera and a second camera; in the case that one object detection frame after the external expansion processing is subjected to translation processing, the external expansion module 403 is configured to perform the translation processing on at least one externally expanded object detection frame according to the following steps:
selecting a frame to be translated from the object detection frames subjected to external expansion processing, wherein the frame to be translated is positioned in the image acquired by the first camera;
and translating the detection frame to be translated along the direction departing from the second camera to obtain the processed object detection frame.
In a possible implementation manner, the outward expansion module 403 is configured to translate the detection frame to be translated in a direction away from the second camera, to obtain a processed object detection frame, according to the following steps:
determining a translation distance based on the size information of the object detection frame after the external expansion processing;
and moving the to-be-translated detection frame by the translation distance according to the direction departing from the second camera to obtain the object detection frame after translation processing.
In a possible implementation manner, the external expansion module 403 is configured to determine the translation distance based on the size information of the object detection frame after the external expansion processing according to the following steps:
and determining the translation distance based on the width value in the size information of the object detection frame after the external expansion processing and a preset translation coefficient.
In a possible embodiment, in the case of performing translation processing on the two object detection frames after the external expansion processing, the external expansion module 403 is configured to perform the translation processing on at least one externally expanded object detection frame according to the following steps to obtain processed object detection frames:
and respectively translating each object detection frame in the two object detection frames after the external expansion processing towards a direction away from the other object detection frame to obtain the processed object detection frames.
In a possible implementation manner, the external expansion module 403 is configured to perform an external expansion process on the object detection frame according to the following steps:
determining the position coordinates of the corner points of the object detection frame in the image to be detected;
and carrying out external expansion processing on the object detection frame based on the determined angular point position coordinates and a preset external expansion ratio to obtain the object detection frame after the external expansion processing.
In one possible embodiment, the target object is a target face; a determining module 404, configured to determine, based on the processed object detection box, a recognition result of the target object according to the following steps:
and carrying out target face identification on the processed object detection frame by using the trained living body detection model, and determining whether the target face corresponding to the object detection frame is a real face.
Referring to fig. 5, a schematic diagram of a terminal control device provided in an embodiment of the present disclosure is shown, where the terminal control device includes: an acquisition module 501, a determination module 502 and a control module 503; wherein:
the acquisition module 501 is used for acquiring a group of face images shot by the binocular camera, wherein the group of face images comprises a first face image shot by a first camera of the binocular camera and a second face image shot by a second camera of the binocular camera;
a determining module 502, configured to obtain, by using the image processing method, a recognition result of a person corresponding to a group of face images, where the recognition result includes whether the person is a real face;
and the control module 503 is configured to, in response to the recognition result of the person including that the person is a real face and the person passing identity authentication, control the terminal to perform a specified operation.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
EXAMPLE III
An embodiment of the present disclosure further provides an electronic device. As shown in fig. 6, a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, the electronic device includes: a processor 601, a memory 602, and a bus 603. The memory 602 stores machine-readable instructions executable by the processor 601 (for example, execution instructions corresponding to the acquisition module 401, the detection module 402, the external expansion module 403, and the determination module 404 in the image processing apparatus in fig. 4). When the electronic device operates, the processor 601 communicates with the memory 602 through the bus 603, and the machine-readable instructions, when executed by the processor 601, perform the following processing:
acquiring an image to be detected obtained by shooting a target object through binocular cameras, wherein the image to be detected comprises images acquired through each camera in the binocular cameras respectively;
carrying out target object detection on an image to be detected to obtain an object detection frame of a target object in the image to be detected;
carrying out external expansion processing on the object detection frames, and carrying out translation processing on at least one object detection frame subjected to external expansion processing to obtain a processed object detection frame;
and determining the identification result of the target object based on the processed object detection frame.
In one possible embodiment, the binocular camera comprises a first camera and a second camera; when performing translation processing on one object detection frame after the external expansion processing, the instructions executed by the processor 601 for performing translation processing on at least one externally expanded object detection frame to obtain a processed object detection frame include:
selecting a frame to be translated from the object detection frames subjected to external expansion processing, wherein the frame to be translated is positioned in the image acquired by the first camera;
and translating the detection frame to be translated along the direction departing from the second camera to obtain the processed object detection frame.
In a possible implementation manner, in the instructions executed by the processor 601, translating the detection frame to be translated in a direction away from the second camera to obtain a processed object detection frame, including:
determining a translation distance based on the size information of the object detection frame after the external expansion processing;
and moving the detection frame to be translated by the translation distance in the direction away from the second camera to obtain the object detection frame after the translation processing.
In a possible implementation manner, the determining, by the processor 601, the translation distance based on the size information of the object detection frame after the external expansion processing includes:
and determining the translation distance based on the width value in the size information of the object detection frame after the external expansion processing and a preset translation coefficient.
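As a sketch, this width-based translation could look as follows; the coefficient value and the side on which the second camera sits are illustrative assumptions:

```python
def translate_away_from_second_camera(box, shift_coef=0.1,
                                      second_camera_side="right"):
    """Shift an externally expanded (x, y, w, h) box away from the second camera.

    Translation distance = width of the expanded box * preset coefficient.
    """
    x, y, w, h = box
    shift = int(w * shift_coef)
    # moving away from the second camera: leftwards if it sits to the right
    x = x - shift if second_camera_side == "right" else x + shift
    return (x, y, w, h)
```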
In one possible embodiment, in a case where the two object detection frames after the external expansion processing are subjected to the translation processing, the instructions executed by the processor 601 for performing the translation processing on at least one object detection frame after the external expansion processing to obtain the processed object detection frame include:
and respectively translating each object detection frame in the two object detection frames after the external expansion processing in a direction away from the other object detection frame to obtain the processed object detection frames.
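The two-frame variant can be sketched the same way, each box moving away from the other so that the pair mimics a wider pseudo baseline (directions and coefficient are again assumptions):

```python
def translate_both_boxes(box_first, box_second, shift_coef=0.1):
    """Shift each externally expanded box away from the other one."""
    x1, y1, w1, h1 = box_first
    x2, y2, w2, h2 = box_second
    # assumes the first camera is on the left, so its box moves leftwards
    # while the second camera's box moves rightwards
    return ((x1 - int(w1 * shift_coef), y1, w1, h1),
            (x2 + int(w2 * shift_coef), y2, w2, h2))
```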
In a possible implementation manner, in the instructions executed by the processor 601, performing the external expansion processing on the object detection frame includes:
determining the position coordinates of the corner points of the object detection frame in the image to be detected;
and performing the external expansion processing on the object detection frame based on the determined corner point position coordinates and a preset external expansion ratio to obtain the object detection frame after the external expansion processing.
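For illustration, an expansion step driven by the corner position coordinates and a preset ratio might be sketched as below; expanding symmetrically about the box center and clamping to the image bounds are assumptions made here, since the disclosure only fixes the inputs:

```python
def expand_box(top_left, bottom_right, ratio=0.2, img_size=None):
    """Externally expand a box given its corner position coordinates
    and a preset external expansion ratio."""
    (x0, y0), (x1, y1) = top_left, bottom_right
    w, h = x1 - x0, y1 - y0
    dx, dy = w * ratio / 2.0, h * ratio / 2.0       # grow each side equally
    x0, y0, x1, y1 = x0 - dx, y0 - dy, x1 + dx, y1 + dy
    if img_size is not None:                        # keep the box inside the image
        img_w, img_h = img_size
        x0, y0 = max(0.0, x0), max(0.0, y0)
        x1, y1 = min(float(img_w), x1), min(float(img_h), y1)
    return (x0, y0), (x1, y1)
```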
In one possible embodiment, the target object is a target face; the instructions executed by the processor 601 for determining the recognition result of the target object based on the processed object detection frame include:
and performing target face recognition on the processed object detection frame by using the trained living body detection model to determine whether the target face corresponding to the object detection frame is a real face.
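A hedged sketch of this recognition step follows; `liveness_model` and its `predict` call are placeholders for whatever trained living body detection model is deployed, and the 0.5 threshold is illustrative:

```python
def recognize_target_face(crop_first, crop_second, liveness_model, threshold=0.5):
    """Feed the processed (expanded and translated) crops from both cameras
    to a trained liveness model and decide real vs. spoofed face."""
    score = liveness_model.predict([crop_first, crop_second])  # hypothetical API
    return score > threshold                                   # True => real face
```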
An embodiment of the present disclosure further provides an electronic device. As shown in fig. 7, which is a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, the electronic device includes: a processor 701, a memory 702, and a bus 703. The memory 702 stores machine-readable instructions executable by the processor 701 (for example, execution instructions corresponding to the acquisition module 501, the determining module 502, and the control module 503 in the terminal control apparatus in fig. 5). When the electronic device runs, the processor 701 communicates with the memory 702 through the bus 703, and when the machine-readable instructions are executed by the processor 701, the following processing is performed:
acquiring a group of face images shot by a binocular camera, wherein the group of face images comprise a first face image shot by a first camera in the binocular camera and a second face image shot by a second camera in the binocular camera;
obtaining, by using the image processing method, a recognition result of the person corresponding to the group of face images, where the recognition result includes whether the person is a real face;
and in response to the recognition result of the person including that the person is a real face, and the person passing the identity authentication, controlling the terminal to execute a specified operation.
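Putting the terminal-side flow together, a minimal sketch under the same assumptions (all callables here are placeholders standing in for the acquisition, determining, and control modules; `process_binocular_pair` and `recognize_target_face` refer to the sketches above):

```python
def control_terminal(camera_pair, detector, liveness_model,
                     authenticate, perform_operation):
    """Unlock-style control: act only for a live and authenticated face."""
    img_first, img_second = camera_pair.capture()   # one face image per camera
    crops = process_binocular_pair(img_first, img_second, detector)
    if crops is None:
        return                                      # no face detected, do nothing
    if (recognize_target_face(crops[0], crops[1], liveness_model)
            and authenticate(crops[0])):
        perform_operation()                         # the specified operation
```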
For the specific execution process of the above instructions, reference may be made to the steps of the methods described in the embodiments of the present disclosure; details are not repeated here.
The embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image processing method and the terminal control method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image processing method and the terminal control method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the instructions included in the program code may be used to execute the steps of the image processing method and the terminal control method described in the above method embodiments.
The embodiments of the present disclosure further provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The corresponding computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the system and the apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only one kind of logical division, and there may be other divisions in actual implementation; for another example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of apparatuses or units through some communication interfaces, and may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the technical field may, within the technical scope disclosed by the present disclosure, still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not depart from the spirit and scope of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. An image processing method, comprising:
acquiring an image to be detected obtained by shooting a target object through binocular cameras, wherein the image to be detected comprises images acquired through each camera in the binocular cameras respectively;
carrying out target object detection on the image to be detected to obtain an object detection frame of the target object in the image to be detected;
carrying out external expansion processing on the object detection frames, and carrying out translation processing on at least one object detection frame subjected to external expansion processing to obtain a processed object detection frame;
and determining a recognition result of the target object based on the processed object detection frame.
2. The image processing method according to claim 1, wherein the binocular camera includes a first camera and a second camera;
under the condition that translation processing is performed on one object detection frame after the external expansion processing, the performing translation processing on at least one object detection frame after the external expansion processing to obtain a processed object detection frame comprises:
selecting a detection frame to be translated from the object detection frames subjected to the external expansion processing, wherein the detection frame to be translated is located in the image acquired by the first camera;
and translating the detection frame to be translated in a direction away from the second camera to obtain the processed object detection frame.
3. The image processing method according to claim 2, wherein the translating the detection frame to be translated in a direction away from the second camera to obtain the processed object detection frame comprises:
determining a translation distance based on the size information of the object detection frame after the external expansion processing;
and moving the detection frame to be translated by the translation distance in the direction away from the second camera to obtain the processed object detection frame.
4. The image processing method according to claim 3, wherein the determining a translation distance based on the size information of the object detection frame after the external expansion processing comprises:
and determining the translation distance based on the width value in the size information of the object detection frame after the external expansion processing and a preset translation coefficient.
5. The image processing method according to claim 1, wherein, in a case where two object detection frames after the external expansion processing are subjected to the translation processing, the performing translation processing on at least one object detection frame after the external expansion processing to obtain the processed object detection frame comprises:
and respectively translating each of the two object detection frames after the external expansion processing in a direction away from the other object detection frame to obtain the processed object detection frames.
6. The image processing method according to any one of claims 1 to 5, wherein the performing external expansion processing on the object detection frame comprises:
determining the position coordinates of the corner points of the object detection frame in the image to be detected;
and performing external expansion processing on the object detection frame based on the determined corner point position coordinates and a preset external expansion ratio to obtain the object detection frame after the external expansion processing.
7. The image processing method according to any one of claims 1 to 6, wherein the target object is a target face; and the determining the recognition result of the target object based on the processed object detection frame comprises:
and carrying out target face recognition on the processed object detection frame by using the trained living body detection neural network, and determining whether the target face corresponding to the object detection frame is a real face.
8. A terminal control method, wherein a terminal is provided with a binocular camera, and the method comprises:
acquiring a group of face images shot by the binocular camera, wherein the group of face images comprise a first face image shot by a first camera in the binocular camera and a second face image shot by a second camera in the binocular camera;
obtaining a recognition result of a person corresponding to the set of face images by the image processing method according to any one of claims 1 to 7, wherein the recognition result includes whether the person is a real face;
and in response to the recognition result of the person including that the person is a real face, and the person passing the identity authentication, controlling the terminal to execute a specified operation.
9. An image processing apparatus, comprising:
an acquisition module, configured to acquire an image to be detected obtained by shooting a target object through binocular cameras, wherein the image to be detected comprises images acquired through each camera in the binocular cameras respectively;
a detection module, configured to perform target object detection on the image to be detected to obtain an object detection frame of the target object in the image to be detected;
an external expansion module, configured to perform external expansion processing on the object detection frame and perform translation processing on at least one object detection frame after the external expansion processing to obtain a processed object detection frame;
and a determining module, configured to determine a recognition result of the target object based on the processed object detection frame.
10. A terminal control apparatus, comprising:
an acquisition module, configured to acquire a group of face images shot by a binocular camera, wherein the group of face images comprises a first face image shot by a first camera in the binocular camera and a second face image shot by a second camera in the binocular camera;
a determining module, configured to obtain, through the image processing method according to any one of claims 1 to 7, a recognition result of a person corresponding to the group of face images, wherein the recognition result comprises whether the person is a real face;
and a control module, configured to, in response to the recognition result of the person including that the person is a real face and the person passing the identity authentication, control the terminal to execute a specified operation.
11. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the image processing method according to any one of claims 1 to 7 or the steps of the terminal control method according to claim 8.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by an electronic device, causes the electronic device to perform the steps of the image processing method according to any one of claims 1 to 7 or the steps of the terminal control method according to claim 8.
CN202011377063.0A 2020-11-30 2020-11-30 Image processing method and device, and terminal control method and device Pending CN112560592A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011377063.0A CN112560592A (en) 2020-11-30 2020-11-30 Image processing method and device, and terminal control method and device
PCT/CN2021/121457 WO2022111044A1 (en) 2020-11-30 2021-09-28 Image processing method and apparatus, and terminal control method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011377063.0A CN112560592A (en) 2020-11-30 2020-11-30 Image processing method and device, and terminal control method and device

Publications (1)

Publication Number Publication Date
CN112560592A (en) 2021-03-26

Family

ID=75046804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011377063.0A Pending CN112560592A (en) 2020-11-30 2020-11-30 Image processing method and device, and terminal control method and device

Country Status (2)

Country Link
CN (1) CN112560592A (en)
WO (1) WO2022111044A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949661A (en) * 2021-05-13 2021-06-11 北京世纪好未来教育科技有限公司 Detection frame self-adaptive external expansion method and device, electronic equipment and storage medium
CN113159161A (en) * 2021-04-16 2021-07-23 深圳市商汤科技有限公司 Target matching method and device, equipment and storage medium
CN113392800A (en) * 2021-06-30 2021-09-14 浙江商汤科技开发有限公司 Behavior detection method and device, computer equipment and storage medium
WO2022111044A1 (en) * 2020-11-30 2022-06-02 深圳市商汤科技有限公司 Image processing method and apparatus, and terminal control method and apparatus

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523431A (en) * 2023-11-17 2024-02-06 中国科学技术大学 Firework detection method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127170A (en) * 2016-07-01 2016-11-16 重庆中科云丛科技有限公司 A kind of merge the training method of key feature points, recognition methods and system
CN107813310A (en) * 2017-11-22 2018-03-20 浙江优迈德智能装备有限公司 One kind is based on the more gesture robot control methods of binocular vision
CN110619656A (en) * 2019-09-05 2019-12-27 杭州宇泛智能科技有限公司 Face detection tracking method and device based on binocular camera and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7403643B2 (en) * 2006-08-11 2008-07-22 Fotonation Vision Limited Real-time face tracking in a digital image acquisition device
CN112560592A (en) * 2020-11-30 2021-03-26 深圳市商汤科技有限公司 Image processing method and device, and terminal control method and device


Also Published As

Publication number Publication date
WO2022111044A1 (en) 2022-06-02

Similar Documents

Publication Publication Date Title
CN112560592A (en) Image processing method and device, and terminal control method and device
CN108764091B (en) Living body detection method and apparatus, electronic device, and storage medium
US10956714B2 (en) Method and apparatus for detecting living body, electronic device, and storage medium
US20200184059A1 (en) Face unlocking method and apparatus, and storage medium
WO2021027537A1 (en) Method and apparatus for taking identification photo, device and storage medium
TW202026948A (en) Methods and devices for biological testing and storage medium thereof
CN110008943B (en) Image processing method and device, computing equipment and storage medium
CN112396050B (en) Image processing method, device and storage medium
CN111814564A (en) Multispectral image-based living body detection method, device, equipment and storage medium
CN111553251A (en) Certificate four-corner incomplete detection method, device, equipment and storage medium
CN114298902A (en) Image alignment method and device, electronic equipment and storage medium
CN111881740B (en) Face recognition method, device, electronic equipment and medium
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN112802081A (en) Depth detection method and device, electronic equipment and storage medium
JPWO2008041518A1 (en) Image processing apparatus, image processing apparatus control method, and image processing apparatus control program
US20190130600A1 (en) Detection Method and Device Thereof
CN110688878B (en) Living body identification detection method, living body identification detection device, living body identification detection medium, and electronic device
CN113642639A (en) Living body detection method, living body detection device, living body detection apparatus, and storage medium
CN113191189A (en) Face living body detection method, terminal device and computer readable storage medium
US20210150745A1 (en) Image processing method, device, electronic apparatus, and computer readable storage medium
CN111383255A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110660091A (en) Image registration processing method and device and photographing correction operation system
CN115222621A (en) Image correction method, electronic device, storage medium, and computer program product
CN114608521A (en) Monocular distance measuring method and device, electronic equipment and storage medium
CN110740256B (en) Doorbell camera cooperation method and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40038896

Country of ref document: HK