CN114827351A - Method, device, equipment and storage medium for automatically answering incoming call - Google Patents

Info

Publication number
CN114827351A
CN114827351A
Authority
CN
China
Prior art keywords
information
gesture
coordinate
image
incoming call
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210437029.0A
Other languages
Chinese (zh)
Inventor
匡心意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xiaopai Technology Co ltd
Original Assignee
Shenzhen Xiaopai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xiaopai Technology Co ltd
Priority to CN202210437029.0A
Publication of CN114827351A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72484 User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device, equipment and a computer-readable storage medium for automatically answering an incoming call, belonging to the technical field of intelligent interaction. The method obtains an analysis result by analyzing a target image acquired when an incoming call is detected; judges, according to the existence probability information in the analysis result, whether a face image and a gesture image simultaneously exist in the target image; if so, judges whether a preset gesture exists in the target image according to the coordinate information in the analysis result; and, if the preset gesture is judged to exist, automatically answers the incoming call. The invention solves the prior-art problem of low accuracy when an intelligent terminal recognizes a user's gesture to automatically answer an incoming call, effectively improves the accuracy of recognizing a specific gesture, and ensures that the intelligent terminal can accurately recognize the user's call-answering gesture and thus automatically answer the incoming call.

Description

Method, device, equipment and storage medium for automatically answering incoming call
Technical Field
The invention relates to the technical field of intelligent interaction, in particular to a method, a device, equipment and a computer readable storage medium for automatically answering an incoming call.
Background
With the development of artificial intelligence technology, various intelligent terminals applying it are continuously emerging, such as smart televisions, smart home devices, and smart phones. Compared with traditional terminals, intelligent terminals offer clear advantages and convenience in human-computer interaction and control: for example, a user can interact with or control an intelligent terminal through voice, or direct it to automatically execute a corresponding instruction through voice, gestures, and other means.
At present, many large-screen intelligent terminals also provide a call function. Compared with small-screen terminals such as mobile phones, the camera on a large-screen terminal can capture a scene that includes at least the user's upper body. Based on this, many large-screen intelligent terminals implement automatic call answering through specific gesture recognition: by recognizing whether the user makes a specific gesture, the incoming call is answered automatically without the user manually touching the screen. In practical applications, however, because of complex factors such as illumination, scene, and individual variation, accuracy is often low if the intelligent terminal recognizes only the user's gesture. The prior art therefore proposes answering calls by recognizing whether the user's whole upper body makes a corresponding call-answering posture.
However, existing gesture-recognition-based call answering usually requires a large amount of original sample data for deep-learning training. Because of cost, the original sample data available for such training is usually limited in practice, so the accuracy of the intelligent terminal in recognizing a specific gesture is not high, and the problem remains that accuracy is low when the intelligent terminal recognizes a user's gesture to automatically answer an incoming call.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a computer readable storage medium for automatically answering a call, aiming at accurately identifying the gesture of a user for answering the call so as to automatically answer the call.
In order to achieve the above object, the present invention provides a method for automatically answering a call, comprising the steps of:
analyzing a target image acquired when incoming call access is detected to obtain an analysis result;
judging whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result;
if yes, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result;
and if the preset gesture is judged to exist, automatically answering the incoming call.
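The four steps above can be sketched in Python. This is an illustrative sketch only: the analysis-result structure, the threshold values, and the simplified nearness test are assumptions for illustration, not taken from the patent.

```python
# Illustrative sketch only: the analysis-result structure, the threshold
# values, and the simplified nearness test are assumptions, not taken
# from the patent.

FACE_PROB_THRESHOLD = 0.8     # stand-in for the "preset probability value"
GESTURE_PROB_THRESHOLD = 0.8  # (assumed)

def boxes_close(face_box, gesture_box, max_dist=200.0):
    # Simplified stand-in for the coordinate-difference test of the later
    # claims: require the gesture's centre to lie near the face's centre.
    fx, fy, fw, fh = face_box
    gx, gy, gw, gh = gesture_box
    fcx, fcy = fx + fw / 2.0, fy + fh / 2.0
    gcx, gcy = gx + gw / 2.0, gy + gh / 2.0
    return ((fcx - gcx) ** 2 + (fcy - gcy) ** 2) ** 0.5 <= max_dist

def should_auto_answer(analysis):
    """analysis maps target name to its probability and bounding box, e.g.
    {"face": {"prob": 0.93, "box": (x, y, w, h)}, "gesture": {...}}."""
    face = analysis.get("face")
    gesture = analysis.get("gesture")
    if not face or not gesture:   # step 2: both images must be present
        return False
    if (face["prob"] <= FACE_PROB_THRESHOLD
            or gesture["prob"] <= GESTURE_PROB_THRESHOLD):
        return False
    # step 3: check the preset gesture via the coordinate relation
    return boxes_close(face["box"], gesture["box"])
```

On a positive result, the terminal would then answer the call (step 4).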
Optionally, the step of analyzing the target image collected when the incoming call access is detected to obtain an analysis result includes:
respectively sending the target image to a preset face recognition library and a preset gesture recognition library for analysis to obtain an analysis result, wherein the analysis result comprises the existence probability information and the coordinate information;
acquiring first existence probability information of the face image and second existence probability information of the gesture image in the existence probability information;
and acquiring first coordinate information of the face image and second coordinate information of the gesture image in the coordinate information.
Optionally, the step of determining whether the face image and the gesture image simultaneously exist in the target image according to the existence probability information in the analysis result includes:
judging whether the first existence probability information and the second existence probability information in the existence probability information are simultaneously greater than a preset probability value or not;
and if the first existence probability information and the second existence probability information are simultaneously greater than the preset probability value, it is determined that the face image and the gesture image simultaneously exist in the target image.
Optionally, the step of determining whether a preset gesture exists in the target image according to the coordinate information in the analysis result includes:
acquiring coordinate difference information of a coordinate point between the first coordinate information and the second coordinate information in the coordinate information;
and judging whether the coordinate difference information reaches a preset condition, and determining that a preset gesture exists in the target image when the coordinate difference information reaches the preset condition.
Optionally, the step of obtaining coordinate difference information of a coordinate point between the first coordinate information and the second coordinate information in the coordinate information includes:
obtaining a distance difference value of a coordinate center point between the first coordinate information and the second coordinate information;
acquiring a height difference value and a horizontal axis difference value of coordinate points between the first coordinate information and the second coordinate information;
and assigning the distance difference value, the height difference value and the horizontal axis difference value as the coordinate difference information.
Optionally, the step of determining whether the coordinate difference information meets a preset condition includes:
judging whether the distance difference value is within a first difference value range, whether the height difference value is within a second difference value range and whether the horizontal axis difference value is within a third difference value range;
and if the distance difference is within the first difference range, the height difference is within the second difference range, and the horizontal axis difference is within the third difference range, determining that the coordinate difference information reaches the preset condition.
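The three differences and the range test described above might be computed as follows from two bounding boxes; the box format and all range values are illustrative assumptions, not values from the patent:

```python
def coord_diff(face_box, gesture_box):
    """Boxes as (x, y, w, h). Returns (distance, height_diff, horiz_diff)
    between the two box centres, mirroring the three differences in the
    claim (centre distance, vertical difference, horizontal-axis difference)."""
    fx, fy, fw, fh = face_box
    gx, gy, gw, gh = gesture_box
    fcx, fcy = fx + fw / 2.0, fy + fh / 2.0
    gcx, gcy = gx + gw / 2.0, gy + gh / 2.0
    horiz = abs(fcx - gcx)                     # horizontal-axis difference
    height = abs(fcy - gcy)                    # height (vertical) difference
    dist = (horiz ** 2 + height ** 2) ** 0.5   # centre-to-centre distance
    return dist, height, horiz

def meets_preset_condition(diff,
                           dist_range=(0, 250),     # first difference range
                           height_range=(0, 120),   # second difference range
                           horiz_range=(30, 250)):  # third difference range
    # All three differences must fall within their ranges (assumed values).
    dist, height, horiz = diff
    return (dist_range[0] <= dist <= dist_range[1]
            and height_range[0] <= height <= height_range[1]
            and horiz_range[0] <= horiz <= horiz_range[1])
```

A gesture beside the ear, for example, would yield a small height difference and a moderate horizontal offset from the face centre, landing inside all three ranges.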
Optionally, the method for automatically answering a call is applied to a large-screen terminal, the large-screen terminal is configured with a camera device, and before the step of analyzing a target image acquired when the incoming call is detected to obtain an analysis result, the method includes:
and when the incoming call is detected, starting the camera device to acquire the target image through the camera device.
In addition, the invention also provides a device for automatically answering the incoming call, which comprises:
the analysis module is used for analyzing the target image acquired when the incoming call access is detected to obtain an analysis result;
the judging module is used for judging whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result;
the gesture recognition module is used for, if the face image and the gesture image simultaneously exist in the target image, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result;
and the answering module is used for automatically answering the incoming call if the preset gesture is judged to exist.
Optionally, the parsing module is further configured to:
respectively sending the target image to a preset face recognition library and a preset gesture recognition library for analysis to obtain an analysis result, wherein the analysis result comprises the existence probability information and the coordinate information;
acquiring first existence probability information of the face image and second existence probability information of the gesture image in the existence probability information;
and acquiring first coordinate information of the face image and second coordinate information of the gesture image in the coordinate information.
Optionally, the determining module is further configured to:
judging whether the first existence probability information and the second existence probability information in the existence probability information are simultaneously greater than a preset probability value or not;
and if the first existence probability information and the second existence probability information are simultaneously greater than the preset probability value, it is determined that the face image and the gesture image simultaneously exist in the target image.
Optionally, the gesture recognition module is further configured to:
acquiring coordinate difference information of a coordinate point between the first coordinate information and the second coordinate information in the coordinate information;
and judging whether the coordinate difference information reaches a preset condition, and determining that a preset gesture exists in the target image when the coordinate difference information reaches the preset condition.
Optionally, the gesture recognition module is further configured to:
obtaining a distance difference value of a coordinate center point between the first coordinate information and the second coordinate information;
acquiring a height difference value and a horizontal axis difference value of a coordinate point between the first coordinate information and the second coordinate information;
and assigning the distance difference value, the height difference value and the horizontal axis difference value as the coordinate difference information.
Optionally, the determining module is further configured to:
judging whether the distance difference value is within a first difference value range, whether the height difference value is within a second difference value range and whether the horizontal axis difference value is within a third difference value range;
and if the distance difference is within the first difference range, the height difference is within the second difference range, and the horizontal axis difference is within the third difference range, determining that the coordinate difference information reaches the preset condition.
Optionally, the apparatus further comprises:
and the acquisition module is used for starting the camera device to acquire the target image through the camera device when detecting that an incoming call is accessed.
In addition, the invention also provides a device for automatically answering an incoming call, which comprises: a memory, a processor, and a program for automatically answering an incoming call that is stored in the memory and executable on the processor, wherein the program for automatically answering an incoming call is configured to implement the steps of the method for automatically answering an incoming call.
In addition, the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a program for automatically answering the incoming call, and the program for automatically answering the incoming call realizes the steps of the method for automatically answering the incoming call when being executed by a processor.
The method obtains an analysis result by analyzing a target image acquired when the incoming call access is detected; judging whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result; if yes, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result; and if the preset gesture is judged to exist, automatically answering the incoming call.
Compared with existing approaches that judge whether a specific gesture exists by recognizing only the user's gesture, or by recognizing the user's upper-body action as a whole, the invention analyzes the target image to judge whether a face image and a gesture image simultaneously exist in it, and, when they do, further judges whether a preset gesture exists according to the coordinate information of the face image and the gesture image. This changes the way the intelligent terminal recognizes a specific gesture. Because existing face recognition technology is mature and highly accurate, the method effectively improves the accuracy of recognizing a specific gesture, solves the prior-art problem of low accuracy when an intelligent terminal recognizes a gesture to automatically answer an incoming call, and ensures that the intelligent terminal can accurately recognize the user's call-answering gesture and thus automatically answer the incoming call.
Drawings
FIG. 1 is a schematic structural diagram of the hardware running environment of a device for automatically answering an incoming call according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a method for automatically answering an incoming call according to the present invention;
FIG. 3 is a schematic diagram of an application flow involved in an embodiment of a method for automatically answering an incoming call according to the present invention;
fig. 4 is a schematic diagram of a functional module structure of an apparatus for automatically answering an incoming call according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context. The terms "or", "and/or", "including at least one of the following", and the like, as used herein, are to be construed as inclusive, meaning any one or any combination.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily completed at the same time but may be executed at different times, and they need not be executed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an apparatus for automatically answering an incoming call according to an embodiment of the present invention. The device for automatically answering the incoming call can be a large-screen intelligent terminal with a conversation function, such as an intelligent television.
As shown in fig. 1, the apparatus for automatically answering an incoming call may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM), or a Non-Volatile Memory (NVM) such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the apparatus for automatically answering an incoming call, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and a program for automatically answering an incoming call.
In the apparatus for automatically answering an incoming call shown in fig. 1, the network interface 1004 is mainly used for data communication with other apparatuses, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be arranged in the apparatus for automatically answering an incoming call, which calls, through the processor 1001, the program for automatically answering an incoming call stored in the memory 1005 and performs the following operations:
analyzing a target image acquired when incoming call access is detected to obtain an analysis result;
judging whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result;
if yes, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result;
and if the preset gesture is judged to exist, automatically answering the incoming call.
Further, the processor 1001 may be configured to call a program stored in the memory 1005 for automatically answering an incoming call, and further perform the following operations:
respectively sending the target image to a preset face recognition library and a preset gesture recognition library for analysis to obtain an analysis result, wherein the analysis result comprises the existence probability information and the coordinate information;
acquiring first existence probability information of the face image and second existence probability information of the gesture image in the existence probability information;
and acquiring first coordinate information of the face image and second coordinate information of the gesture image in the coordinate information.
Further, the processor 1001 may be configured to call a program stored in the memory 1005 for automatically answering an incoming call, and further perform the following operations:
judging whether the first existence probability information and the second existence probability information in the existence probability information are simultaneously greater than a preset probability value or not;
and if the first existence probability information and the second existence probability information are simultaneously greater than the preset probability value, it is determined that the face image and the gesture image simultaneously exist in the target image.
Further, the processor 1001 may be configured to call a program stored in the memory 1005 for automatically answering an incoming call, and further perform the following operations:
acquiring coordinate difference information of a coordinate point between the first coordinate information and the second coordinate information in the coordinate information;
and judging whether the coordinate difference information reaches a preset condition, and determining that a preset gesture exists in the target image when the coordinate difference information reaches the preset condition.
Further, the processor 1001 may be configured to call a program stored in the memory 1005 for automatically answering an incoming call, and further perform the following operations:
obtaining a distance difference value of a coordinate center point between the first coordinate information and the second coordinate information;
acquiring a height difference value and a horizontal axis difference value of a coordinate point between the first coordinate information and the second coordinate information;
and assigning the distance difference value, the height difference value and the horizontal axis difference value as the coordinate difference information.
Further, the processor 1001 may be configured to call a program stored in the memory 1005 for automatically answering an incoming call, and further perform the following operations:
judging whether the distance difference value is within a first difference value range, whether the height difference value is within a second difference value range and whether the horizontal axis difference value is within a third difference value range;
and if the distance difference is within the first difference range, the height difference is within the second difference range, and the horizontal axis difference is within the third difference range, determining that the coordinate difference information reaches the preset condition.
Further, the processor 1001 may be configured to call a program stored in the memory 1005 for automatically answering an incoming call, and further perform the following operations:
and when the incoming call is detected, starting the camera device to acquire the target image through the camera device.
With the continuous development of artificial intelligence technology, deep learning algorithms such as AI (Artificial Intelligence) image recognition now have many open-source engineering implementations. By introducing the open-source NCNN (a high-performance neural-network inference framework optimized for mobile platforms) algorithm library and a learning tool into the Android platform, real-time pictures acquired by a camera can be fed into the AI library, and a recognition library can compute whether a certain specific object exists, its existence probability, and its position.
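The per-frame flow this paragraph describes can be sketched as follows. `Detector` is a hypothetical stand-in for an NCNN-backed recognition library (its results are canned here to keep the sketch self-contained), not NCNN's actual API:

```python
class Detector:
    """Hypothetical wrapper around one recognition library: given a frame,
    report the target's existence probability and position."""
    def __init__(self, canned_result):
        self._canned = canned_result

    def detect(self, frame):
        # A real implementation would run an NCNN network on `frame`;
        # here a canned result stands in for the inference call.
        return self._canned

def analyse(frame, face_lib, gesture_lib):
    # Send the same frame to both libraries and assemble the
    # "analysis result": per-target probability and coordinates.
    return {
        "face": face_lib.detect(frame),
        "gesture": gesture_lib.detect(frame),
    }

face_lib = Detector({"prob": 0.93, "box": (140, 60, 90, 90)})
gesture_lib = Detector({"prob": 0.88, "box": (250, 90, 60, 60)})
analysis = analyse(frame=None, face_lib=face_lib, gesture_lib=gesture_lib)
```

The resulting dictionary matches the shape of the analysis result described in the claims: one probability/coordinate pair for the face and one for the gesture.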
Accordingly, a method for automatically answering an incoming call based on specific gesture recognition has been proposed: the intelligent terminal automatically answers the incoming call when it recognizes that the user has made the specific call-answering gesture, so the user does not need to answer manually. However, gesture recognition alone is strongly affected by factors such as scene and illumination and is difficult to perform accurately. Since a large-screen intelligent terminal can capture a complete image of the user's upper body through its camera, a further method was proposed: acquire the user's upper-body image and recognize as a whole whether the user makes a corresponding call-answering action, so as to decide whether to answer automatically. However, when a new recognition target is introduced into a deep learning library in this way, a large number of original target images are required so that the recognition rate and accuracy can be improved through continuous training. In practice, pictures of people making the call-answering action are essentially difficult to obtain from other channels, and, considering cost, it is also difficult to obtain original target images by arranging people and scenes to shoot a large number of target pictures. The original target images available for deep learning are therefore usually limited, which is why, in the prior art, accuracy is not high when the intelligent terminal recognizes the user's gesture to automatically answer an incoming call.
In order to solve the above problem, the present invention provides a method for automatically answering an incoming call, comprising: analyzing a target image acquired when incoming call access is detected to obtain an analysis result; judging whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result; if yes, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result; and if the preset gesture is judged to exist, automatically answering the incoming call.
The method judges whether a face image and a gesture image simultaneously exist in the target image and, when they do, further judges whether a preset gesture exists according to their coordinate information, thereby deciding whether to answer the incoming call automatically. Compared with recognizing gestures alone as in the prior art, the method adds two recognition conditions: the human face, and the coordinate information of (i.e., the positional relation between) the face and gesture images. Because face recognition technology is already mature, requires no additional investment, and is highly accurate, combining face recognition with gesture recognition can effectively improve the recognition accuracy of specific gestures without extra cost, solving the prior-art problem of low accuracy when an intelligent terminal recognizes a user's gesture to automatically answer an incoming call. The intelligent terminal can thus accurately recognize the user's call-answering gesture and automatically answer the incoming call.
An embodiment of the present invention provides a method for automatically answering a call, and referring to fig. 2, fig. 2 is a schematic flow diagram of an embodiment of the method for automatically answering a call according to the present invention.
In this embodiment, the method for automatically answering an incoming call includes:
and step S10, analyzing the target image collected when the incoming call is detected to obtain an analysis result.
In this embodiment, the execution subject is an intelligent terminal with a call function, such as a smart television. The target image is an image, collected by the intelligent terminal when it detects an incoming call, that contains at least the upper half of the user's body; it is used to identify whether the user has made the answering gesture, so as to decide whether the incoming call should be answered automatically.
The analysis result is the information output by the recognition libraries: whether a face image and a gesture image exist, and with what probability and at what position. It includes the existence probability information and the coordinate information of the face image and the gesture image in the target image.
Specifically, for example, as shown in the application flow shown in fig. 3, the target image is respectively sent to a preset face recognition library and a preset gesture recognition library, and recognition results of the face recognition library and the gesture recognition library are obtained as the analysis results.
Optionally, in step S10, before analyzing the target image acquired when the incoming call is detected to obtain an analysis result, the method includes:
and step S01, when the incoming call is detected, starting the camera device to acquire the target image through the camera device.
In this embodiment, the method for automatically answering a call is applied to a large-screen terminal equipped with a camera device. A large-screen terminal is an intelligent terminal with a relatively large screen, such as a smart television. Unlike a small-screen terminal such as a smartphone, which the user usually holds, a large-screen terminal is usually located at some distance from the user, so a target image containing at least the upper half of the user's body can be collected when an incoming call is detected.
The camera device is a device with a camera function for acquiring a target image, for example, a camera in an intelligent terminal. According to the application process shown in fig. 3, when the intelligent terminal detects that the user has incoming call access, that is, when a call request is received, the camera device can be automatically started, and the camera function is turned on, so that the external scene image at the current moment is collected through the camera and serves as a target image to be recognized.
In the embodiment, when the incoming call is detected, the target image is acquired through the camera device, namely, the image of the external scene at the current moment is shot, so that whether the user makes a gesture for answering the call or not is further judged, and whether the incoming call needs to be automatically answered or not is judged.
And step S20, judging whether the target image has a face image and a gesture image at the same time according to the existence probability information in the analysis result.
In this embodiment, in the AI image recognition process, the machine recognition image does not completely recognize a complex picture at a time, but divides a complete picture into a plurality of small parts, extracts the features in each small part (i.e., recognizes each small part), and then combines the features of the small parts together, thereby completing the machine recognition image process. Therefore, after the machine has finished recognizing the image, the existence probability information and the coordinate information of the specific object existing in the image, that is, the analysis result is output. For example, when performing face recognition on an image, the existence probability information of a face and the coordinate information of the face in the image are calculated, and the obtained face existence probability information and position information are analysis results. The coordinate information is used to indicate the position information of a specific object existing in the image, and is usually a set of coordinate points. The presence probability information is used to indicate a probability value of the presence of the specific object in the image, and generally, the greater the probability value, the greater the possibility of the presence of the specific object in the image.
Determining whether a face image and a gesture image exist simultaneously in the target image means recognizing the target image along two branches: one branch recognizes only the gesture in the target image, and the other recognizes only the face. After the face and the gesture have each been recognized to obtain the analysis result, whether the two images exist simultaneously is determined from the existence probability information in that result. The gesture image here is the gesture indicating that the call should be answered automatically.
It should be noted that, owing to factors such as the usage scene of the large-screen terminal, the collected face and gesture images may be single or multiple; as long as at least one gesture image and one face image exist at the same time, it can be determined that the gesture image and the face image exist simultaneously in the target image.
And step S30, if yes, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result.
In this embodiment, if yes, it indicates that the face image and the gesture image exist in the target image at the same time. The preset gesture refers to a gesture for indicating that a call needs to be answered. When a user makes a gesture for answering a call, the hand is usually located slightly below the face, that is, the positional relationship between the face image and the gesture image is relatively fixed, that is, the gesture image needs to be located a short distance below the face image.
Therefore, the positional relationship between the detected face image and gesture image can be obtained from the coordinate information in the analysis result, it can be judged whether the gesture is located slightly below the face, and it can thus be further determined whether the preset gesture indicating that the call should be answered exists in the target image.
And step S40, if the preset gesture is judged to exist, the incoming call is automatically answered.
In this embodiment, when the coordinate information of the face and the gesture in the analysis result meets the condition, that is, when the recognized gesture image is located slightly below the recognized face image, it is determined that the preset gesture exists, and the incoming call is answered automatically.
According to the method, the target image is analyzed to obtain the analysis result; whether a face image and a gesture image exist in the target image is first determined from the existence probability information in that result, and when both exist simultaneously, whether the preset gesture exists is further determined from the coordinate information. This changes the way the intelligent terminal recognizes the specific gesture and effectively improves recognition accuracy; moreover, because a large number of pictures of users making the answering action need not be collected as raw samples for deep learning, cost is also reduced. The method solves the prior-art problem of low accuracy when an intelligent terminal recognizes a user's gesture to answer an incoming call automatically, so that the terminal can accurately recognize the gesture and answer the call as required, with no manual operation by the user, which is more convenient.
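The overall flow of steps S10 through S40 can be sketched end to end as follows; the recognizer and check functions here are hypothetical stand-ins supplied by the caller, not the patent's actual recognition libraries.

```python
def should_auto_answer(target_image, face_lib, gesture_lib,
                       presence_check, position_check):
    """Steps S10-S40 end to end: parse, presence test, position test."""
    faces = face_lib(target_image)        # step S10: analysis results,
    gestures = gesture_lib(target_image)  # each a (probability, box) list
    if not presence_check(faces, gestures):   # step S20
        return False                          # discard frame, wait for next
    return position_check(faces, gestures)    # steps S30-S40

# Toy stand-ins for the recognition libraries and the two checks:
ok = should_auto_answer(
    "frame-0",
    face_lib=lambda img: [(0.95, (10, 40, 50, 80))],
    gesture_lib=lambda img: [(0.85, (15, 20, 40, 35))],
    presence_check=lambda f, g: any(p > 0.9 for p, _ in f)
                                and any(p > 0.8 for p, _ in g),
    position_check=lambda f, g: True,  # placeholder for the coordinate test
)
print(ok)  # True
```

The coordinate test passed as `position_check` is where the difference-range logic of the later steps would plug in.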
Further, in another embodiment of the method for automatically answering an incoming call according to the present invention, in step S10, the analyzing the target image collected when the incoming call is detected to obtain an analysis result includes:
step S11, the target image is respectively sent to a preset face recognition library and a preset gesture recognition library for analysis, so as to obtain the analysis result, wherein the analysis result comprises the existence probability information and the coordinate information.
In this embodiment, the preset face recognition library is a trained, accurate, and complete analysis library based on existing face recognition technology, used to recognize and analyze the face image in the target image without any new development or training. The preset gesture recognition library is an analysis library obtained by deep learning and training on pure gesture pictures of various call-answering gestures, used to recognize and analyze the gesture image in the target image.
As shown in the application flow of fig. 3, after the same target image is respectively sent to the face recognition library and the gesture recognition library, the two recognition libraries respectively recognize and analyze the image, and output corresponding analysis results. The analysis result comprises existence probability information and coordinate information of the human face or the gesture image in the target image.
It should be noted that when an incoming call is detected, a plurality of frames of target images are usually collected, and the obtained analysis result needs to be existence probability information and coordinate information of a face or a gesture existing on the same frame of image.
Step S12, obtaining first existence probability information of the face image and second existence probability information of the gesture image in the existence probability information.
In this embodiment, based on the existence probability information in the parsing result output by the parsing libraries, first existence probability information of a face image and second existence probability information of a gesture image in the same frame of target image may be obtained respectively. Since there may be a plurality of face or gesture images in the same frame of target image, the first and second existence probability information may each include existence probability information of one or more images; for example, the first existence probability information may include: face 1: 50%, face 2: 65%, face 3: 80%; and the second existence probability information may include: gesture 1: 20%, gesture 2: 60%.
Step S13, obtaining first coordinate information of the face image and second coordinate information of the gesture image in the coordinate information.
In this embodiment, based on the coordinate information in the analysis result, first coordinate information of a face image and second coordinate information of a gesture image existing in the same frame of target image may be obtained respectively. The first and second coordinate information are coordinate point set information for representing positions of the human face and the gesture image in the target image. Since the machine usually recognizes images from rectangular regions, the coordinate information to be finally output is usually a coordinate point set including at least coordinates of four vertices and coordinates of a center point of a rectangular image region.
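An analysis-result entry as described above (an existence probability plus a rectangular coordinate point set with a derivable center point) can be modeled, for illustration, like this; the class and field names are assumptions, not the recognition libraries' actual output format.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One recognized object: its existence probability and coordinate set."""
    probability: float  # existence probability, 0.0 - 1.0
    vertices: list      # four (x, y) corners of the rectangular region

    @property
    def center(self):
        """Center point of the rectangular region, derived from the vertices."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)

face = Detection(0.92, [(10, 40), (50, 40), (10, 80), (50, 80)])
print(face.center)  # (30.0, 60.0)
```

A first/second coordinate information set would then simply be a list of such entries per frame.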
In this embodiment, the same target image is respectively sent to the face recognition library and the gesture recognition library, so as to obtain an analysis result, that is, probability information and position information of the face image and the gesture image existing in the target image, so as to further determine whether a preset gesture exists in the target image.
Further, in another embodiment of the method for automatically answering an incoming call of the present invention, the step S20 of determining whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the parsing result includes:
step S21, determining whether the first existence probability information and the second existence probability information in the existence probability information are simultaneously greater than a preset probability value.
In this embodiment, the preset probability value is the minimum probability that must be reached before a face or gesture image is considered present, and it may be set as needed. Moreover, the thresholds for the face image and the gesture image need not be the same; that is, the preset probability values applied to the first and second existence probability information may differ. For example, because face recognition is accurate, the preset probability value for the face image may be set to 90% while that for the gesture image is set to 80%; the two comparisons are then made separately to determine whether the face image and the gesture image exist simultaneously.
Specifically, for example, it is determined whether the first existence probability information is greater than a first preset probability value, and whether the second existence probability information is greater than a second preset probability value.
Step S22, if the first existence probability information and the second existence probability information are simultaneously greater than the preset probability value, determining that a face image and a gesture image simultaneously exist in the target image.
In this embodiment, when both the first existence probability information and the second existence probability information are greater than the preset probability value, it is indicated that the face image and the gesture image exist in the same frame of target image at the same time. Because the first existence probability information and the second existence probability information can both contain existence probability information of a plurality of human faces or gesture images, the existence probability value of one human face image and the existence probability value of one gesture image are at least larger than the preset probability value, and the human face image and the gesture image can be judged to exist in the target image at the same time.
In addition, as shown in the application flow of fig. 3, if the probability values in the first existence probability information are all smaller than the preset probability value and/or the probability values in the second existence probability information are all smaller than the preset probability value, it is determined that the face image and the gesture image do not exist in the target image at the same time, the target image is discarded, the next frame of target image continues to be acquired through the camera device, and the above step S10 and subsequent steps are executed again.
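Steps S21-S22, together with the discard-and-retry flow of fig. 3, can be sketched as follows, using the example thresholds from this description (90% for faces, 80% for gestures); the function names and the per-frame input format are illustrative assumptions.

```python
FACE_THRESHOLD = 0.90     # example values from the description; tune as needed
GESTURE_THRESHOLD = 0.80

def both_present(face_probs, gesture_probs,
                 face_thr=FACE_THRESHOLD, gesture_thr=GESTURE_THRESHOLD):
    """Step S20: at least one face AND one gesture must clear its threshold."""
    return any(p > face_thr for p in face_probs) and \
           any(p > gesture_thr for p in gesture_probs)

def first_qualifying_frame(frames):
    """Discard frames until one contains both a face and a gesture (fig. 3)."""
    for frame_id, face_probs, gesture_probs in frames:
        if both_present(face_probs, gesture_probs):
            return frame_id
    return None

frames = [(0, [0.50], [0.85]),               # face too weak: frame discarded
          (1, [0.65, 0.95], [0.20, 0.83])]   # qualifies: face 2 and gesture 2
print(first_qualifying_frame(frames))        # 1
```

Only one face and one gesture need to clear their thresholds, which matches the "at least one of each" condition above.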
In the embodiment, whether the face image and the gesture image exist in the target image at the same time is determined according to the first existence probability information that the face image exists and the second existence probability information that the gesture image exists, if the face image and the gesture image exist at the same time, the step S30 and the subsequent steps are continuously executed, and if the face image and the gesture image do not exist at the same time, the target image is discarded, and the step S10 and the subsequent steps are continuously executed.
Further, in another embodiment of the method for automatically answering an incoming call of the present invention, the step S30 of determining whether a preset gesture exists in the target image according to the coordinate information in the analysis result includes:
step S31, obtaining coordinate difference information of a coordinate point between the first coordinate information and the second coordinate information in the coordinate information.
In this embodiment, the coordinate difference information includes a distance difference between center points of coordinates between the first and second coordinate information, a height difference between the coordinate points, and a difference between horizontal axes, and is used to determine whether a position relationship between the face image and the gesture image satisfies a preset condition.
Step S32, determining whether the coordinate difference information meets a preset condition, and determining that a preset gesture exists in the target image when the coordinate difference information meets the preset condition.
In this embodiment, the preset condition is that the distance difference, the height difference, and the horizontal-axis difference in the coordinate difference information must each fall within a certain range; that is, the distance and positional relationship between the face image and the gesture image in the target image must satisfy certain conditions, specifically that the gesture image is located slightly below the face image.
And when the coordinate difference information meets a preset condition, namely the position relation between the face image and the gesture image meets the condition, the preset gesture can be determined to exist in the target image, namely the user needs to automatically answer the call at the moment.
Optionally, in step S31, acquiring coordinate difference information of a coordinate point between the first coordinate information and the second coordinate information in the coordinate information, where the acquiring includes:
step S311, obtaining a distance difference between the coordinate center points of the first coordinate information and the second coordinate information.
In this embodiment, since a plurality of face images and gesture images may exist in the target image, a plurality of coordinate point sets of the face images or the gesture images may exist in the first and second coordinate information at the same time. Therefore, the first coordinate information and the second coordinate information may include one or more center point coordinates. At this time, the distance difference between each center point coordinate in the first coordinate information and each center point coordinate in the second coordinate information may be calculated.
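The pairwise center-point distances of step S311 can be computed, for example, as follows; the input format (lists of `(x, y)` center coordinates) is an assumption.

```python
import math

def center_distances(face_centers, gesture_centers):
    """Distance between every face center and every gesture center (S311)."""
    return {(i, j): math.dist(f, g)
            for i, f in enumerate(face_centers)
            for j, g in enumerate(gesture_centers)}

# Two detected faces and one detected gesture give two candidate pairs:
d = center_distances([(30, 60), (120, 58)], [(35, 25)])
print(round(d[(0, 0)], 1))  # 35.4  (distance from face 0 to the gesture)
```

Each `(face index, gesture index)` pair can then be tested against the preset condition independently.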
In step S312, a height difference and a horizontal axis difference of the coordinate point between the first coordinate information and the second coordinate information are obtained.
In the present embodiment, the height difference is the up-down distance between each recognized face image and each recognized gesture image. The maximum value of the Y axis in each coordinate point set in the first coordinate information, namely the highest position of each identified face image, can be obtained; then, acquiring the maximum value of the Y axis in each coordinate point set in the second coordinate information, namely the highest position of each recognized gesture image; and then calculating the difference value of the maximum values of the Y axes in the first coordinate information and the second coordinate information to obtain the height difference value.
The horizontal axis difference is the left-right distance between each recognized face image and each recognized gesture image. The minimum value of the X axis in each coordinate point set in the first coordinate information can be obtained, namely the leftmost end of each identified face image; then acquiring the minimum value of the X axis in each coordinate point set in the second coordinate information, namely the leftmost end of each recognized gesture image; and then calculating the difference value of the minimum value of the X axis in the first coordinate information and the second coordinate information, thus obtaining the difference value of the transverse axis.
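The Y-axis-maximum and X-axis-minimum differences of step S312 can be sketched as follows, assuming a coordinate system in which a larger Y value is higher (as the description implies) and taking both differences as gesture minus face; the subtraction order for the horizontal-axis difference is not fixed by the description and is an assumption here.

```python
def height_diff(face_points, gesture_points):
    """Difference of the Y-axis maxima (topmost points), gesture minus face,
    so a gesture below the face yields a negative value."""
    return max(y for _, y in gesture_points) - max(y for _, y in face_points)

def x_axis_diff(face_points, gesture_points):
    """Difference of the X-axis minima (leftmost points), gesture minus face."""
    return min(x for x, _ in gesture_points) - min(x for x, _ in face_points)

face = [(10, 80), (50, 80), (10, 40), (50, 40)]     # top of face at y = 80
gesture = [(15, 60), (40, 60), (15, 35), (40, 35)]  # top of gesture at y = 60
print(height_diff(face, gesture), x_axis_diff(face, gesture))  # -20 5
```

With this convention, the negative height range in the later example falls out naturally when the gesture sits below the face.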
Step 313, assigning the distance difference, the height difference and the horizontal axis difference as the coordinate difference information.
In this embodiment, a center point distance difference, a height difference, and a cross axis difference of each coordinate point set in the first and second coordinate information are used as coordinate difference information, that is, a center point distance, an up-down distance, and a left-right distance of each face image and each gesture image to be recognized are used as coordinate difference information, so that a position relationship between each face image and each gesture image can be obtained, and whether a preset gesture exists in the target image is determined by determining whether the position relationship satisfies a preset condition.
In this embodiment, the distance difference, the height difference and the horizontal axis difference of the coordinate points in the first and second coordinate information are respectively obtained to be used as coordinate difference information, that is, the distance between the center points, the vertical distance and the horizontal distance of each face image and each gesture image are obtained to determine whether the position relationship between the face image and the gesture image meets the preset condition, and further determine whether the preset gesture exists in the target image.
Optionally, in step S32, the determining whether the coordinate difference information meets a preset condition includes:
step S321, determining whether the distance difference is within a first difference range, the height difference is within a second difference range, and the horizontal axis difference is within a third difference range.
In this embodiment, the first difference range is the allowed range of the distance between the center points of the face image and the gesture image: the two cannot be too close or too far apart. The specific values may be set as needed, for example 10 px to 30 px.
The second difference range is the allowed range of the vertical distance between the face and the gesture image; the gesture image must be located slightly below the face image, and the specific values may be set as needed, for example -30 px to -15 px. It should be noted that, since the gesture must be below the face, if the preset difference range is negative, the height difference must be obtained by subtracting the maximum Y value of the face image from the maximum Y value of the gesture image.
The third difference range is the allowed range of the horizontal distance between the face and the gesture image, and the specific values may be set as needed. Since the gesture image may lie to either the left or the right of the face image, there may be two third difference ranges, for example 10 px to 20 px on one side and -20 px to -10 px on the other.
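The range checks of steps S321-S322 can be sketched as follows, using the example pixel ranges given above; the exact values, and representing the two horizontal-axis ranges as a pair, are illustrative choices.

```python
# Example ranges from the description; px units, values are illustrative.
DIST_RANGE = (10, 30)              # first range: center-point distance
HEIGHT_RANGE = (-30, -15)          # second range: gesture top below face top
X_RANGES = ((10, 20), (-20, -10))  # third range: gesture left or right of face

def within(value, bounds):
    low, high = bounds
    return low <= value <= high

def meets_preset_condition(distance, height, x_diff):
    """Step S32: all three differences must fall inside their ranges."""
    return (within(distance, DIST_RANGE)
            and within(height, HEIGHT_RANGE)
            and any(within(x_diff, r) for r in X_RANGES))

print(meets_preset_condition(20, -20, 15))  # True: answer automatically
print(meets_preset_condition(20, 5, 15))    # False: gesture not below face
```

Either horizontal range sufficing reflects that the hand may be raised on either side of the face.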
Step S322, if the distance difference is within the first difference range, the height difference is within the second difference range, and the horizontal axis difference is within the third difference range, it is determined that the coordinate difference information meets the preset condition.
In this embodiment, the distance difference being within the first range means the distance between the center points of the face and gesture images satisfies the condition; the height difference being within the second range means their vertical distance satisfies the condition that the gesture is slightly below the face; and the horizontal-axis difference being within the third range means their horizontal distance satisfies the condition. Only when all three conditions are met simultaneously can the coordinate difference information be judged to reach the preset condition, that is, the positional relationship between the face image and the gesture image satisfies the preset condition.
In the embodiment, whether the distance difference value, the height difference value and the cross axis difference value are within the difference value ranges is judged respectively, and only when the conditions are all met, the coordinate difference value information can be judged to reach the preset condition, so that the existence of the preset gesture is judged, and the accuracy of recognizing the specific gesture is effectively improved.
Further, an embodiment of the present invention further provides an apparatus for automatically answering a call, as shown in fig. 4, the apparatus for automatically answering a call of the present invention includes:
the analysis module 10 is used for analyzing a target image acquired when the incoming call access is detected to obtain an analysis result;
the judging module 20 is configured to judge whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result;
the gesture recognition module 30 is configured to, if yes, determine whether a preset gesture exists in the target image according to the coordinate information in the analysis result;
and the answering module 40 is used for automatically answering the incoming call if the preset gesture is judged to exist.
Preferably, the apparatus for automatically answering an incoming call further comprises:
and the acquisition module is used for starting the camera device to acquire the target image through the camera device when detecting that an incoming call is accessed.
The steps implemented when the functional modules of the device for automatically answering a call run in the present invention can refer to the embodiments of the method for automatically answering a call, and are not described herein again.
Further, an embodiment of the present invention also provides an apparatus for automatically answering an incoming call, the apparatus including: a memory, a processor, and a program for automatically answering an incoming call that is stored in the memory and executable on the processor, the program being configured to implement the steps of the method for automatically answering an incoming call provided in the above embodiments. For specific implementation steps, reference may be made to those embodiments, which are not repeated here.
Further, an embodiment of the present invention further provides a computer-readable storage medium, where a program for automatically answering a call is stored in the computer-readable storage medium, and when the program for automatically answering a call is executed by a processor, the steps of the method for automatically answering a call provided in the foregoing embodiment are implemented.
The device, the equipment and the computer readable storage medium provided by the embodiment of the invention are used for realizing the method for automatically answering the incoming call provided by the embodiment, and solving the problem that the accuracy is not high when the intelligent terminal identifies the gesture of the user to automatically answer the incoming call in the prior art.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.

Claims (10)

1. A method for automatically answering a call is characterized by comprising the following steps:
analyzing a target image acquired when incoming call access is detected to obtain an analysis result;
judging whether a face image and a gesture image simultaneously exist in the target image according to the existence probability information in the analysis result;
if yes, judging whether a preset gesture exists in the target image according to the coordinate information in the analysis result;
and if the preset gesture is judged to exist, automatically answering the incoming call.
2. The method of claim 1, wherein the step of parsing the target image collected when the incoming call is detected to obtain a parsed result comprises:
respectively sending the target image to a preset face recognition library and a preset gesture recognition library for analysis to obtain an analysis result, wherein the analysis result comprises the existence probability information and the coordinate information;
acquiring first existence probability information of the face image and second existence probability information of the gesture image in the existence probability information;
and acquiring first coordinate information of the face image and second coordinate information of the gesture image in the coordinate information.
3. The method of claim 2, wherein the step of determining whether the face image and the gesture image exist in the target image simultaneously according to the existence probability information in the analysis result comprises:
judging whether the first existence probability information and the second existence probability information in the existence probability information are simultaneously greater than a preset probability value or not;
if the first existence probability information and the second existence probability information are simultaneously greater than the preset probability value, it is determined that the face image and the gesture image exist simultaneously in the target image.
4. The method of claim 2, wherein the step of determining, according to the coordinate information in the analysis result, whether the preset gesture is present in the target image comprises:
acquiring coordinate difference information between the coordinate points of the first coordinate information and the second coordinate information;
and determining whether the coordinate difference information meets a preset condition, and when it does, determining that the preset gesture is present in the target image.
5. The method of claim 4, wherein the step of acquiring the coordinate difference information between the coordinate points of the first coordinate information and the second coordinate information comprises:
acquiring a distance difference value between the coordinate center points of the first coordinate information and the second coordinate information;
acquiring a height difference value and a horizontal axis difference value between the coordinate points of the first coordinate information and the second coordinate information;
and taking the distance difference value, the height difference value and the horizontal axis difference value as the coordinate difference information.
6. The method of claim 5, wherein the step of determining whether the coordinate difference information meets the preset condition comprises:
determining whether the distance difference value is within a first difference range, whether the height difference value is within a second difference range, and whether the horizontal axis difference value is within a third difference range;
and if the distance difference value is within the first difference range, the height difference value is within the second difference range, and the horizontal axis difference value is within the third difference range, determining that the coordinate difference information meets the preset condition.
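The coordinate comparison in claims 4 to 6 can be sketched as follows. This is an illustrative interpretation, assuming the first/second coordinate information are bounding boxes `(x1, y1, x2, y2)` and that the three difference ranges are supplied as configuration; the actual ranges are not given in the claims:

```python
import math

def coordinate_differences(face_box, gesture_box):
    """Compute the claim-5 coordinate difference information: distance
    between the two box center points, plus their height (vertical) and
    horizontal-axis differences."""
    fx, fy = (face_box[0] + face_box[2]) / 2, (face_box[1] + face_box[3]) / 2
    gx, gy = (gesture_box[0] + gesture_box[2]) / 2, (gesture_box[1] + gesture_box[3]) / 2
    return {"distance": math.hypot(gx - fx, gy - fy),
            "height": gy - fy,
            "horizontal": gx - fx}

def meets_preset_condition(diffs, ranges):
    """Claim-6 style check: every difference value must lie within its
    configured (low, high) range for the condition to be met."""
    return all(lo <= diffs[key] <= hi for key, (lo, hi) in ranges.items())
```

For example, a gesture raised beside the face would yield a moderate center distance, a near-zero height difference, and a positive horizontal difference, all of which must fall inside their respective ranges.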
7. The method of claim 1, wherein the method is applied to a large-screen terminal equipped with a camera device and, before the step of analyzing the target image acquired when the incoming call is detected to obtain the analysis result, comprises:
when the incoming call is detected, starting the camera device to acquire the target image through the camera device.
8. An apparatus for automatically answering an incoming call, the apparatus comprising:
an analysis module, configured to analyze a target image acquired when an incoming call is detected, to obtain an analysis result;
a judging module, configured to determine, according to existence probability information in the analysis result, whether a face image and a gesture image are simultaneously present in the target image;
a gesture recognition module, configured to, if the face image and the gesture image are simultaneously present, determine, according to coordinate information in the analysis result, whether a preset gesture is present in the target image;
and an answering module, configured to automatically answer the incoming call if the preset gesture is determined to be present.
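The module layout of this apparatus claim could be sketched as a single class whose methods stand in for the claimed modules. Everything here is an assumption for illustration: the parser backend, the answering hook, the threshold, and the toy gesture condition (a real device would apply the claim-5/6 difference ranges):

```python
class AutoAnswerDevice:
    """Illustrative sketch of the claimed apparatus; parse_fn and
    answer_fn are injected placeholders, not the patent's components."""

    def __init__(self, parse_fn, answer_fn, prob_threshold=0.8):
        self.parse_fn = parse_fn          # analysis module backend
        self.answer_fn = answer_fn        # answering module backend
        self.prob_threshold = prob_threshold

    def _both_present(self, result):
        # Judging module: both existence probabilities above the threshold.
        p = result["probability"]
        return p["face"] > self.prob_threshold and p["gesture"] > self.prob_threshold

    def _preset_gesture(self, result):
        # Gesture recognition module: placeholder coordinate check
        # (assumed toy condition in place of the claimed range tests).
        face = result["coordinates"]["face"]
        gesture = result["coordinates"]["gesture"]
        return abs(gesture[0] - face[0]) < 200

    def on_incoming_call(self, image):
        result = self.parse_fn(image)     # analysis module
        if self._both_present(result) and self._preset_gesture(result):
            self.answer_fn()              # answering module
            return True
        return False
```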
9. A device for automatically answering an incoming call, the device comprising: a memory, a processor, and a program for automatically answering an incoming call that is stored in the memory and executable on the processor, the program being configured to implement the steps of the method for automatically answering an incoming call according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a program for automatically answering an incoming call, which, when executed by a processor, implements the steps of the method for automatically answering an incoming call according to any one of claims 1 to 7.
CN202210437029.0A 2022-04-24 2022-04-24 Method, device, equipment and storage medium for automatically answering incoming call Pending CN114827351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210437029.0A CN114827351A (en) 2022-04-24 2022-04-24 Method, device, equipment and storage medium for automatically answering incoming call

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210437029.0A CN114827351A (en) 2022-04-24 2022-04-24 Method, device, equipment and storage medium for automatically answering incoming call

Publications (1)

Publication Number Publication Date
CN114827351A true CN114827351A (en) 2022-07-29

Family

ID=82507533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210437029.0A Pending CN114827351A (en) 2022-04-24 2022-04-24 Method, device, equipment and storage medium for automatically answering incoming call

Country Status (1)

Country Link
CN (1) CN114827351A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118189A (en) * 2013-01-25 2013-05-22 广东欧珀移动通信有限公司 Post-call gesture operation method and post-call gesture operation device for mobile phone
US20150131855A1 (en) * 2013-11-13 2015-05-14 Omron Corporation Gesture recognition device and control method for the same
CN105227777A (en) * 2015-06-20 2016-01-06 张云鹏 Answering method and voice communication assembly
CN107422859A (en) * 2017-07-26 2017-12-01 广东美的制冷设备有限公司 Regulation and control method, apparatus and computer-readable recording medium and air-conditioning based on gesture
CN108683805A (en) * 2018-04-27 2018-10-19 Oppo广东移动通信有限公司 A kind of call processing method, mobile terminal and computer readable storage medium
CN110611734A (en) * 2019-08-08 2019-12-24 深圳传音控股股份有限公司 Interaction method and terminal
CN111046804A (en) * 2019-12-13 2020-04-21 北京旷视科技有限公司 Living body detection method, living body detection device, electronic equipment and readable storage medium
WO2021056833A1 (en) * 2019-09-25 2021-04-01 深圳传音控股股份有限公司 Private information protection method and device for private contact, and readable storage medium
AU2021101815A4 (en) * 2020-12-04 2021-05-27 Zhengzhou Zoneyet Technology Co., Ltd. Human-computer interaction method and system based on dynamic gesture recognition

Similar Documents

Publication Publication Date Title
CN109240576B (en) Image processing method and device in game, electronic device and storage medium
US11647172B2 (en) Content presentation method, content presentation mode push method, and intelligent terminal
CN106651955B (en) Method and device for positioning target object in picture
CN110741377A (en) Face image processing method and device, storage medium and electronic equipment
CN107220614B (en) Image recognition method, image recognition device and computer-readable storage medium
WO2016165614A1 (en) Method for expression recognition in instant video and electronic equipment
WO2022222510A1 (en) Interaction control method, terminal device, and storage medium
CN111881740B (en) Face recognition method, device, electronic equipment and medium
CN111428740A (en) Detection method and device for network-shot photo, computer equipment and storage medium
CN109413470B (en) Method for determining image frame to be detected and terminal equipment
CN111274152B (en) Application program testing method, device, equipment and storage medium
CN114255493A (en) Image detection method, face detection device, face detection equipment and storage medium
CN114827351A (en) Method, device, equipment and storage medium for automatically answering incoming call
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN111610886A (en) Method and device for adjusting brightness of touch screen and computer readable storage medium
CN110751703A (en) Winding picture generation method, device, equipment and storage medium
CN114659450B (en) Robot following method, device, robot and storage medium
CN115840550A (en) Angle-adaptive display screen display method, device and medium
CN115061380A (en) Device control method and device, electronic device and readable storage medium
CN110807403B (en) User identity identification method and device and electronic equipment
CN116469156A (en) Method, apparatus, computer device and computer readable storage medium for identifying body state
CN114463242A (en) Image detection method, device, storage medium and device
CN108540726B (en) Method and device for processing continuous shooting image, storage medium and terminal
CN108040202B (en) Camera and method and device for executing instructions thereof
CN107562050B (en) Method and system for robot to recognize environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination