CN110705356B - Function control method and related equipment


Info

Publication number
CN110705356B
CN110705356B
Authority
CN
China
Prior art keywords
image information
network camera
person
gesture
call
Prior art date
Legal status
Active
Application number
CN201910820001.3A
Other languages
Chinese (zh)
Other versions
CN110705356A (en)
Inventor
余承富
Current Assignee
Shenzhen Danale Technology Co ltd
Original Assignee
Shenzhen Danale Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Danale Technology Co ltd
Priority to CN201910820001.3A
Publication of CN110705356A
Application granted
Publication of CN110705356B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The application discloses a function control method and related equipment, applied to a network camera, wherein the method comprises the following steps: acquiring first image information of a first area monitored by the network camera; performing person recognition on the first image information; if a person is recognized based on the first image information, recognizing a gesture of the person; and if the recognized gesture of the person includes a call gesture, starting a call function. By adopting the embodiments of the application, the functions of the network camera can be expanded.

Description

Function control method and related equipment
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to a function control method and related devices.
Background
The network camera is a new product that combines the traditional camera with network video technology. Because it can perform simple monitoring and upload pictures in real time, it is widely applied in fields such as monitoring, remote video interaction, and teaching. However, the current network camera generally can only perform simple monitoring and image uploading, and its functions are limited.
Disclosure of Invention
The embodiment of the application provides a function control method and related equipment, which are used for expanding functions of a network camera.
In a first aspect, an embodiment of the present application provides a function control method, applied to a network camera, where the method includes:
acquiring first image information of a first area monitored by the network camera;
performing person identification on the first image information;
if a person is identified based on the first image information, identifying a gesture of the person;
and if the gesture of the person is recognized to comprise a call gesture, starting a call function.
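The steps of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: `detect_person`, `detect_gesture`, and `start_call` are hypothetical stand-ins for the recognizers and the call subsystem that the method leaves unspecified.

```python
def control(first_image_info, detect_person, detect_gesture, start_call):
    """Return True if the call function was started for this image."""
    person = detect_person(first_image_info)             # person identification
    if person is None:
        return False
    gestures = detect_gesture(first_image_info, person)  # gesture recognition
    if "call" in gestures:                               # call gesture recognized
        start_call()
        return True
    return False
```

Any concrete recognizer (face detector, gesture classifier) can be plugged in through the three callables.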
In a second aspect, an embodiment of the present application provides a device for controlling a function, which is applied to a network camera, and the device includes:
the acquisition unit is used for acquiring first image information of a first area monitored by the network camera;
the identification unit is used for carrying out person identification on the first image information;
the identifying unit is further used for identifying the gesture of the person if the person is identified based on the first image information;
and the control unit is used for starting a call function if the gesture of the person is recognized to comprise a call gesture.
In a third aspect, an embodiment of the present application provides a network camera, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing steps in the method described in the first aspect of the embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps described in the method according to the first aspect of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps described in the method according to the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that in the embodiment of the present application, the network camera first acquires the first image information of the monitored first area and then performs person recognition based on the first image information; if a person is recognized, the network camera recognizes the gesture of the person, and if the recognized gesture of the person includes a call gesture, the network camera starts the call function. The call function is thus started through a gesture, and the functions of the network camera are expanded.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1A is a schematic structural diagram of a communication system according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a network camera according to an embodiment of the present application;
fig. 2A is a schematic flow chart of a function control method according to an embodiment of the present application;
FIG. 2B is a schematic diagram of a call gesture according to an embodiment of the present disclosure;
FIG. 2C is a schematic diagram of a shooting gesture provided in an embodiment of the present application;
FIG. 2D is a schematic diagram of a stop gesture provided by an embodiment of the present application;
FIG. 3 is a flow chart of another method for controlling functions according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a network camera according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a device of a network camera according to an embodiment of the present application.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
The following will describe in detail.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1A, fig. 1A is a schematic structural diagram of a communication system provided in an embodiment of the present application, where the communication system includes a network camera and a terminal device. The network camera is rotatable and is used for collecting image information of a monitored area so as to monitor a certain area. The network camera communicates with the terminal device in a wireless mode. The network camera may be located in a home, in a corridor, on a building in a residential community, on a street, etc. The form and number of network cameras and terminal devices shown in fig. 1A are only examples and do not constitute a limitation of the embodiments of the present application.
Of course, the communication system described in fig. 1A may further include an audio playing device, where the audio playing device is connected to the webcam. The audio playing device is, for example, a smart speaker, a bluetooth speaker, or other devices capable of playing audio files.
In addition, the communication system shown in fig. 1A may further include a video playing device, where the video playing device is connected to the network camera. The video playing device is, for example, a smart television or other devices capable of playing video files.
In addition, the communication system described in fig. 1A may further include a service device, which is connected to the network camera. The service device is a device that provides a computing service. Since the service device needs to respond to service requests and process them, it should generally have the ability to undertake and guarantee the service. The service device may be a background server or another device.
The terminal device may include various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), Mobile Station (MS), terminal device (terminal device), etc. having a wireless communication function.
As shown in fig. 1B, fig. 1B is a schematic structural diagram of a network camera according to an embodiment of the present application. The network camera includes a camera module, an audio output device, an audio input device, a signal processor, a communication interface, a processor, a memory, a random access memory (Random Access Memory, RAM), and the like. The communication interface is connected with the signal processor, and the camera module, the audio output device, the audio input device, the signal processor, the RAM and the memory are all connected with the processor.
The processor is the control center of the network camera; it connects the parts of the entire network camera through various interfaces and lines, and executes various functions of the network camera and processes data by running or executing software programs and/or modules stored in the memory and calling data stored in the memory, thereby monitoring the network camera as a whole.
The processor may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor.
The memory is used for storing software programs and/or modules, and the processor executes the software programs and/or modules stored in the memory so as to execute various functional applications and data processing of the network camera. The memory may mainly include a memory program area and a memory data area, wherein the memory program area may store an operating system, a software program required for at least one function, and the like; the storage data area may store data created according to the use of the network camera, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The audio output device may be, for example, a speaker, or other device capable of outputting audio.
The audio input device may be, for example, a microphone, or other device capable of inputting audio.
The embodiments of the present application are described in detail below.
Referring to fig. 2A, fig. 2A is a flow chart of a function control method provided in an embodiment of the present application, which is applied to the above network camera, and specifically includes the following steps:
step 201: and the network camera acquires first image information of a first area monitored by the network camera.
The first image information is obtained by the network camera capturing images of the first area. The first image information may be still-image information (such as a plurality of pictures that are consecutive at the time of acquisition) or video information (such as a video of a certain duration, for example a 10 s video), etc.
In an implementation of the present application, before step 201, the method further includes:
the network camera determines that the current moment is in a set period. The set period may be an operating period (e.g., 9:00 am-12:00 am and 14:00 pm-18:00 pm), a non-operating period (e.g., 18:00 pm-9:00 am and 12:00 am-14:00 pm), or a specific set period (e.g., 8:00 am-11:00 am, 14:), which is not limited herein.
In an implementation of the present application, before step 201, the method further includes:
the network camera collects audio through the audio input equipment to obtain second audio information;
the network camera analyzes the second audio information to obtain at least one keyword;
the network camera determines that the at least one keyword comprises at least one first set keyword.
The first setting keywords include gesture detection, gesture recognition, on, start, and the like.
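The keyword pre-check before step 201 can be sketched as follows; the keyword set below simply mirrors the examples listed above, and `should_start_detection` is a hypothetical name for the decision, not an API from the patent.

```python
# First set keywords, taken from the examples in the description above.
FIRST_SET_KEYWORDS = {"gesture detection", "gesture recognition", "on", "start"}

def should_start_detection(parsed_keywords):
    """True if at least one parsed keyword is among the first set keywords."""
    return bool(set(parsed_keywords) & FIRST_SET_KEYWORDS)
```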
In an implementation of the present application, before step 201, the method further includes:
the network camera collects audio through the audio input equipment to obtain third audio information;
the network camera analyzes the third audio information to obtain the volume of the third audio information;
the network camera determines that the volume is greater than or equal to a third threshold.
The third threshold is, for example, 40 dB, 50 dB, 55 dB, 58 dB, or another value.
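The volume comparison can be sketched as below. The RMS-based loudness estimate and the calibration offset are illustrative assumptions (the description does not specify how volume is measured), chosen only so that ordinary speech lands in the 40-60 dB range mentioned above.

```python
import math

def volume_db(samples, full_scale=32768.0, offset_db=90.0):
    """Rough loudness of 16-bit PCM samples: RMS in dB relative to full
    scale, shifted by an assumed calibration offset (offset_db)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale) + offset_db

def exceeds_third_threshold(samples, threshold_db=50.0):
    """True if the third audio information is loud enough to proceed."""
    return volume_db(samples) >= threshold_db
```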
Step 202: and the network camera performs person identification on the first image information.
Wherein prior to step 202, the method further comprises:
the network camera determines the resolution ratio of the first image information; if the resolution of the first image information is smaller than or equal to a first threshold value, the network camera performs image processing on the first image information, and the resolution of the processed first image information is larger than the first threshold value.
The network camera can perform image processing on the first image information by using an image enhancement and restoration method. The first threshold is, for example, 300 pixels/foot, 350 pixels/foot, 400 pixels/foot, or other value.
Therefore, before person identification, the image information is processed to improve the resolution of the image information, so that the accuracy of person identification is improved.
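The resolution pre-check can be sketched as follows. The threshold value and the nearest-neighbour doubling are illustrative assumptions; the description only requires that the processed resolution exceed the first threshold, and an actual implementation would use the enhancement and restoration methods mentioned above.

```python
FIRST_THRESHOLD = 300  # illustrative, matching the example values above

def ensure_resolution(image, resolution, threshold=FIRST_THRESHOLD):
    """Return (image, resolution) with resolution above the threshold,
    doubling via nearest-neighbour enlargement of a row-major pixel grid
    until it is."""
    while resolution <= threshold:
        image = [[px for px in row for _ in (0, 1)]
                 for row in image for _ in (0, 1)]
        resolution *= 2
    return image, resolution
```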
Step 203: and if the person is identified based on the first image information, the network camera identifies the gesture of the person.
Step 204: if the gesture of the person is recognized to comprise a call gesture, the network camera starts a call function.
The schematic diagram of the call gesture is shown in fig. 2B.
The call function started by the network camera is a call function of the network camera itself, or the network camera controls a first terminal device to start the call function of the first terminal device (specifically, a control instruction is sent to the first terminal device, and the control instruction is used for indicating to start the call function), where the first terminal device is a terminal device bound with the person.
Optionally, the method further comprises:
if the gesture of the person is recognized to comprise a shooting gesture, the network camera starts a shooting function of the network camera.
A schematic diagram of the shooting gesture is shown in fig. 2C.
Optionally, the network camera is connected with an audio playing device, and the audio playing device is currently in a playing state, and the method further includes:
and if the gesture of the person is recognized to comprise a stopping gesture, the network camera controls the audio playing device to stop playing the audio file.
Optionally, the network camera is connected with a video playing device, and the video playing device is currently in a playing state, and the method further includes:
and if the gesture of the person is recognized to comprise a stopping gesture, the network camera controls the video playing device to stop playing the video file.
A schematic diagram of the stop gesture is shown in fig. 2D.
It can be seen that in the embodiment of the present application, the network camera first acquires the first image information of the monitored first area and then performs person recognition based on the first image information; if a person is recognized, the network camera recognizes the gesture of the person, and if the recognized gesture of the person includes a call gesture, the network camera starts the call function. The call function is thus started through a gesture, and the functions of the network camera are expanded.
In an implementation manner of the present application, before the network camera starts the call function, the method further includes: and the network camera determines that the time length of the character maintaining the call gesture is greater than or equal to the first time length.
Wherein the first duration is fixed (e.g., 2 s, 3 s, 3.5 s, 5 s, or another value).
Alternatively, the first time period is determined based on the current time. Specifically, a first time length corresponding to the current time is determined based on a first mapping relation between time periods and time lengths. The first mapping relationship is shown in table 1, and as shown in table 1, the durations corresponding to the different periods may be the same or different.
TABLE 1

Time period        Duration
8:00am~12:00am     5s
12:00am~18:00pm    7s
18:00pm~22:00pm    5s
……                 ……
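The lookup in the first mapping relation can be sketched as below. Times are expressed as minutes since midnight, the boundaries follow Table 1, and the fallback for the periods Table 1 elides is an assumed value.

```python
# First mapping relation between time periods and durations (Table 1).
FIRST_MAPPING = [
    ((8 * 60, 12 * 60), 5),   # 8:00am~12:00am -> 5s
    ((12 * 60, 18 * 60), 7),  # 12:00am~18:00pm -> 7s
    ((18 * 60, 22 * 60), 5),  # 18:00pm~22:00pm -> 5s
]
DEFAULT_DURATION = 5  # assumed fallback for periods Table 1 elides

def first_duration(minutes_since_midnight):
    """Seconds the call gesture must be held at the current time."""
    for (start, end), seconds in FIRST_MAPPING:
        if start <= minutes_since_midnight < end:
            return seconds
    return DEFAULT_DURATION
```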
It can be seen that in the embodiment of the present application, the call function is started only when the duration of the person maintaining the call gesture exceeds a certain value, so that the probability of starting the call function by mistake can be reduced, and the performance of the network camera is further improved.
In an implementation manner of the present application, before the network camera starts the call function, the method further includes: the network camera determines that the movement mode of the hand of the person under the call gesture comprises a first movement mode.
The first moving mode may be, for example, a leftward movement, a rightward movement, an upward movement, a downward movement, a clockwise movement, a counterclockwise movement, a leftward movement and a rightward movement, a rightward movement and a leftward movement, a downward movement and an upward movement, an upward movement and a downward movement, and so on.
It can be seen that in the embodiment of the present application, the call function is started only when the movement mode of the hand of the person under the call gesture includes the first movement mode, so that the probability of starting the call function by mistake can be reduced, and the performance of the network camera is further improved.
In an implementation manner of the present application, the person recognition performed by the network camera on the first image information includes:
the network camera performs human-shape recognition and/or face recognition on the first image information;
if a human shape and/or a human face is identified, the network camera determines that a person is identified based on the first image information.
Optionally, the method further comprises:
if no human shape and/or human face is identified, the network camera performs step 201 again.
Optionally, the first image information includes video information, the video information includes N frames of images, N is an integer greater than 1, and the network camera performs human recognition and/or face recognition on the first image information, including:
the network camera performs human shape recognition and/or face recognition on each frame of image included in the video information;
if the human shape and/or the human face is recognized in every one of M consecutive frames of images, the network camera determines that the human shape or the human face is recognized, where M is an integer greater than 1 and M is smaller than or equal to N;
if there are no M consecutive frames of images in which the human shape and/or the human face is recognized, the network camera determines that the human shape or the human face is not recognized.
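The consecutive-frame rule can be sketched as follows, assuming a per-frame detector has already produced a boolean flag for each of the N frames; the function name is illustrative.

```python
def person_detected(frame_flags, m):
    """frame_flags[i] is True if frame i contains a human shape/face;
    detection requires m consecutive True frames."""
    run = 0
    for flag in frame_flags:
        run = run + 1 if flag else 0  # count the current run of detections
        if run >= m:
            return True
    return False
```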
It can be seen that in the embodiment of the application, whether the person exists is judged by identifying the face or the shape of the person, so that the identification data is less, and the speed of identifying the person is improved.
In an implementation manner of the present application, before the network camera recognizes the gesture of the person, the method further includes:
the network camera obtains second image information of the person identified, and the first image information comprises the second image information;
the network camera carries out face recognition on the second image information;
the network camera determines that the face of the person is recognized based on the second image information.
Optionally, the method further comprises:
if the face of the person is not recognized based on the second image information, the network camera performs step 201.
Optionally, the network camera acquires second image information identifying the person, including:
the network camera selects K frame images from the M frame images, wherein K is an integer greater than 1, and is smaller than or equal to M;
the network camera performs image interception on each frame of image in the K frames of images to obtain K first intercepted images, wherein each first intercepted image comprises a person shape of the person and/or a face of the person, and the K first intercepted images are in one-to-one correspondence with the K frames of images;
And the network camera takes the K first intercepted images as the second image information.
Optionally, the network camera performs face recognition on the second image information, including:
the network camera carries out face recognition on each first intercepted image in the K first intercepted images;
if the front face is recognized in every one of H consecutive first intercepted images, the network camera determines that the front face of the person is recognized, where H is an integer greater than 1 and H is smaller than or equal to K;
if there are no H consecutive first intercepted images in which the front face is recognized, the network camera determines that the front face of the person is not recognized.
Optionally, the network camera performs front face recognition on the first intercepted image i, including:
the network camera analyzes the first intercepted image i to obtain X face features, where X is a positive integer and the first intercepted image i is any one of the K first intercepted images;
if the X face features comprise Y face features, the network camera determines that the face is recognized based on the first intercepted image i, wherein Y is a positive integer;
and if the X face features do not comprise Y face features, the network camera determines that the face is not recognized based on the first intercepted image i.
Wherein the front face features are, for example, two eyes, two ears, two nose wings, two eyebrows, and the like.
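The X-contains-Y decision can be sketched as below. The particular required feature set is an assumption for illustration; the description only says the X detected features must include the Y set features.

```python
# Assumed Y required features; any concrete set of front face features works.
REQUIRED_FRONT_FEATURES = {"left eye", "right eye", "left ear", "right ear"}

def is_front_face(detected_features, required=REQUIRED_FRONT_FEATURES):
    """The front face is recognized only if every required (Y) feature is
    among the detected (X) features."""
    return required <= set(detected_features)
```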
It can be seen that in the embodiment of the present application, after recognizing that a person exists, whether the person has a front face is further determined, so that the probability of mistakenly starting the call function can be reduced, and the performance of the network camera is further improved.
In an implementation manner of the present application, the network camera recognizes a gesture of the person, including:
the network camera obtains third image information of the front face of the person, wherein the second image information comprises the third image information;
and the network camera recognizes the gesture of the person based on the third image information.
Optionally, the network camera obtains third image information identifying the front face of the person, including:
the network camera selects T first intercepted images from the H first intercepted images, wherein T is an integer greater than 1, and T is smaller than or equal to H;
the network camera performs image interception on each first intercepted image in the T first intercepted images to obtain T second intercepted images, each second intercepted image comprises a gesture of the person, and the T first intercepted images are in one-to-one correspondence with the T second intercepted images;
And the network camera takes the T second intercepted images as the third image information.
Optionally, the network camera recognizes the gesture of the person based on the third image information, including:
the network camera matches each second intercepted image in the T second intercepted images with a set call gesture image;
if W consecutive second intercepted images are all matched with the set call gesture image, the network camera determines that the recognized gesture of the person includes a call gesture, where W is an integer greater than 1;
if there are no W consecutive second intercepted images matched with the set call gesture image, the network camera determines that the recognized gesture of the person does not include a call gesture.
The set call gesture image is, for example, as shown in fig. 2B.
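The matching step can be sketched as follows. The pixel-agreement score and its threshold are illustrative assumptions standing in for whatever template-matching method the description leaves open.

```python
def matches_template(image, template, min_agreement=0.9):
    """Naive match: fraction of equal pixels between two flat pixel lists."""
    agree = sum(1 for a, b in zip(image, template) if a == b)
    return agree / len(template) >= min_agreement

def call_gesture_recognized(second_images, template, w, min_agreement=0.9):
    """Recognized only if w consecutive second intercepted images match
    the set call gesture image."""
    run = 0
    for img in second_images:
        run = run + 1 if matches_template(img, template, min_agreement) else 0
        if run >= w:
            return True
    return False
```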
It can be seen that in the embodiment of the application, gesture recognition is performed on the intercepted image information, so that less data needs to be recognized and the speed of gesture recognition is improved.
In an implementation manner of the present application, the call function that is started includes a call function of the network camera, and after the network camera starts the call function, the method further includes:
the network camera determines R communication objects, wherein R is a positive integer;
The network camera outputs the R communication objects through the audio output equipment;
the network camera collects audio through the audio input equipment to obtain first audio information;
the network camera analyzes the first audio information to obtain a first communication object;
if the R communication objects comprise the first communication object, the network camera calls the first communication object through the call function of the network camera;
and if the R communication objects do not comprise the first communication object, outputting a call error prompt by the network camera through the audio output equipment.
Optionally, the network camera determines R communication objects, including:
the network camera determines the identity of the person based on the second image information;
the network camera determines L communication objects allowing the person to communicate through the network camera based on the identity of the person, wherein L is greater than or equal to R;
and the network camera selects R communication objects from the L communication objects, where for each of the R communication objects, the time elapsed since the person last communicated with that communication object is smaller than or equal to a second threshold.
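The selection from the L permitted objects can be sketched as below; the contact records, time units, and function name are illustrative assumptions.

```python
def select_recent_contacts(permitted, now, second_threshold, r):
    """permitted: list of (name, last_call_time) the person may call.
    Keeps objects whose time since the last call is within the second
    threshold, and returns at most r of them."""
    recent = [name for name, last in permitted if now - last <= second_threshold]
    return recent[:r]
```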
Optionally, the network camera determines R communication objects, including:
the network camera determines the identity of the person based on the second image information;
the network camera determines second terminal equipment based on the identity of the person, and the second terminal equipment is bound with the person;
the network camera sends an information acquisition request to the second terminal equipment, wherein the information acquisition request is used for requesting to acquire a communication object frequently contacted with the person;
and the network camera receives an information acquisition response sent by the second terminal device for the information acquisition request, where the information acquisition response carries the R communication objects.
Optionally, the network camera determines R communication objects, including:
the network camera obtains A objects, where the A objects are objects allowed to access the network camera, and A is larger than or equal to R;
and the network camera selects R objects from the A objects as R communication objects.
Optionally, the second image information includes the K first captured images, and the network camera determines the identity of the person based on the second image information, including:
the network camera matches each first intercepted image in the K first intercepted images with a plurality of face templates respectively, and each face template corresponds to an identity;
If D first intercepted images in the K first intercepted images are matched with the face template i, the network camera takes the identity corresponding to the face template i as the identity of the person, the face templates comprise the face template i, D is a positive integer, and D is smaller than or equal to K.
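The identity decision (D of the K captured images matching face template i) can be sketched as follows; the `match` callable stands in for whatever face-template comparison is actually used.

```python
def identify_person(captures, templates, d, match):
    """templates: {identity: face template}. Returns the identity whose
    template matches at least d of the K captured images, else None."""
    for identity, template in templates.items():
        hits = sum(1 for img in captures if match(img, template))
        if hits >= d:
            return identity
    return None
```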
It can be seen that in the embodiment of the application, a plurality of communication objects are output through the audio output device, one communication object is then obtained by analyzing the audio acquired through the audio input device, and that communication object is called, which simplifies interaction between the person and the network camera and improves convenience.
In an implementation manner of the present application, after the network camera calls the first communication object through a call function of the network camera, the method further includes:
and in the process of carrying out video call between the person and the first communication object, the network camera carries out target tracking shooting on the person.
Optionally, the network camera performs target tracking shooting on the person, including: and the network camera performs target tracking shooting on the person by using a tracking algorithm.
The network camera can focus on the person while tracking the person, so that the tracked person appears more clearly in the shooting picture.
The tracking algorithm may be a centroid tracking algorithm (centroid), a correlation tracking algorithm (correlation), an edge tracking algorithm (edge), or the like, which is not limited in any way in the embodiment of the present application.
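A minimal sketch of the centroid option mentioned above, assuming the camera has already produced a binary foreground mask of the person; this is an illustration, not the patent's implementation:

```python
import numpy as np

def centroid_track(mask):
    """Centroid tracking on a binary foreground mask: the tracked position is
    the mean (x, y) coordinate of the foreground pixels, which the camera
    could use to re-aim and refocus on the person each frame."""
    ys, xs = np.nonzero(mask)  # np.nonzero returns row (y) then column (x) indices
    if len(xs) == 0:
        return None  # target lost in this frame
    return (float(xs.mean()), float(ys.mean()))
```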
Optionally, the method further comprises:
in the process of performing target tracking shooting on the person, the network camera recognizes the gesture of the person in real time;
and if the gesture of the person is recognized to comprise a stopping gesture, stopping the communication between the person and the first communication object by the network camera.
Optionally, the method further comprises:
during the call between the person and the first communication object, the network camera monitors the communication content of the person and the first communication object in real time;
and if at least one second set keyword is monitored, stopping the communication between the person and the first communication object by the network camera.
The second set keyword is, for example, password, money, jewelry, bank card, file, safe, hard disk, USB flash disk, or the like.
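The keyword monitoring above can be sketched as follows; the speech-to-text step that produces the transcript is assumed rather than shown, and the keyword list and names are illustrative:

```python
# Example second set keywords, taken from the list above (English stand-ins).
SENSITIVE_KEYWORDS = {"password", "money", "jewelry", "bank card",
                      "file", "safe", "hard disk", "usb flash disk"}

def should_hang_up(transcript, keywords=SENSITIVE_KEYWORDS):
    """Return True if the (already speech-recognized) call transcript contains
    at least one second set keyword, in which case the network camera would
    stop the call between the person and the first communication object."""
    text = transcript.lower()
    return any(kw in text for kw in keywords)
```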
It can be seen that in the embodiment of the present application, in the process of the video call between the person and the first communication object, target tracking shooting is performed on the person, which improves the effect of the video call and further improves the performance of the network camera.
It should be noted that, in the embodiment of the present application, the recognition operations performed by the network camera (such as person recognition, face recognition, and gesture recognition) may all be processed by a service device connected to the network camera; the recognition operations performed by the service device are similar in specific implementation to the recognition operations performed by the network camera described in the present application, and are not described herein again.
Referring to fig. 3, fig. 3 is a schematic flow chart of another function control method according to the embodiment of the present application, which is consistent with the embodiment shown in fig. 2A, and specifically includes the following steps:
step 301: and the network camera acquires first image information of a first area monitored by the network camera.
Step 302: and the network camera performs human-shape recognition and/or face recognition on the first image information.
If a person shape and/or face is identified, step 303 is performed.
If neither a human shape nor a face is identified, step 301 is performed.
Step 303: the network camera determines that a person is identified based on the first image information.
Step 304: the network camera acquires second image information identifying the person, and the first image information comprises the second image information.
Step 305: and the network camera performs face recognition on the second image information.
If a front face is identified, step 306 is performed.
If no front face is identified, step 301 is performed.
Step 306: the network camera determines that the face of the person is recognized based on the second image information.
Step 307: the network camera acquires third image information in which the front face of the person is recognized, wherein the second image information includes the third image information.
Step 308: and the network camera recognizes the gesture of the person based on the third image information.
If the gesture of the person is recognized to include a talk gesture, step 309 is performed.
If no call gesture is recognized, step 301 is performed.
Step 309: the network camera starts a call function, and the call function started comprises the call function of the network camera.
Step 310: the network camera determines R communication objects, wherein R is a positive integer.
Step 311: and the network camera outputs the R communication objects through the audio output equipment.
Step 312: and the network camera performs audio acquisition through the audio input equipment to obtain first audio information.
Step 313: and the network camera analyzes the first audio information to obtain a first communication object.
If the R communication objects include the first communication object, then step 314 is performed.
If the R communication objects do not include the first communication object, then step 315 is performed.
Step 314: the network camera calls the first communication object through the call function of the network camera.
Step 315: and the network camera outputs a call error prompt through the audio output equipment.
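The fig. 3 flow (steps 301-315) can be sketched as one monitoring pass; every `cam.*` method below is a hypothetical stand-in for the recognition and call primitives the patent assumes, not an API from the disclosure:

```python
def handle_frame(cam, frame):
    """One pass of the function control flow for a single monitored frame."""
    if not cam.detect_person(frame):              # steps 302-303
        return "keep_monitoring"
    if not cam.detect_front_face(frame):          # steps 305-306
        return "keep_monitoring"
    if "call" not in cam.detect_gestures(frame):  # step 308
        return "keep_monitoring"
    cam.start_call_function()                     # step 309
    contacts = cam.determine_contacts()           # step 310: the R objects
    cam.speak(contacts)                           # step 311: audio output
    spoken = cam.parse_audio(cam.listen())        # steps 312-313
    if spoken in contacts:                        # step 314
        cam.call(spoken)
        return "calling"
    cam.speak("call error")                       # step 315
    return "call_error"
```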
It should be noted that, the specific implementation process of the embodiment of the present application may refer to the specific implementation process described in the foregoing method embodiment, which is not described herein.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a network camera provided in an embodiment of the present application. Consistent with the embodiments shown in fig. 2A and fig. 3, the network camera includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the following steps:
acquiring first image information of a first area monitored by the network camera;
performing person identification on the first image information;
If a person is identified based on the first image information, identifying a gesture of the person;
and if the gesture of the person is recognized to comprise a call gesture, starting a call function.
It can be seen that in the embodiment of the present application, the network camera first acquires the first image information of the monitored first area, then performs person recognition based on the first image information; if a person is recognized, the gesture of the person is recognized, and if the gesture of the person is recognized to include a call gesture, the call function is started, so that the call function is started through a gesture and the functions of the network camera are expanded.
In one implementation of the present application, before the call function is initiated, the program includes instructions for further performing the steps of:
and determining that the time length for which the person maintains the call gesture is greater than or equal to a first time length.
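The hold-duration check above amounts to a debounce on the recognized gesture; a minimal sketch, with illustrative names not taken from the disclosure:

```python
def call_gesture_confirmed(gesture_timestamps, first_duration):
    """gesture_timestamps: sorted times (seconds) of consecutive frames in
    which the call gesture was recognized. The call function is started only
    if the gesture was maintained for at least first_duration."""
    if not gesture_timestamps:
        return False
    return gesture_timestamps[-1] - gesture_timestamps[0] >= first_duration
```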
In one implementation of the present application, before the call function is initiated, the program includes instructions for further performing the steps of:
determining that the movement mode of the hand of the person under the call gesture includes a first movement mode.
In an implementation manner of the present application, in performing person recognition on the first image information, the program includes instructions specifically configured to:
Performing human shape recognition and/or face recognition on the first image information;
if the person shape and/or the face are identified, it is determined that the person is identified based on the first image information.
In one implementation of the present application, the program includes instructions for, prior to recognizing the gesture of the person, further performing the steps of:
acquiring second image information identifying the person, wherein the first image information comprises the second image information;
performing face recognition on the second image information;
determining that the front face of the person is recognized based on the second image information.
In an implementation of the present application, in identifying the gesture of the person, the program includes instructions specifically for:
acquiring third image information of the front face of the person, wherein the second image information comprises the third image information;
a gesture of the person is identified based on the third image information.
In an implementation manner of the present application, the network camera includes an audio output device and an audio input device, the call function that is started includes a call function of the network camera, and after the call function is started, the program includes instructions for further executing the following steps:
Determining R communication objects, wherein R is a positive integer;
outputting the R communication objects through the audio output device;
acquiring audio through the audio input equipment to obtain first audio information;
analyzing the first audio information to obtain a first communication object;
if the R communication objects comprise the first communication object, calling the first communication object through the call function of the network camera;
and if the R communication objects do not comprise the first communication object, outputting a call error prompt through the audio output equipment.
It should be noted that, the specific implementation process of this embodiment may refer to the specific implementation process described in the foregoing method embodiment, which is not described herein.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a function control device provided in an embodiment of the present application, which is applied to the above-mentioned network camera, and the device includes:
an obtaining unit 501, configured to obtain first image information of a first area monitored by the network camera;
an identifying unit 502, configured to perform person identification on the first image information;
the identifying unit 502 is further configured to identify a gesture of a person if the person is identified based on the first image information;
The control unit 503 is configured to initiate a call function if it is recognized that the gesture of the person includes a call gesture.
It can be seen that in the embodiment of the present application, the network camera first acquires the first image information of the monitored first area, then performs person recognition based on the first image information; if a person is recognized, the gesture of the person is recognized, and if the gesture of the person is recognized to include a call gesture, the call function is started, so that the call function is started through a gesture and the functions of the network camera are expanded.
In an implementation manner of the present application, the apparatus further includes:
a first determining unit 504, configured to determine, before the control unit 503 starts the call function, that a time period for the person to maintain the call gesture is greater than or equal to a first time period.
In an implementation manner of the present application, the apparatus further includes:
a second determining unit 505, configured to determine, before the control unit 503 starts the call function, that the movement mode of the hand of the person under the call gesture includes a first movement mode.
In an implementation manner of the present application, in performing person recognition on the first image information, the recognition unit 502 is specifically configured to:
Performing human shape recognition and/or face recognition on the first image information;
if the person shape and/or the face are identified, it is determined that the person is identified based on the first image information.
In an implementation manner of the present application, before the recognition unit 502 recognizes the gesture of the person, the recognition unit 502 is further configured to obtain second image information that recognizes the person, where the first image information includes the second image information; perform face recognition on the second image information; and determine that the front face of the person is recognized based on the second image information.
In an implementation manner of the present application, in identifying the gesture of the person, the identifying unit 502 is specifically configured to:
acquiring third image information of the front face of the person, wherein the second image information comprises the third image information; a gesture of the person is identified based on the third image information.
In an implementation manner of the present application, the network camera includes an audio output device and an audio input device, the call function that is started includes a call function of the network camera, and the apparatus further includes:
a third determining unit 506, configured to determine R communication objects after the call function is started, where R is a positive integer;
An output unit 507 for outputting the R communication objects;
the input unit 508 is configured to perform audio acquisition to obtain first audio information;
a parsing unit 509, configured to parse the first audio information to obtain a first communication object;
a calling unit 510, configured to call the first communication object through a call function of the network camera if the R communication objects include the first communication object;
and the output unit 507 is further configured to output a call error prompt through the audio output device if the R communication objects do not include the first communication object.
It should be noted that, the acquiring unit 501, the identifying unit 502, the controlling unit 503, the first determining unit 504, the second determining unit 505, the third determining unit 506, and the analyzing unit 509 may be implemented by a processor, the output unit 507 may be implemented by an audio output device, the input unit 508 may be implemented by an audio input device, and the calling unit 510 may be implemented by a communication interface.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps performed by the network camera in the above method embodiments.
Embodiments of the present application also provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps performed by the network camera in the above method embodiments. The computer program product may be a software installation package.
The steps of a method or algorithm described in the embodiments of the present application may be implemented in hardware, or may be implemented by a processor executing software instructions. The software instructions may consist of corresponding software modules that may be stored in random access memory (Random Access Memory, RAM), flash memory, read-only memory (Read Only Memory, ROM), erasable programmable read-only memory (Erasable Programmable ROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in an access network device, a target network device, or a core network device. It is of course also possible that the processor and the storage medium reside as discrete components in an access network device, a target network device, or a core network device.
Those of skill in the art will appreciate that in one or more of the above examples, the functions described in the embodiments of the present application may be implemented, in whole or in part, in software, hardware, firmware, or any combination thereof. When implemented in software, the functions may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (Digital Video Disc, DVD)), a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), or the like.
The foregoing embodiments have been provided for the purpose of illustrating the embodiments of the present application in further detail, and it should be understood that the foregoing embodiments are merely illustrative of the embodiments of the present application and are not intended to limit the scope of the embodiments of the present application, and any modifications, equivalents, improvements, etc. made on the basis of the technical solutions of the embodiments of the present application are included in the scope of the embodiments of the present application.

Claims (7)

1. A function control method, characterized by being applied to a network camera, the method comprising:
acquiring first image information of a first area monitored by the network camera;
performing human shape recognition or face recognition on the first image information;
if the human shape or the human face is identified, determining that the person is identified based on the first image information;
acquiring second image information identifying the person, wherein the first image information comprises the second image information;
performing face recognition on the second image information;
determining that the front face of the person is recognized based on the second image information;
acquiring third image information of the front face of the person, wherein the second image information comprises the third image information;
Identifying a gesture of the person based on the third image information;
and if the gesture of the person is recognized to comprise a call gesture, starting a call function.
2. The method of claim 1, wherein before the starting a call function, the method further comprises: determining that the time length for which the person maintains the call gesture is greater than or equal to a first time length.
3. The method according to claim 1 or 2, wherein before the call function is started, the method further comprises: determining that the movement mode of the hand of the person under the call gesture includes a first movement mode.
4. The method according to claim 1 or 2, wherein the network camera comprises an audio output device and an audio input device, the initiated talk function comprises a talk function of the network camera, and after the initiating of the talk function, the method further comprises:
determining R communication objects, wherein R is a positive integer;
outputting the R communication objects through the audio output device;
acquiring audio through the audio input equipment to obtain first audio information;
analyzing the first audio information to obtain a first communication object;
If the R communication objects comprise the first communication object, calling the first communication object through the call function of the network camera;
and if the R communication objects do not comprise the first communication object, outputting a call error prompt through the audio output equipment.
5. A device for controlling functions, which is applied to a network camera, the device comprising:
the acquisition unit is used for acquiring first image information of a first area monitored by the network camera;
the identification unit is used for performing human shape identification or face identification on the first image information; and if the human shape or the human face is identified, determining that the person is identified based on the first image information;
the acquisition unit is further used for acquiring second image information identifying the person, and the first image information comprises the second image information;
the identification unit is further used for performing face recognition on the second image information; and determining that the front face of the person is recognized based on the second image information;
the acquiring unit is further configured to acquire third image information identifying the front face of the person, where the second image information includes the third image information; and the identification unit is further configured to identify a gesture of the person based on the third image information;
And the control unit is used for starting a call function if the gesture of the person is recognized to comprise a call gesture.
6. A network camera comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-4.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, performs the method according to any of claims 1-4.
CN201910820001.3A 2019-08-31 2019-08-31 Function control method and related equipment Active CN110705356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910820001.3A CN110705356B (en) 2019-08-31 2019-08-31 Function control method and related equipment

Publications (2)

Publication Number Publication Date
CN110705356A CN110705356A (en) 2020-01-17
CN110705356B true CN110705356B (en) 2023-12-29

Family

ID=69194024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910820001.3A Active CN110705356B (en) 2019-08-31 2019-08-31 Function control method and related equipment

Country Status (1)

Country Link
CN (1) CN110705356B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112243064B (en) * 2020-10-19 2022-03-04 维沃移动通信(深圳)有限公司 Audio processing method and device
CN115223236A (en) * 2021-04-19 2022-10-21 华为技术有限公司 Device control method and electronic device
CN114390254B (en) * 2022-01-14 2024-04-19 中国第一汽车股份有限公司 Rear-row cockpit monitoring method and device and vehicle

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013027893A1 (en) * 2011-08-22 2013-02-28 Kang Jun-Kyu Apparatus and method for emotional content services on telecommunication devices, apparatus and method for emotion recognition therefor, and apparatus and method for generating and matching the emotional content using same
CN105472312A (en) * 2014-08-19 2016-04-06 北京奇虎科技有限公司 Intelligent shooting system alarming method, intelligent shooting system and network camera
CN207440731U (en) * 2017-11-21 2018-06-01 山东祈锦光电科技有限公司 It is a kind of that there is the projecting apparatus of gesture control
CN108803980A (en) * 2017-04-27 2018-11-13 北京搜狗科技发展有限公司 Operation recognition methods, device, equipment and the readable storage medium storing program for executing of equipment
CN108944685A (en) * 2018-09-06 2018-12-07 烟台市安特洛普网络科技有限公司 Intelligent vehicle-carried interactive system
CN109120790A (en) * 2018-08-30 2019-01-01 Oppo广东移动通信有限公司 Call control method, device, storage medium and wearable device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101330810B1 (en) * 2012-02-24 2013-11-18 주식회사 팬택 User device for recognizing gesture and method thereof
EP2679279B1 (en) * 2012-06-28 2018-07-25 Zodiac Aerotechnics Oxygen breathing device and method for maintaining an emergency oxygen system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant