CN110839128B - Photographing behavior detection method and device and storage medium

Info

Publication number: CN110839128B
Application number: CN201810936534.3A
Authority: CN
Inventors: 陈锡; 任烨; 童俊艳
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-08-16
Filing date: 2018-08-16
Publication date: 2021-04-27
Anticipated expiration: 2038-08-16
Also published as: CN110839128A

Abstract

The invention discloses a photographing behavior detection method, a photographing behavior detection device and a storage medium, and belongs to the technical field of data processing. The method comprises: obtaining a video image of a target scene, and determining an image area of a target mobile terminal included in the video image; calling a target positioning model, inputting the image area into the target positioning model, and outputting a photographing gesture label of the target mobile terminal, wherein the target positioning model is used for determining the photographing gesture of a mobile terminal according to the image area of any mobile terminal; and determining whether the target mobile terminal has a photographing behavior based on the outputted photographing gesture label. According to the embodiment of the invention, photographing behavior detection does not depend on the flash lamp; that is, whether the target mobile terminal has a photographing behavior in the target scene can be detected regardless of whether the mobile terminal turns on the flash lamp.

Description

Photographing behavior detection method and device and storage medium

Technical Field

The embodiment of the invention relates to the technical field of data processing, in particular to a photographing behavior detection method, a photographing behavior detection device and a storage medium.

Background

In daily life, some application scenarios, such as museums and exhibition halls, usually do not allow photographs to be taken. However, as the functions of mobile terminals keep expanding, some users may covertly use mobile terminals to take pictures. Therefore, in this type of application scenario, photographing behavior needs to be detected.

At present, a camera device can be installed in the application scene to detect whether a mobile terminal has a photographing behavior in the application scene. For example, the camera device can detect whether a flash fires, and when a flash is detected, it is determined that the mobile terminal has a photographing behavior.

However, in the above implementation, the photographing behavior of the mobile terminal can be detected only when the user uses the flash. If the flash is not turned on during photographing, the photographing behavior cannot be detected, so the detection result of the photographing behavior is inaccurate.

Disclosure of Invention

The embodiment of the invention provides a photographing behavior detection method, a photographing behavior detection device and a storage medium, which can solve the problem of inaccurate detection result of photographing behavior in the related art. The technical scheme is as follows:

In a first aspect, a photographing behavior detection method is provided, and the method includes:

acquiring a video image of a target scene;

determining an image area of a target mobile terminal included in the video image;

calling a target positioning model, inputting the image area into the target positioning model, and outputting a photographing gesture label of the target mobile terminal, wherein the target positioning model is used for determining a photographing gesture of the mobile terminal according to the image area of any mobile terminal;

and determining whether the target mobile terminal has a photographing behavior or not based on the outputted photographing gesture tag.

Optionally, the determining an image area of the target mobile terminal included in the video image includes:

calling a target detection model, inputting the video image into the target detection model, and outputting the position information of a target mobile terminal included in the video image, wherein the target detection model is used for detecting the position information of the mobile terminal included in the video image according to any video image;

and determining an image area of the target mobile terminal from the video image based on the position information of the target mobile terminal.

Optionally, the determining an image area of the target mobile terminal from the video image based on the location information of the target mobile terminal includes:

determining a target area in the video image, wherein the target area is an area obtained by enlarging a position area corresponding to the position information of the target mobile terminal;

and cutting the target area from the video image to obtain the image area of the target mobile terminal.

Optionally, the target detection model is obtained by training a detection model to be trained based on a plurality of video image samples and position information of the mobile terminal in each video image sample.

Optionally, the target positioning model is obtained by training a positioning model to be trained based on a plurality of image area samples and the photographing gesture label of each image area sample.

Optionally, the determining whether the target mobile terminal has a photographing behavior based on the outputted photographing gesture tag includes:

and when the output photographing gesture tag belongs to the photographing behavior tag, determining that the target mobile terminal has the photographing behavior.

Optionally, after determining that the target mobile terminal has the photographing behavior, the method further includes:

determining the photographing times of the target mobile terminal, wherein the photographing times refer to the photographing times of the target mobile terminal within a preset time from the current time;

and when the photographing times of the target mobile terminal reach the photographing times threshold, giving an alarm prompt.

Optionally, before the acquiring the video image, the method further includes:

detecting whether the video image is an effective image, wherein the effective image is an image shot when the camera device is not shielded and is not moved;

when the video image is an effective image, executing the operation of acquiring the video image; and when the video image is not an effective image, carrying out abnormal detection alarm prompt.

In a second aspect, there is provided a photographing behavior detection apparatus, the apparatus comprising:

the image acquisition module is used for acquiring a video image of a target scene;

the area determining module is used for determining an image area of the target mobile terminal included in the video image;

the calling module is used for calling a target positioning model, inputting the image area into the target positioning model and outputting a photographing gesture label of the target mobile terminal, and the target positioning model is used for determining a photographing gesture of the mobile terminal according to the image area of any mobile terminal;

and the behavior determining module is used for determining whether the target mobile terminal has the photographing behavior or not based on the outputted photographing gesture tag.

Optionally, the region determining module is configured to:

Optionally, the behavior determination module is configured to:

Optionally, the apparatus further comprises:

the number determining module is used for determining the number of times of photographing of the target mobile terminal, wherein the number of times of photographing is the number of times of photographing of the target mobile terminal within a preset time length from the current time;

and the alarm module is used for giving an alarm prompt when the photographing times of the target mobile terminal reach the photographing times threshold.

Optionally, the apparatus further comprises:

the detection module is used for detecting whether the video image is an effective image, wherein the effective image is an image shot when the camera device is not shielded and is not moved;

the image acquisition module is further configured to execute the operation of acquiring the video image when the video image is an effective image; and when the video image is not an effective image, carrying out abnormal detection alarm prompt.

In a third aspect, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores instructions for executing, by a processor, the photographing behavior detection method according to the first aspect.

In a fourth aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the photographing behavior detection method of the first aspect described above.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

A video image of a target scene is acquired, wherein the target scene is a scene to be monitored. An image area of the target mobile terminal in the video image is determined, a target positioning model is called, and the image area is input into the target positioning model. The target positioning model can determine the photographing gesture of the mobile terminal according to the image area of any mobile terminal, so that the photographing gesture tag of the target mobile terminal can be output through the target positioning model. Therefore, whether the target mobile terminal has the photographing behavior or not can be detected according to the outputted photographing gesture label. According to the embodiment of the invention, photographing behavior detection does not depend on the flash lamp; that is, whether the target mobile terminal has the photographing behavior in the target scene can be detected regardless of whether the target mobile terminal turns on the flash lamp.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a flowchart illustrating a photographing behavior detection method according to an exemplary embodiment.

Fig. 2 is a schematic structural diagram illustrating a photographing behavior detection apparatus according to an exemplary embodiment.

Fig. 3 is a schematic structural diagram illustrating a photographing behavior detection apparatus according to another exemplary embodiment.

Fig. 4 is a schematic structural diagram illustrating a photographing behavior detection apparatus according to another exemplary embodiment.

Fig. 5 is a block diagram illustrating a terminal 700 according to an exemplary embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Before describing the method for detecting a photographing behavior provided by the embodiment of the present invention in detail, an application scenario and an implementation environment related to the embodiment of the present invention are briefly described.

First, a brief description is given of an application scenario related to the embodiment of the present invention.

In daily life, some scenes prohibit taking pictures. However, since users usually carry a mobile terminal with them and the mobile terminal has a photographing function, it is inevitable that some users covertly take pictures with the mobile terminal. At present, whether a target mobile terminal has a photographing behavior can be detected only by detecting whether a flash is used; if a user photographs with the target mobile terminal without turning on the flash, the photographing behavior of the user cannot be detected. Therefore, the embodiment of the invention provides a photographing behavior detection method, which determines a photographing gesture tag of a target mobile terminal through a target positioning model based on an image area of the target mobile terminal, so as to detect whether the target mobile terminal has a photographing behavior according to the determined photographing gesture tag. In this way, whether the target mobile terminal has the photographing behavior in the target scene can be detected regardless of whether the target mobile terminal turns on the flash lamp. The specific implementation process is shown in the following embodiments.

Next, a brief description is given of an implementation environment related to the embodiments of the present invention.

The photographing behavior detection method provided by the embodiment of the invention can be executed by a terminal, and in a possible implementation manner, the terminal can be configured with a camera device, or the terminal can be connected with an external camera device through a data line, a Bluetooth or other connection equipment so as to acquire a video image of a target scene to be monitored through the camera device. In some embodiments, the terminal may be a mobile phone, a tablet computer, a computer, and the like, which is not limited in the embodiments of the present invention.

After the application scenarios and the implementation environments related to the embodiment of the present invention are described, the method for detecting a photographing behavior provided by the embodiment of the present invention will be described in detail with reference to the accompanying drawings.

Referring to fig. 1, fig. 1 is a flowchart illustrating a photographing behavior detection method according to an exemplary embodiment, where the photographing behavior detection method can be applied in the above implementation environment, and the photographing behavior detection method can include the following implementation steps:

Step 101: A video image of a target scene is acquired.

The target scene is generally a scene to be monitored in which photographing is not allowed. For example, the target scene may include, but is not limited to, a museum, an exhibition hall, and an information-sensitive scene in the financial field. In some embodiments, a camera device may be installed in the target scene, and the terminal acquires a video image of the target scene through the camera device.

Further, before the video image of the target scene is acquired, whether the video image is an effective image is detected, wherein the effective image is an image shot by the camera device when the camera device is not shielded and is not moved. When the video image is an effective image, the video image is acquired; when the video image is not an effective image, an abnormal detection alarm prompt is given.

In order to avoid the detection of the photographing behavior, some users may shield or move the camera device monitoring the target scene to other directions, so that the terminal cannot successfully perform the detection of the photographing behavior. For this reason, before acquiring a video image of a target scene, the terminal may detect whether the video image is a valid image, that is, whether the camera is blocked or moved. And when the video image is determined to be the effective image, acquiring the video image and continuously executing subsequent operations. On the contrary, when the video image is not an effective image, it indicates that the camera device may be blocked or moved, and at this time, an abnormal alarm prompt may be performed, for example, a voice "camera device shooting abnormal" may be played, so that the worker may find the detection abnormality in time.

In a possible implementation manner, the specific implementation of detecting whether the video image is a valid image may include: comparing the pixel value of the video image with a preset video image, determining that the video image is an effective image when the difference value between the pixel value of the video image and the pixel value of the preset video image is less than or equal to a pixel value threshold, and determining that the video image is not the effective image when the difference value between the pixel value of the video image and the pixel value of the preset video image is greater than the pixel value threshold.

That is, in order to detect whether a video image is valid, a preset video image may be stored in the terminal in advance, where the preset video image is a video image obtained by shooting a target scene when the camera device is not blocked and is not moved. And in the shooting behavior detection process, comparing the pixel value of the video image of the target scene with the pixel value of the preset video image. If the difference between the pixel value of the video image and the pixel value of the preset video image is less than or equal to the pixel value threshold, it is indicated that the difference between the video image and the preset video image is not large, and therefore, the video image can be determined to be an effective image.

On the contrary, when the difference between the pixel value of the video image and the pixel value of the preset video image is greater than the pixel value threshold, it is indicated that the difference between the video image and the preset video image is large, and at this time, it may be determined that the video image is not an effective image. For example, when the camera is blocked, the captured video image of the target scene may be a completely black image, and at this time, the difference between the completely black video image and the preset video image is large, so that it can be determined that the video image is not a valid image.

The pixel value threshold may be set by a user according to actual requirements in a self-defined manner, or may be set by the terminal as a default, which is not limited in the embodiment of the present invention.
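The validity check described above can be sketched as follows. The text does not say how the per-pixel differences are aggregated into a single difference value, so this sketch assumes the mean absolute difference over grayscale intensities; the names `mean_abs_diff` and `is_valid_image` and the nested-list image representation are illustrative assumptions, not part of the original disclosure.

```python
from typing import List

GrayImage = List[List[int]]  # rows of 0-255 intensities (assumed representation)


def mean_abs_diff(frame: GrayImage, reference: GrayImage) -> float:
    """Average absolute per-pixel difference between two same-sized images."""
    same_shape = len(frame) == len(reference) and all(
        len(a) == len(b) for a, b in zip(frame, reference)
    )
    if not same_shape:
        raise ValueError("frame and reference must have the same dimensions")
    count = sum(len(row) for row in frame)
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    total = sum(
        abs(p - q) for a, b in zip(frame, reference) for p, q in zip(a, b)
    )
    return total / count


def is_valid_image(
    frame: GrayImage, reference: GrayImage, pixel_threshold: float
) -> bool:
    """Treat the frame as valid when it differs little from the preset image.

    Assumption: "difference value" is taken to be the mean absolute difference.
    """
    return mean_abs_diff(frame, reference) <= pixel_threshold
```

With this aggregation, a completely black frame from a shielded camera differs strongly from a normally lit preset image and is rejected, while small changes such as sensor noise stay under the threshold.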

Step 102: An image area of the target mobile terminal included in the video image is determined.

In one possible implementation manner, the specific implementation of determining the image area of the target mobile terminal included in the video image may include: calling a target detection model, inputting the video image into the target detection model, and outputting the position information of the target mobile terminal included in the video image, wherein the target detection model is used for detecting, for any video image, the position information of the mobile terminal included in that video image; and determining the image area of the target mobile terminal from the video image based on the position information of the target mobile terminal.

The position information of the target mobile terminal may include a size of the target mobile terminal and a position coordinate of the target mobile terminal in the video image.

In some embodiments, the structure of the target detection model may include an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer. In this case, after the terminal inputs the video image into the target detection model through the input layer, data processing is performed through the convolutional layer, the pooling layer, and the fully-connected layer included in the target detection model, and the position information of the target mobile terminal included in the video image is output at the output layer.

In a possible implementation manner, the specific implementation of determining the image area of the target mobile terminal from the video image based on the location information of the target mobile terminal includes: and determining a target area in the video image, wherein the target area is an enlarged area of a position area corresponding to the position information of the target mobile terminal, and cutting the target area from the video image to obtain an image area of the target mobile terminal.

It is understood that, in order to accurately position the photographing gesture of the target mobile terminal, a location area of the target mobile terminal in the video image may be determined according to the location information of the target mobile terminal, and then an enlarged area of the location area may be determined, so that the enlarged area may include a gesture portion, a photographing rod portion for erecting the target mobile terminal, or other user limb portions, and a target area is obtained, so that the photographing gesture of the target mobile terminal may be determined according to the target area.

In a possible implementation manner, after the terminal determines the target area, the target area can be cut out from the video image, so as to obtain the image area of the target mobile terminal.

Or, in another possible implementation manner, after the terminal determines the target area, the target area may be circled from the video image to obtain an image area of the target mobile terminal, which is not limited in the embodiment of the present invention.
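The enlarge-then-cut step can be sketched as below. The text does not specify how much the position area is enlarged or how position information is encoded, so the `scale` factor, the `(x, y, width, height)` box layout and the function names are assumptions made for illustration. The enlarged box is clipped to the image bounds so that the cut never reads outside the video image.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels, assumed layout


def enlarge_box(box: Box, scale: float, image_w: int, image_h: int) -> Box:
    """Scale a box about its centre and clip the result to the image bounds."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    new_w, new_h = w * scale, h * scale
    left = max(0, int(cx - new_w / 2))
    top = max(0, int(cy - new_h / 2))
    right = min(image_w, int(cx + new_w / 2))
    bottom = min(image_h, int(cy + new_h / 2))
    return (left, top, right - left, bottom - top)


def crop(image: List[list], box: Box) -> List[list]:
    """Cut the rectangle described by box out of a row-major image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

A scale greater than 1 is what lets the cut region take in the hand, the photographing rod, or other parts of the user's body around the terminal.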

Further, the target detection model is obtained by training the detection model to be trained based on the plurality of video image samples and the position information of the mobile terminal in each video image sample.

That is, before the target detection model is called, the plurality of video image samples and the position information of the mobile terminal in each video image sample may be obtained, and then the detection model to be trained is trained based on the plurality of video image samples and the position information of the mobile terminal in each video image sample to obtain the target detection model.

In some embodiments, the detection model to be trained may be a convolutional neural network model, wherein the structure of the convolutional neural network model may include an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer. Before the target detection model is called, the terminal inputs the video image samples and the position information of the mobile terminal in each video image sample into the convolutional neural network model for deep learning and training to obtain the target detection model, and thus, the obtained target detection model can detect the position information of the mobile terminal included in the video image according to any video image.

It is worth mentioning that the position information of the target mobile terminal is detected by adopting the target detection model obtained after deep learning, so that the reliability and the accuracy of the position information detection are ensured.

Step 103: A target positioning model is called, the image area is input into the target positioning model, and a photographing gesture label of the target mobile terminal is output, wherein the target positioning model is used for determining the photographing gesture of the mobile terminal according to the image area of any mobile terminal.

The image region may include a plurality of pixels, and each of the plurality of pixels corresponds to a pixel value, for example, the pixel value may be represented by RGB, and the value range of RGB may be 0 to 255. In a possible implementation manner, the target location model may determine the photographing gesture tag of the target mobile terminal based on a pixel value of a pixel point included in the image region.

Further, the target positioning model may also perform data format conversion processing on pixel values of an image area of the target mobile terminal. For example, if the pixel values of each image region are represented by RGB, the pixel values of each image region may be formatted to include three sets of data, namely, R, G and B sets, each of which is used to store R, G and B values of the pixel values. In summary, when implemented, the pixel values of each image area are in a format that satisfies the data format required by the target positioning model.
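One way to read the data format conversion above is as a split of interleaved RGB pixels into three separate planes. The sketch below assumes the image area is held as rows of `(R, G, B)` tuples; that representation and the name `split_rgb_planes` are illustrative assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Plane = List[List[int]]


def split_rgb_planes(region: List[List[Pixel]]) -> Tuple[Plane, Plane, Plane]:
    """Convert rows of (R, G, B) pixels into separate R, G and B planes."""
    for row in region:
        for pixel in row:
            if len(pixel) != 3 or any(not 0 <= v <= 255 for v in pixel):
                raise ValueError(f"not a valid RGB pixel: {pixel!r}")
    r_plane = [[pixel[0] for pixel in row] for row in region]
    g_plane = [[pixel[1] for pixel in row] for row in region]
    b_plane = [[pixel[2] for pixel in row] for row in region]
    return r_plane, g_plane, b_plane
```

Each returned plane has the same height and width as the image area, which matches the channel-per-array input that convolutional models commonly expect.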

In some embodiments, the structure of the target positioning model may include an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer. In this case, after the terminal inputs the image area into the target positioning model through the input layer, data processing is performed through the convolutional layer, the pooling layer and the fully-connected layer included in the target positioning model, and the photographing gesture tag of the target mobile terminal is output at the output layer.

Further, if the data format conversion processing is also performed on the pixel value of the image area, the target positioning model may determine the photographing gesture tag of the target mobile terminal based on the pixel value of the image area after the data format conversion processing.
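To make the layer sequence concrete, the following toy forward pass chains one convolutional layer, one pooling layer and one fully-connected layer and picks the highest-scoring label at the output. It is a single-channel sketch with hand-supplied weights, not the trained model of the embodiment: the ReLU activation, the 2x2 max pooling, and all function and parameter names are assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_relu(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) 2-D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                         for a in range(kh) for b in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    return [
        [
            max(feature[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature[0]) - size + 1, size)
        ]
        for i in range(0, len(feature) - size + 1, size)
    ]


def fully_connected(vector: List[float], weights: Matrix,
                    bias: List[float]) -> List[float]:
    """One score per output label: weights @ vector + bias."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def classify(image: Matrix, kernel: Matrix, weights: Matrix,
             bias: List[float], labels: Sequence[str]) -> str:
    """Input -> convolution -> pooling -> fully-connected -> label."""
    pooled = max_pool(conv2d_relu(image, kernel))
    flat = [value for row in pooled for value in row]
    scores = fully_connected(flat, weights, bias)
    return labels[scores.index(max(scores))]
```

In a real system the kernel and weights would come from training on labelled image area samples, and the network would typically have several convolutional and pooling layers over all three colour channels.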

In some embodiments, the photographing gesture tag of the target mobile terminal output by the target positioning model may include, but is not limited to, "holding the target mobile terminal without any operation", "playing with the target mobile terminal", "making a call", "hanging the target mobile terminal on the neck", "holding the target mobile terminal to take a picture", "the target mobile terminal on the photographing rod takes a picture with the display screen facing the target", "the target mobile terminal on the photographing rod takes a picture with the lens facing the target", and "the target mobile terminal on the photographing rod without any operation".

Further, the target positioning model is obtained by training the positioning model to be trained based on the plurality of image area samples and the photographing gesture label of each image area sample.

That is, before the target positioning model is called, a plurality of image area samples and the photographing gesture label of each image area sample are obtained, and the positioning model to be trained is trained based on the plurality of image area samples and the photographing gesture label of each image area sample, so as to obtain the target positioning model.

In some embodiments, the localization model to be trained may be a convolutional neural network model whose structure may include an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer. The terminal can input the multiple image area samples and the photographing gesture label of each image area sample into the convolutional neural network model for deep learning and training, so as to obtain a target positioning model capable of classifying photographing gestures. Therefore, the obtained target positioning model can determine the photographing gesture of the mobile terminal according to the pixel value of the image area of any mobile terminal.

It should be noted that, the above description is only given by taking the convolutional neural network model as an example, in some embodiments, the positioning model to be trained may also adopt other multi-label classification networks, and the embodiment of the present invention does not limit this.

It is worth mentioning that the target positioning model is determined by adopting the multi-tag classification network, so that various photographing gesture tags corresponding to the target mobile terminal can be finely positioned at a lower cost, and the expandability is enhanced.

In addition, the embodiment of the invention implements the photographing behavior detection method by applying two deep-learning models in sequence, which gives the overall system a certain anti-interference capability and enhances its practicability.

Step 104: Whether the target mobile terminal has the photographing behavior is determined based on the outputted photographing gesture tag.

In a possible implementation manner, the specific implementation of determining whether the target mobile terminal has the photographing behavior based on the outputted photographing gesture tag may include: and when the output photographing gesture tag belongs to the photographing behavior tag, determining that the target mobile terminal has the photographing behavior.

The terminal may have a plurality of photographing behavior tags stored therein in advance. For example, the photographing behavior tags may include, but are not limited to, "holding the target mobile terminal to take a picture", "the target mobile terminal on the photographing rod takes a picture with the display screen facing the target", and "the target mobile terminal on the photographing rod takes a picture with the lens facing the target".

Therefore, the terminal checks whether the photographing gesture tag output by the target positioning model belongs to the plurality of photographing behavior tags. If yes, it can be determined that the corresponding target mobile terminal has the photographing behavior; if not, it can be determined that the target mobile terminal does not have the photographing behavior.
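The membership check can be sketched with a small enumeration. The member names below are shorthand invented for this sketch; they stand for the eight free-text gesture tags listed in the description, and the pre-stored set contains the three tags the description names as photographing behavior tags.

```python
from enum import Enum, auto


class Gesture(Enum):
    """Shorthand (assumed) names for the photographing gesture tags."""
    HOLD_IDLE = auto()                # holding the terminal, no operation
    PLAYING = auto()                  # playing with the terminal
    CALLING = auto()                  # making a call
    HANGING_ON_NECK = auto()          # terminal hung on the neck
    HANDHELD_PHOTO = auto()           # holding the terminal to take a picture
    ROD_PHOTO_SCREEN_FACING = auto()  # on the rod, display screen facing target
    ROD_PHOTO_LENS_FACING = auto()    # on the rod, lens facing target
    ROD_IDLE = auto()                 # on the rod, no operation


# Pre-stored photographing behavior tags.
PHOTO_BEHAVIOR_TAGS = frozenset({
    Gesture.HANDHELD_PHOTO,
    Gesture.ROD_PHOTO_SCREEN_FACING,
    Gesture.ROD_PHOTO_LENS_FACING,
})


def has_photographing_behavior(tag: Gesture) -> bool:
    """True when the model's output tag is one of the stored behavior tags."""
    return tag in PHOTO_BEHAVIOR_TAGS
```

Keeping the behavior tags in a separate set means a new gesture class can be added to the model, and marked as photographing or not, without changing the decision logic.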

Further, after the target mobile terminal is determined to have the photographing behavior, the photographing times of the target mobile terminal are determined, the photographing times refer to the photographing times of the target mobile terminal within a preset time length from the current time, and when the photographing times of the target mobile terminal reach a photographing time threshold value, an alarm prompt is given.

The preset duration may be set by a user according to actual needs in a self-defined manner, or may be set by the default of the terminal, which is not limited in the embodiment of the present invention. For example, the preset time duration may be set to 30 minutes, and at this time, if it is detected that the number of times of photographing of the target mobile terminal reaches the threshold number of times of photographing within half an hour from the current time, an alarm is given.

The threshold value of the number of times of taking a picture may be set by a user according to actual needs in a self-defined manner, or may be set by the terminal as a default, which is not limited in the embodiment of the present invention. For example, the threshold value of the number of times of photographing may be set to 1, or may be set to any integer greater than 1.

It is worth mentioning that if the photographing times threshold is set to an integer greater than 1, behavior tracking is performed on the target mobile terminal, and an alarm prompt is given only when the target mobile terminal is determined to have the photographing behavior multiple times. In this way, a credible alarm signal can be given, a false alarm caused by a single misjudgment is prevented, and the effectiveness of the alarm is improved.

In some embodiments, when the video image includes a plurality of target mobile terminals, the terminal may ID-number each target mobile terminal included in the video image, generate a tracking list based on the ID-number, and maintain the tracking list to perform behavior tracking for each target mobile terminal. In the list maintenance process, the terminal counts the photographing times of the target mobile terminal corresponding to each ID, and if the photographing times within a preset time from the current time reach the photographing time threshold, an alarm prompt is performed.
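The tracking list and the count within a preset duration can be sketched as a per-ID sliding window. How an ID is kept attached to the same terminal across video images is not described in the text, so this sketch assumes the caller already supplies a stable `terminal_id`, and that timestamps arrive in non-decreasing order; the class and method names are illustrative.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PhotoTracker:
    """Per-terminal count of detected photographing behaviors in a time window."""

    def __init__(self, window_seconds: float, count_threshold: int) -> None:
        self.window_seconds = window_seconds
        self.count_threshold = count_threshold
        # Tracking list: terminal ID -> timestamps of recent detections.
        self._events: Dict[int, Deque[float]] = defaultdict(deque)

    def record(self, terminal_id: int, timestamp: float) -> bool:
        """Record one detection; return True when an alarm prompt is due."""
        events = self._events[terminal_id]
        events.append(timestamp)
        # Discard detections that fall outside the preset duration.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.count_threshold
```

For instance, `PhotoTracker(window_seconds=1800, count_threshold=3)` corresponds to the 30-minute preset duration mentioned above with a threshold of three detections; setting `count_threshold=1` alarms on the first detection.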

Further, the terminal may give the alarm prompt in various ways; for example, it may broadcast a voice message such as "Please visit in a civilized manner; photography is prohibited." Further, the terminal may also display, on the display screen, information about the target mobile terminal that is photographing in violation of the rules, so that staff can conveniently carry out on-site deterrence and management. The information may include, among other things, location information.

In the embodiment of the invention, a video image of a target scene is acquired, wherein the target scene is a scene to be monitored. An image area of the target mobile terminal in the video image is determined, a target positioning model is called, and the image area is input into the target positioning model. The target positioning model can determine the photographing gesture of a mobile terminal according to the image area of any mobile terminal, so that the photographing gesture tag of the target mobile terminal can be output through the target positioning model. Therefore, whether the target mobile terminal has the photographing behavior can be detected according to the outputted photographing gesture tag. According to the embodiment of the invention, the photographing behavior detection does not need to rely on the flash lamp; that is, whether the target mobile terminal has the photographing behavior in the target scene can be detected regardless of whether the target mobile terminal turns on the flash lamp.
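As a non-limiting illustration, the flow summarized above (locate each mobile terminal, crop its image area, obtain a photographing gesture tag, and decide) could be sketched as follows. The two models are passed in as plain callables; the box format `(x, y, width, height)`, the tag value `"photographing"`, and all function names are assumptions of this sketch, not details given by the embodiment.

```python
from typing import Callable, FrozenSet, List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height)
Image = List[List[int]]          # toy image: rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the image area of one mobile terminal out of the video image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def detect_photographing(
    image: Image,
    detect_terminals: Callable[[Image], List[Box]],  # stands in for the target detection model
    classify_gesture: Callable[[Image], str],        # stands in for the target positioning model
    photographing_tags: FrozenSet[str] = frozenset({"photographing"}),  # hypothetical tag set
) -> List[Tuple[Box, bool]]:
    """Return (box, has_photographing_behavior) for each detected terminal."""
    results = []
    for box in detect_terminals(image):
        region = crop(image, box)
        tag = classify_gesture(region)
        results.append((box, tag in photographing_tags))
    return results
```

Because the decision depends only on the gesture tag produced from the cropped image area, nothing in this flow depends on whether a flash lamp fires.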

Fig. 2 is a schematic structural diagram illustrating a photographing behavior detection apparatus according to an exemplary embodiment, which may be implemented by software, hardware, or a combination of the two. The photographing behavior detection apparatus may include:

an image obtaining module 201, configured to obtain a video image of a target scene;

a region determining module 202, configured to determine an image region of the target mobile terminal included in the video image;

the calling module 203 is used for calling a target positioning model, inputting the image area into the target positioning model and outputting a photographing gesture tag of the target mobile terminal, wherein the target positioning model is used for determining a photographing gesture of the mobile terminal according to the image area of any mobile terminal;

and the behavior determining module 204 is configured to determine whether the target mobile terminal has a photographing behavior based on the outputted photographing gesture tag.

Optionally, the region determining module 202 is configured to:

calling a target detection model, inputting the video image into the target detection model, and outputting the position information of a target mobile terminal included in the video image, wherein the target detection model is used for detecting the position information of the target mobile terminal included in the video image according to any video image;

Optionally, the region determining module 202 is configured to:

Optionally, the behavior determination module 204 is configured to:

Optionally, referring to fig. 3, the apparatus further includes:

a frequency determining module 205, configured to determine the number of photographing times of the target mobile terminal, where the number of photographing times refers to the number of times the target mobile terminal has photographed within a preset duration preceding the current time;

and the alarm module 206 is configured to perform alarm prompting when the number of times of taking the picture of the target mobile terminal reaches a threshold value of the number of times of taking the picture.

Optionally, referring to fig. 4, the apparatus further includes:

a detecting module 207, configured to detect whether the video image is an effective image, where the effective image is an image captured when the image capturing apparatus is not blocked and is not moved;

the image obtaining module 201 is further configured to perform the operation of obtaining the video image when the video image is an effective image, and to give an abnormal-detection alarm prompt when the video image is not an effective image.
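As a non-limiting illustration, the gating performed by the detecting module 207 could be sketched as follows. How blocking or movement of the image capturing apparatus is recognized is not fixed here; `is_blocked`, `has_moved`, `run_detection`, and `raise_abnormal_alarm` are hypothetical callables supplied by the caller.

```python
def process_frame(frame, is_blocked, has_moved, run_detection, raise_abnormal_alarm):
    """Run photographing detection only on effective images.

    An effective image is one captured while the image capturing apparatus
    is neither blocked nor moved. For any other frame an abnormal-detection
    alarm is raised and detection is skipped.
    """
    if is_blocked(frame) or has_moved(frame):
        raise_abnormal_alarm(frame)
        return None
    return run_detection(frame)
```

Checking validity first keeps the detection models from producing results on frames in which the monitored scene is not actually visible.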

It should be noted that the photographing behavior detection apparatus provided in the foregoing embodiment is illustrated only by the above division of functional modules when the photographing behavior detection method triggers an intelligent network service. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the photographing behavior detection apparatus and the photographing behavior detection method provided by the above embodiments belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not described herein again.

Fig. 5 is a block diagram illustrating a terminal 700 according to an exemplary embodiment of the present invention. The terminal 700 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. Terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so on.

In general, terminal 700 includes: a processor 701 and a memory 702.

The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 702 is configured to store at least one instruction for execution by the processor 701 to implement the photographing behavior detection method provided by the method embodiments herein.

In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Each peripheral may be connected to the peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral includes at least one of: a radio frequency circuit 704, a display screen 705, a camera assembly 706, an audio circuit 707, a positioning component 708, and a power supply 709.

The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display screen 705 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 700. The display screen 705 may even be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or the like.

The camera assembly 706 is used to capture images or video. Optionally, camera assembly 706 includes a front camera and a rear camera. Generally, a front camera is disposed on the front panel of the terminal, and a rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.

The positioning component 708 is used to locate the current geographic location of the terminal 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

Power supply 709 is provided to supply power to various components of terminal 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 709 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.

The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensors 713 may be disposed on a side bezel of terminal 700 and/or an underlying layer of touch display 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a user's grip signal on the terminal 700 may be detected, and the processor 701 performs right-left hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the touch display 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the terminal 700. When a physical button or a vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical button or the vendor Logo.

The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 705 is increased; when the ambient light intensity is low, the display brightness of the touch display 705 is turned down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.

A proximity sensor 716, also referred to as a distance sensor, is typically disposed on a front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually decreases, the processor 701 controls the touch display 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the touch display 705 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in fig. 5 does not constitute a limitation of terminal 700, and that the terminal may include more or fewer components than those shown, may combine certain components, or may employ a different arrangement of components.

An embodiment of the present application further provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to execute the method for detecting a photographing behavior provided in the embodiment shown in fig. 1.

The embodiment of the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for detecting a photographing behavior provided in the embodiment shown in fig. 1.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A photographing behavior detection method, comprising:

acquiring a video image of a target scene;

determining whether the target mobile terminal has a photographing behavior based on the outputted photographing gesture tag;

the determining the image area of the target mobile terminal included in the video image includes:

2. The method of claim 1, wherein the determining an image area of the target mobile terminal from the video image based on the location information of the target mobile terminal comprises:

3. The method of claim 1, wherein the target detection model is trained based on a plurality of video image samples and location information of the mobile terminal in each video image sample.

4. The method of claim 1, wherein the target positioning model is obtained by training a positioning model to be trained based on a plurality of image area samples and the photographing gesture label of each image area sample.

5. The method of claim 1, wherein the determining whether the target mobile terminal has the photographing behavior based on the outputted photographing gesture tag comprises:

6. The method of claim 5, wherein after determining that the target mobile terminal has the photographing behavior, further comprising:

7. The method of claim 1, wherein prior to acquiring the video image, further comprising:

8. A photographing behavior detection apparatus, characterized in that the apparatus comprises:

the behavior determining module is used for determining whether the target mobile terminal has the photographing behavior or not based on the outputted photographing gesture label;

the region determination module is to:

9. The apparatus of claim 8, wherein the region determination module is to:

10. The apparatus of claim 8, wherein the target detection model is trained based on a plurality of video image samples and location information of the mobile terminal in each video image sample.

11. The apparatus of claim 8, wherein the target location model is obtained by training a location model to be trained based on a plurality of image area samples and the photographing gesture label of each image area sample.

12. The apparatus of claim 8, wherein the behavior determination module is to:

13. The apparatus of claim 12, wherein the apparatus further comprises:

14. The apparatus of claim 8, wherein the apparatus further comprises:

the image acquisition module is further used for executing the operation of acquiring the video image when the video image is an effective image; and when the video image is not an effective image, carrying out abnormal detection alarm prompt.

15. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method of any of claims 1-7.