CN113810599A

CN113810599A - Method for focusing on designated area by AI action recognition, mobile terminal and storage medium

Info

Publication number: CN113810599A
Application number: CN202110922536.9A
Authority: CN
Inventors: 张虹
Original assignee: Huizhou TCL Cloud Internet Corp Technology Co Ltd
Current assignee: Huizhou TCL Cloud Internet Corp Technology Co Ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2021-12-17
Also published as: WO2023016207A1

Abstract

The invention provides a method for focusing on a designated area by AI action recognition, a mobile terminal and a storage medium. The method comprises: acquiring a preview image; recognizing the preview image to obtain a target in the preview image; and, when the target is a display article, focusing on the area where the target is located to obtain an updated preview image. The method works by increasing the proportion that the plane of the display article occupies in the region used for focusing, so that the camera focuses on the display article. By changing the focusing logic in this way, the picture is focused quickly on the product and the product is shown clearly in the picture.

Description

Method for focusing on designated area by AI action recognition, mobile terminal and storage medium

Technical Field

The present invention relates to the field of cameras, and in particular, to a method, a mobile terminal, and a storage medium for focusing on a designated area by AI action recognition.

Background

Live broadcasting is currently very popular, and mobile phones whose front-facing camera supports focusing have begun to appear on the market. When an anchor shows a hand-held product to viewers, the camera often has difficulty focusing on the product if it is very small or slender. The anchor then has to move the product closer to the camera so that it occupies a larger proportion of the picture, or place a larger background in the same plane as the exhibit, so that the camera can focus on the exhibit.

In the prior art, focusing modes are generally limited to the default full-picture calculated focusing, face-area focusing and manual tap-to-focus by the user, and none of them handles the special case of live-broadcast display. The plane in which the displayed article lies occupies only a small proportion of the picture, and face focusing is also applied, so the camera cannot focus on the displayed product; AI action recognition may also misjudge. As a result, the camera cannot focus well on the displayed article, the product appears blurred in the live picture, and the visual experience of the user watching the product is degraded.

Accordingly, the prior art is yet to be improved and developed.

Disclosure of Invention

In view of the above-mentioned deficiencies of the prior art, an object of the present invention is to provide a method, a mobile terminal and a storage medium for focusing on a designated area by AI action recognition, which aim to overcome the technical problem that the camera does not easily focus on a displayed article during a live broadcast.

The technical scheme of the invention is as follows:

A first embodiment of the present invention provides a method for focusing on a designated area by AI action recognition, which includes:

acquiring a preview image;

identifying the preview image to obtain a target in the preview image;

and when the target is a display article, focusing the area where the target is located to obtain an updated preview image.

Optionally, after the preview image is identified to obtain the target in the preview image, the method further includes:

and when the target is not the display article, focusing the preview image to obtain an updated preview image.

Optionally, when the target is not a display article, the step of focusing the preview image to obtain an updated preview image includes:

when the target is not a display article, determining a region of interest of the preview image according to the preview image;

carrying out image processing on the region of interest to obtain a focusing position of the region of interest; and focusing the focusing position to obtain an updated preview image.

Optionally, the region of interest is an entire image of the preview image.

Optionally, when the target is a display article, the step of focusing the area where the target is located to obtain an updated preview image includes:

when the target is a display article, taking the area where the target is located as a region of interest;

carrying out image processing on the region of interest to obtain a focusing position of the region of interest;

and focusing the focusing position to obtain an updated preview image.

Optionally, the displayed article is an article displayed forward by a hand in the preview image, and the area where the target is located is the area of the hand in the preview image.

Optionally, the ISP is an image signal processing unit, and performs post-processing on the output signal of the front-end image sensor.

A second embodiment of the disclosure is a system for recognizing focusing on a designated area by AI actions, including:

the acquisition module acquires a preview image;

the AI action recognition module is used for recognizing the preview image to obtain a target in the preview image;

the selection focusing module is used for focusing the area where the target is located to obtain an updated preview image when the target is a display article; and when the target is not the display article, focusing the preview image to obtain an updated preview image.

A third embodiment of the present disclosure is a mobile terminal, including: the device comprises a processor, a memory and a control program for focusing the designated area, wherein the control program for focusing the designated area is stored in the memory and can run on the processor, and when being executed by the processor, the steps of the method for focusing the designated area by utilizing AI action recognition are realized.

A fourth embodiment of the present invention is a computer-readable storage medium, wherein a control program for focusing a designated area is stored in the computer-readable storage medium, and when the control program for focusing a designated area is executed by a processor, the steps of the method for recognizing focusing on a designated area by using an AI action are implemented.

Beneficial effects: the invention provides a method for focusing on a designated area by AI action recognition, a mobile terminal and a storage medium. The method comprises: acquiring a preview image; recognizing the preview image to obtain a target in the preview image; and, when the target is a display article, focusing on the area where the target is located to obtain an updated preview image. The method works by increasing the proportion that the plane of the display article occupies in the region used for focusing, so that the camera focuses on the display article. By changing the focusing logic in this way, the picture is focused quickly on the product and the product is shown clearly in the picture.

Drawings

Fig. 1 is a flowchart of the steps of a method for focusing on a designated area by AI action recognition according to an embodiment of the present invention.

Detailed Description

The present invention provides a method, a mobile terminal and a storage medium for focusing on a designated area by AI action recognition. To make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.

It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element.

It should also be noted that the same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", etc. based on the orientation or positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but it is not intended to indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present patent, and the specific meaning of the terms may be understood by those skilled in the art according to specific circumstances.

As noted in the Background, live broadcasting is currently very popular, and mobile phones whose front-facing camera supports focusing have begun to appear on the market. When an anchor shows a hand-held product to viewers, the camera often has difficulty focusing on the product if it is very small or slender; the anchor has to move the product closer so that it occupies a larger proportion of the picture, or place a larger background in the same plane as the exhibit. In the prior art, focusing modes are generally limited to the default full-picture calculated focusing, face-area focusing and manual tap-to-focus by the user, and none of them handles the special case of live-broadcast display. AI action recognition may also misjudge, so the camera cannot focus well on the displayed article, the article appears blurred in the live picture, and the visual experience of the user watching it is degraded.

A first embodiment of the present invention provides a method for focusing on a designated area by AI action recognition, which is applied to a live broadcast scene and, as shown in fig. 1, includes:

s1: acquiring a preview image;

The display article is positioned in front of the lens as the target, and no tap on the image shown on the display screen is needed to focus. The camera is started with the person within its working range, and the camera collects a preview picture. The display screen is connected to the microprocessor, is controlled by it and displays the data it outputs; the collected preview picture is processed by the microprocessor, and the output data are shown on the display screen.

Further, the mobile terminal is specifically a mobile phone. The phone is fixed in place and the live broadcast personnel face its front camera. When the live broadcast starts, the camera is started so that the person is within its working range, and the camera collects the preview picture. The preview picture is processed by the microprocessor and the output data are shown on the display screen, which is connected to the microprocessor, is controlled by it and displays the data it outputs. The preview picture shown on the display screen may include the person, the spatial background, the exhibit, and combinations of these.

S2: identifying the preview image to obtain a target in the preview image;

further, after the preview image is identified to obtain the target in the preview image, the method further includes: and when the target is not the display article, focusing the preview image to obtain an updated preview image.

Preferably, the preview picture is sent to the AI action recognition module, which analyses the actions in the preview picture and judges whether the action of the live broadcast personnel stretching a hand toward the camera appears in it. The action in the preview picture is thereby recognized, and the focusing logic is selected accordingly.
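The text does not say how the recognition module decides that a hand is moving toward the camera. Purely as an illustration, the sketch below assumes that some detector already supplies a hand bounding box per frame, and treats a sufficiently large growth of that box over a short window as "hand stretched toward the camera". The function names, the box format and the growth threshold are all hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; this box format is an assumption.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a box; degenerate boxes count as zero."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def hand_moves_toward_camera(hand_boxes: List[Box], growth: float = 1.5) -> bool:
    """Return True if the hand box grew by at least `growth` times between
    the first and the last frame of the window.

    Hypothetical heuristic: a hand approaching the lens appears larger.
    """
    if len(hand_boxes) < 2:
        return False
    first = box_area(hand_boxes[0])
    last = box_area(hand_boxes[-1])
    return first > 0 and last >= growth * first
```

A real recognition module would more likely be a trained action classifier; the heuristic only makes the input and output of this step concrete.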

AI action recognition detects the actions of the person in the picture in real time, such as making a phone call, raising a hand or smoking.

S3: and when the target is a display article, focusing the area where the target is located to obtain an updated preview image.

The step of focusing on the area where the target is located when the target is a display article, to obtain an updated preview image, comprises: when the target is a display article, taking the area where the target is located as the region of interest; performing image processing on the region of interest to obtain a focusing position of the region of interest; and focusing at the focusing position to obtain an updated preview image.

Further, the displayed article is the article displayed forward by the hand in the preview image, and the area where the target is located is the area of the hand in the preview image.
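As a minimal sketch of turning the hand area into a focusing region, the function below pads a hand bounding box by a margin and clamps it to the image bounds. The box format, the `margin_percent` parameter and the function name are assumptions made for illustration; the patent only states that the target area is the area of the hand.

```python
from typing import Tuple

# (left, top, right, bottom) in pixels; this box format is an assumption.
Box = Tuple[int, int, int, int]


def hand_roi(hand_box: Box, image_size: Tuple[int, int], margin_percent: int = 20) -> Box:
    """Pad the hand box on every side and clamp it to the image.

    `image_size` is (width, height). The padding is a hypothetical choice
    so that an article held slightly outside the hand box is still covered.
    """
    left, top, right, bottom = hand_box
    width, height = image_size
    pad_x = (right - left) * margin_percent // 100
    pad_y = (bottom - top) * margin_percent // 100
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )
```

For example, `hand_roi((100, 100, 200, 200), (640, 480))` pads each side by 20 pixels, and a box touching the image border is clipped rather than extended past it.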

Preferably, the ISP is an image signal processing unit, and performs post-processing on the output signal of the front-end image sensor.

Further, when AI action recognition detects that the live broadcast personnel extend a hand forward to display an article, the focusing logic is changed (the focusing logic may include full-picture calculated focusing, face focusing, tap-to-focus and the like), and the focusing ROI is reduced to the area of the hand displaying the article. The image signal processor (ISP) then calculates the focusing position of the displayed article and focuses on the displayed article.

It should be noted that ROI stands for region of interest: in machine vision and image processing, the region to be processed is outlined in the processed image with a box, circle, ellipse, irregular polygon or the like, and is called the region of interest. Here the ROI is the region over which sharpness is calculated in order to focus on that region. ISP (Image Signal Processing) is a unit for processing image signals, mainly the output signal of the front-end image sensor.
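To make "calculating sharpness over the ROI" concrete, the sketch below uses a simple gradient-energy measure restricted to the ROI and picks, from frames captured at several lens positions, the position whose ROI is sharpest. This is a generic contrast-based focusing illustration under stated assumptions, not the ISP algorithm of the patent: the sharpness measure, the grayscale list-of-rows image format and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]           # grayscale rows of pixel values (assumed format)
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive


def roi_sharpness(image: Image, roi: Box) -> int:
    """Sum of squared differences between neighbouring pixels inside the ROI."""
    left, top, right, bottom = roi
    total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            if x + 1 < right:
                total += (image[y][x + 1] - image[y][x]) ** 2
            if y + 1 < bottom:
                total += (image[y + 1][x] - image[y][x]) ** 2
    return total


def best_focus_position(frames: Dict[int, Image], roi: Box) -> int:
    """Return the lens position whose frame is sharpest inside the ROI.

    `frames` maps a hypothetical lens position to the frame captured there.
    """
    return max(frames, key=lambda position: roi_sharpness(frames[position], roi))
```

Because only pixels inside the ROI contribute, a sharp face elsewhere in the picture does not influence the chosen position, which is the effect the reduced ROI is meant to achieve.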

Preferably, when the target is not a display article, a region of interest of the preview image is determined according to the preview image; image processing is performed on the region of interest to obtain a focusing position of the region of interest; and focusing is performed at the focusing position to obtain an updated preview image. Further, the region of interest is the entire image of the preview image.

Further, when AI action recognition does not detect that the live broadcast personnel extend a hand forward to display an article, the focusing logic is restored and the focusing ROI is restored to the whole picture. The image signal processor (ISP) then calculates the focusing position and focuses on the whole picture.

The above-described method disclosed by the present invention is further described in more detail below with reference to the example shown in fig. 1.

K1: The mobile terminal is specifically a mobile phone. The phone is fixed in place with the live broadcast personnel facing its front camera; when the live broadcast starts, the camera is started so that the person is within the working range of the camera.

K2: The camera collects the preview picture, which is processed by the microprocessor, and the output data are shown on the display screen. The display screen is connected to the microprocessor, is controlled by it and displays the data it outputs. The preview picture shown on the display screen may include the person, the spatial background, the exhibit, and combinations of these.

K3: The preview picture is sent to the AI action recognition module, which analyses the actions in it. The module can detect actions such as making a phone call, raising a hand, smoking, shaking the head and stretching out a hand in real time, and judges whether the action of the live broadcast personnel stretching a hand toward the camera appears in the preview picture, thereby recognizing the action in the preview picture.

K4: It is judged whether a hand is extended forward to display an article.

K411: When AI action recognition detects that the live broadcast personnel extend a hand forward to display an article, the focusing logic is changed (the focusing logic may include full-picture calculated focusing, face focusing, tap-to-focus and the like), and the focusing ROI is reduced to the area of the hand displaying the article.

K412: The image signal processor (ISP) calculates the focusing position of the displayed article and focuses on the displayed article.

K421: When AI action recognition does not detect that the live broadcast personnel extend a hand forward to display an article, the focusing logic is restored and the focusing ROI is restored to the whole picture.

K422: The image signal processor (ISP) calculates the focusing position and focuses on the whole picture.

K5: It is judged whether the live broadcast has ended; if the live broadcast continues, the process returns to K2 and the steps are repeated.

K6: If K5 determines that the live broadcast has ended, the process ends.
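The loop K2 to K6 can be sketched as follows. This is a minimal illustration under stated assumptions: the recognition result for each preview frame is represented as either a hand box or `None`, and `focus` stands in for the ISP focusing call. None of these names come from the patent.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (left, top, right, bottom) in pixels; this box format is an assumption.
Box = Tuple[int, int, int, int]


def choose_focus_roi(hand_box: Optional[Box], image_size: Tuple[int, int]) -> Box:
    """K4: use the hand area when an article is shown, else the whole picture."""
    if hand_box is not None:
        return hand_box                      # K411: ROI reduced to the hand area
    width, height = image_size
    return (0, 0, width, height)             # K421: ROI restored to the full picture


def run_live_focus(
    detections: Iterable[Optional[Box]],
    image_size: Tuple[int, int],
    focus: Callable[[Box], None],
) -> List[Box]:
    """Process one recognition result per preview frame (K2, K3).

    `detections` yields a hand box when the stretched-hand action was
    recognized in a frame and None otherwise. The loop ends when the
    iterable is exhausted, which stands in for the end of the broadcast
    (K5, K6).
    """
    used: List[Box] = []
    for hand_box in detections:
        roi = choose_focus_roi(hand_box, image_size)
        focus(roi)                           # K412 / K422: focus on the chosen ROI
        used.append(roi)
    return used
```

Passing the focusing routine in as a callable keeps the ROI selection separate from the camera driver, so the selection logic can be exercised without hardware.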

In the system of the second embodiment, the acquisition module acquires a preview image.

Preferably, the mobile terminal may include Radio Frequency (RF) circuitry, a memory including one or more computer-readable storage media, an input unit, a display unit, a sensor, an audio circuit, a Wireless Fidelity (Wi-Fi) module, a processor including one or more processing cores, and a power supply.

Further, the RF circuit may be configured to receive and transmit signals during a message transmission or communication process, and in particular, receive downlink information from a base station and then send the received downlink information to one or more processors for processing; in addition, data relating to uplink is transmitted to the base station. Typically, the RF circuitry includes, but is not limited to, an antenna, at least one Amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.

Preferably, the memory is used for storing software programs and modules, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile terminal, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may further include a memory controller to provide access to the memory by the processor and the input unit.

Preferably, the input unit is operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, the input unit may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user (e.g., operations by a user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment) thereon or nearby, and drive the corresponding connection device according to a predetermined program. Alternatively, the touch sensitive surface may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor, and can receive and execute commands sent by the processor. In addition, touch sensitive surfaces may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. The input unit may comprise other input devices than a touch sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

Preferably, the display unit may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the mobile terminal, which may be composed of graphics, text, icons, video and any combination thereof. The display unit may include a display screen, which may optionally be configured as a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch-sensitive surface may overlay the display screen; when a touch operation is detected on or near the touch-sensitive surface, it is transmitted to the processor to determine the type of the touch event, and the processor then provides a corresponding visual output on the display screen according to that type. In the embodiment of the present invention, the display unit is a display device that includes a side light source, a light guide plate, a photo-curable adhesive, a display panel and a reflective sheet. The side light source is disposed on a first side of the light guide plate and the reflective sheet is fixed on a second side of the light guide plate, the first side and the second side being opposite sides. The light guide plate has a first end surface and a second end surface, which are opposite surfaces; the second end surface is close to the display panel, and the photo-curable adhesive bonds the second end surface to an end surface of the display panel. The first end surface is provided with a plurality of groove-shaped structures and the second end surface is provided with a plurality of prisms, both of which refract the light emitted from the side light source; the groove-shaped structures, the prisms and the light guide plate form an integrated structure. The display panel is a reflective liquid crystal display.

Preferably, the mobile terminal may further include at least one sensor, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display screen according to the brightness of ambient light, and a proximity sensor, which may turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally along three axes) and can detect the magnitude and direction of gravity when the mobile phone is stationary. It can be used for applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer and tap detection). Other sensors such as gyroscopes, barometers, hygrometers, thermometers and infrared sensors may also be configured in the mobile terminal.

Further, an audio circuit, a speaker, and a microphone may provide an audio interface between the user and the mobile terminal. The audio circuit can transmit the electric signal converted from the received audio data to the loudspeaker, and the electric signal is converted into a sound signal by the loudspeaker to be output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit and converted into audio data, which is then output to the processor for processing, and then transmitted to, for example, another mobile terminal via the RF circuit, or the audio data is output to the memory for further processing. The audio circuitry may also include an earbud jack to provide communication of a peripheral headset with the mobile terminal. Wi-Fi belongs to a short-distance wireless transmission technology, a mobile terminal can help a user to receive and send emails, browse webpages, access streaming media and the like through a Wi-Fi module, and wireless broadband internet access is provided for the user.

Further, the processor is a control center of the mobile terminal, connects various parts of the whole mobile phone by using various interfaces and lines, and executes various functions and processes data of the mobile terminal by running or executing software programs and/or modules stored in the memory and calling data stored in the memory, thereby integrally monitoring the mobile phone. Optionally, the processor may include one or more processing cores; preferably, the processor may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor.

Preferably, the mobile terminal further comprises a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the processor through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system. The power supply may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.

The invention provides a method for focusing on a designated area by AI action recognition, a mobile terminal and a storage medium. The method comprises: acquiring a preview image; recognizing the preview image to obtain a target in the preview image; and, when the target is a display article, focusing on the area where the target is located to obtain an updated preview image. The method works by increasing the proportion that the plane of the display article occupies in the region used for focusing, so that the camera focuses on the display article. No manual tap-to-focus is needed when an article is displayed during a live broadcast: as long as the presenter stretches out a hand to show the product, AI action recognition detects the stretched-hand display action of the anchor and the focusing logic is changed, so that the product is displayed better and the picture is focused quickly on the product.

It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

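1. A method for focusing on a designated area by AI action recognition, characterized by comprising the following steps:

acquiring a preview image;

identifying the preview image to obtain a target in the preview image;

and when the target is a display article, focusing the area where the target is located to obtain an updated preview image.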

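2. The method for focusing on a designated area by using AI action recognition according to claim 1, wherein after recognizing the preview image and obtaining the target in the preview image, the method further comprises:

and when the target is not the display article, focusing the preview image to obtain an updated preview image.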

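3. The method for focusing on the designated area by using AI action recognition according to claim 2, wherein the step of focusing the preview image when the target is not a display item to obtain an updated preview image comprises:

when the target is not a display article, determining a region of interest of the preview image according to the preview image;

carrying out image processing on the region of interest to obtain a focusing position of the region of interest;

and focusing the focusing position to obtain an updated preview image.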

4. The method of utilizing AI action recognition for focusing on a specified area according to claim 3, wherein the area of interest is the entire image of the preview image.

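5. The method of claim 1, wherein focusing the area where the target is located when the target is a display item, and obtaining the updated preview image comprises:

when the target is a display article, taking the area where the target is located as a region of interest;

carrying out image processing on the region of interest to obtain a focusing position of the region of interest;

and focusing the focusing position to obtain an updated preview image.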

6. The method of claim 5, wherein the displayed item is an item displayed forward of the hand in the preview image, and the area where the target is located is an area of the hand in the preview image.

7. The method for focusing on the designated area by using AI action recognition according to any one of claims 1 to 6, wherein the ISP is an image signal processing unit that performs post-processing on the output signal of the front-end image sensor.

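8. A system for focusing on a designated area by AI action recognition, comprising:

the acquisition module acquires a preview image;

the AI action recognition module is used for recognizing the preview image to obtain a target in the preview image;

the selection focusing module is used for focusing the area where the target is located to obtain an updated preview image when the target is a display article; and when the target is not the display article, focusing the preview image to obtain an updated preview image.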

9. A mobile terminal, comprising: a processor, a memory and a control program for focusing a designated area stored on the memory and executable on the processor, wherein the control program for focusing a designated area realizes the steps of the method for identifying focusing on a designated area by AI action according to any one of claims 1 to 7 when executed by the processor.

10. A computer-readable storage medium, on which a control program for focusing on a designated area is stored, which, when executed by a processor, carries out the steps of the method for identifying focusing on a designated area using AI actions according to any one of claims 1 to 7.