CN113473198A - Control method of intelligent equipment and intelligent equipment - Google Patents

Control method of intelligent equipment and intelligent equipment

Info

Publication number
CN113473198A
Authority
CN
China
Prior art keywords
image
gesture
control
gestures
smart device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010582730.2A
Other languages
Chinese (zh)
Other versions
CN113473198B (en)
Inventor
陈维强
冯谨强
孟祥奇
高雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Electronic Industry Holdings Co Ltd filed Critical Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority to CN202010582730.2A priority Critical patent/CN113473198B/en
Publication of CN113473198A publication Critical patent/CN113473198A/en
Application granted granted Critical
Publication of CN113473198B publication Critical patent/CN113473198B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42201Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] biosensors, e.g. heat sensor for presence detection, EEG sensors or any limb activity sensors worn by the user
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

The application relates to the technical field of intelligent devices, and provides a control method of an intelligent device and an intelligent device. The method includes: receiving a plurality of first images acquired by an image collector within a set period; each time one first image is read, intercepting a second image from that first image according to a preset interception rule, recognizing the second image, and determining at least one gesture recognition area and a corresponding gesture recognition result; and if the same control gesture occurring more than a set number threshold of times exists in the same gesture recognition area, controlling the intelligent device to execute the corresponding operation based on that control gesture. Preprocessing the first image according to the preset interception rule reduces the difficulty of hand detection, and because static gestures of the user are collected, the time spent on gesture recognition is shortened and the user experience is improved.

Description

Control method of intelligent equipment and intelligent equipment
Technical Field
The application relates to the technical field of intelligent devices, and provides a control method of an intelligent device and an intelligent device.
Background
With the intelligent upgrading of products, a user can operate a screen with the hands by performing different continuous actions (i.e., dynamic gestures) to adjust the volume, change channels, fast forward, rewind, and so on, so that controlling an intelligent device becomes increasingly simple.
However, when the intelligent device is controlled in the above manner, the following problems arise: the farther the user is from the camera, the smaller the proportion of the user's hand image in the acquired image, which greatly increases the difficulty of hand detection; when the user makes a dynamic gesture, the camera acquires a plurality of continuous images and inputs them into a neural network for recognition, and because the number of input images is large, the recognition process of the neural network takes a long time and cannot meet the real-time response requirement; finally, the user must complete a dynamic gesture before a response is produced, so the user experience is poor.
In view of this, the present application provides a new method for controlling an intelligent device and an intelligent device.
Disclosure of Invention
The embodiments of the application provide a control method of an intelligent device and an intelligent device, which are used to reduce the difficulty of hand detection, shorten the time spent on gesture recognition, and improve the user experience.
In a first aspect, an embodiment of the present application provides an intelligent device, including:
a display configured to display a screen;
the image collector is configured to acquire a plurality of first images within a set period and transmit the plurality of first images to the controller;
the controller is configured to receive the plurality of first images acquired by the image collector within the set period;
respectively executing the following processing for each first image, wherein each time one first image is read, a second image is intercepted from the first image according to a preset interception rule, wherein the occupation ratio of a user hand image on the second image is higher than that of the user hand image on the first image; recognizing the second image, and determining at least one gesture recognition area on the second image and a corresponding gesture recognition result;
and if the same control gesture exceeding a set number threshold exists in the same gesture recognition area, controlling the intelligent equipment to execute corresponding operation based on the same control gesture.
Optionally, the controller is configured to:
determining the size information and the corner point coordinate information of the hand part image according to the field angle of the image collector, a set field angle threshold value and the size information of the first image;
and intercepting the second image from the first image according to the size information of the hand image and the corner point coordinate information.
Optionally, the controller is further configured to:
and if the field angle of the image collector in the X direction and the field angle of the image collector in the Y direction are both lower than the field angle threshold value, determining the first image as the second image.
Optionally, the controller is configured to:
if M continuous awakening gestures exist in the same gesture recognition area, determining the same gesture recognition area as a gesture control area;
if the gesture control area has N continuous first control gestures after the M continuous awakening gestures, controlling the intelligent device to execute corresponding operations based on the first control gestures;
wherein M, N are all positive integers.
Optionally, the controller is configured to:
if the gesture control area has N continuous second control gestures after the N continuous first control gestures, controlling the intelligent device to execute corresponding operations based on the second control gestures; wherein N is a positive integer.
In a second aspect, an embodiment of the present application further provides a method for controlling an intelligent device, including:
receiving a plurality of first images acquired by an image collector within a set period;
respectively executing the following processing for each first image, wherein each time one first image is read, a second image is intercepted from the first image according to a preset interception rule, wherein the occupation ratio of a user hand image on the second image is higher than that of the user hand image on the first image; recognizing the second image, and determining at least one gesture recognition area on the second image and a corresponding gesture recognition result;
and if the same control gesture exceeding a set number threshold exists in the same gesture recognition area, controlling the intelligent equipment to execute corresponding operation based on the same control gesture.
Optionally, intercepting a second image from the first image according to a preset interception rule, including:
determining the size information and the corner point coordinate information of the hand part image according to the field angle of the image collector, a set field angle threshold value and the size information of the first image;
and intercepting the second image from the first image according to the size information of the hand image and the corner point coordinate information.
Optionally, before determining the size information and the corner coordinate information of the hand image, the method further includes:
and if the field angle of the image collector in the X direction and the field angle of the image collector in the Y direction are both lower than the field angle threshold value, determining the first image as the second image.
Optionally, if the same control gesture exceeding the set number threshold exists in the same gesture recognition area, controlling the smart device to execute a corresponding operation based on the same control gesture, including:
if M continuous awakening gestures exist in the same gesture recognition area, determining the same gesture recognition area as a gesture control area;
if the gesture control area has N continuous first control gestures after the M continuous awakening gestures, controlling the intelligent device to execute corresponding operations based on the first control gestures;
wherein M, N are all positive integers.
Optionally, after controlling the smart device to perform a corresponding operation based on the first control gesture, the method further includes:
if the gesture control area has N continuous second control gestures after the N continuous first control gestures, controlling the intelligent device to execute corresponding operations based on the second control gestures; wherein N is a positive integer.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium including program code, where, when the program code runs on a terminal, it causes the terminal to execute the steps of any one of the above control methods of an intelligent device.
The beneficial effects of this application are as follows:
the application provides a control method of an intelligent device and the intelligent device, wherein the method comprises the following steps: receiving a plurality of first images acquired by an image acquisition device in a set period, intercepting a second image from one first image according to a preset interception rule when reading one first image, identifying the second image, and determining at least one gesture identification area and a corresponding gesture identification result; and if the same control gesture exceeding the set number threshold exists in the same gesture recognition area, controlling the intelligent equipment to execute corresponding operation based on the control gesture. The first image is preprocessed according to the preset intercepting rule so as to reduce the hand detection difficulty, and the static gesture of the user is collected, so that the time spent in gesture recognition can be shortened, and the user experience can be improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1a is a schematic diagram illustrating an operation scenario between a smart device and a control apparatus;
fig. 1b shows a block diagram of the configuration of the control device 100 from fig. 1 a;
FIG. 1c is a block diagram illustrating the configuration of the smart device 200 of FIG. 1 a;
FIG. 1d is a block diagram illustrating the architectural configuration of the operating system in the memory of the smart device 200;
FIG. 2 illustrates a flow diagram for controlling a smart device;
fig. 3a shows an exemplary field angle diagram;
fig. 3b shows an exemplary top view of the field angle in the X direction;
fig. 3c shows a side view of the field angle in the Y direction as an example;
FIG. 4a illustrates a wake gesture diagram;
FIG. 4b illustrates a control gesture diagram for turning up volume;
FIG. 4c illustrates a control gesture diagram for turning down the volume;
FIG. 4d illustrates a control gesture diagram for video rewind;
FIG. 4e illustrates a control gesture diagram for video fast forward;
FIG. 4f illustrates a control gesture diagram for confirmation;
FIG. 4g illustrates a control gesture diagram for cancellation;
fig. 4h illustrates an exemplary mute/end control gesture diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
Fig. 1a is a schematic diagram illustrating an operation scenario between a smart device and a control apparatus. As shown in fig. 1a, the control apparatus 100 and the smart device 200 may communicate with each other in a wired or wireless manner.
The control apparatus 100 is configured to control the smart device 200: it receives an operation instruction input by a user, converts the operation instruction into an instruction that the smart device 200 can recognize and respond to, and mediates the interaction between the user and the smart device 200. For example: the user operates the channel up/down keys on the control apparatus 100, and the smart device 200 responds to the channel up/down operation.
The control device 100 may be a remote controller 100A, which communicates with the smart device 200 through infrared protocol communication, Bluetooth protocol communication, or other short-distance communication methods, and controls the smart device 200 wirelessly or by other wired means. The user may input user instructions through keys on the remote controller, voice input, control panel input, etc., to control the smart device 200. For example: the user can input corresponding control instructions through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, power on/off key, etc. on the remote controller, to realize the function of controlling the smart device 200.
The control device 100 may be a mobile terminal 100B, a tablet computer, a notebook computer, or the like. For example, the smart device 200 is controlled using an application program running on the mobile terminal 100B. The application program may provide various controls to a user through an intuitive User Interface (UI) on a screen associated with the mobile terminal 100B through configuration.
The user interface in the embodiment of the application is a media interface for interaction and information exchange between an application program or an operating system and a user, and realizes conversion between an internal form of information and a form acceptable by the user. A common presentation form of a user interface is a Graphical User Interface (GUI), which refers to a user interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the smart device 200, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
For example, the mobile terminal 100B may install a software application with the smart device 200, implement connection communication through a network communication protocol, and implement the purpose of one-to-one control operation and data communication. Such as: the mobile terminal 100B may be caused to establish a control instruction protocol with the smart device 200 to implement the function of the physical keys as arranged by the remote control 100A by operating various function keys or virtual buttons of the user interface provided on the mobile terminal 100B. The audio and video content displayed on the mobile terminal 100B may also be transmitted to the display of the smart device 200, so as to implement a synchronous display function.
In some other exemplary embodiments, the smart device 200 may further invoke an internally configured image collector, such as a camera or video camera, to collect external environment scenes so as to adaptively change its display parameters, and to acquire a plurality of first images within a set period; the first images may contain attributes of the user or the user's interaction gestures, so as to realize interaction between the smart device and the user.
The smart device 200 may provide a network tv function of a broadcast receiving function and a computer support function. The smart device may be a digital television, a web television, an Internet Protocol Television (IPTV), or the like. The display of the smart device 200 may be a liquid crystal display, an organic light emitting display, a projection device. The specific display type, size, resolution, etc. are not limited.
The smart device 200 also performs data communication with the server 300 through various communication means. This may allow the smart device 200 to be communicatively coupled via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 300 may provide various contents and interactions to the smart device 200. For example, the smart device 200 may send and receive information such as: receiving Electronic Program Guide (EPG) data, receiving software program updates, or accessing a remotely stored digital media library. The servers 300 may be a group or groups of servers, and may be one or more types of servers. Other web service contents such as a video on demand and an advertisement service are provided through the server 300.
Fig. 1b shows a block diagram of an exemplary configuration of the control device 100. As shown in fig. 1b, the control device 100 includes a controller 110, a memory 120, a communicator 130, a user input interface 140, an output interface 150, and a power supply 160.
The controller 110 includes a Random Access Memory (RAM)111, a Read Only Memory (ROM)112, a processor 113, a communication interface, and a communication bus. The controller 110 is used to control the operation of the control device 100, as well as the internal components of the communication cooperation, external and internal data processing functions.
Illustratively, when an interaction of a user pressing a key disposed on the remote controller 100A or an interaction of touching a touch panel disposed on the remote controller 100A is detected, the controller 110 may control to generate a signal corresponding to the detected interaction and transmit the signal to the smart device 200.
And a memory 120 for storing various operation programs, data and applications for driving and controlling the control apparatus 100 under the control of the controller 110. The memory 120 may store various control signal commands input by a user.
The communicator 130 enables communication of control signals and data signals with the smart device 200 under the control of the controller 110. For example: the control apparatus 100 transmits a control signal (e.g., a touch signal or a button signal) to the smart device 200 via the communicator 130, and the control apparatus 100 may receive signals transmitted by the smart device 200 via the communicator 130. The communicator 130 may include an infrared signal interface 131 and a radio frequency signal interface 132. For example: when the infrared signal interface is used, a user input instruction needs to be converted into an infrared control signal according to the infrared control protocol and sent to the smart device 200 through the infrared sending module. As another example: when the radio frequency signal interface is used, a user input instruction needs to be converted into a digital signal, modulated according to the radio frequency control signal modulation protocol, and then sent to the smart device 200 through the radio frequency sending terminal.
The user input interface 140 may include at least one of a microphone 141, a touch pad 142, a sensor 143, a key 144, and the like, so that a user can input a user instruction regarding controlling the smart device 200 to the control apparatus 100 through voice, touch, gesture, press, and the like.
The output interface 150 outputs a user instruction received by the user input interface 140 to the smart device 200, or outputs an image or voice signal received by the smart device 200. Here, the output interface 150 may include an LED interface 151, a vibration interface 152 generating vibration, a sound output interface 153 outputting sound, a display 154 outputting an image, and the like. For example, the remote controller 100A may receive an output signal such as audio, video, or data from the output interface 150, and display the output signal in the form of an image on the display 154, in the form of audio on the sound output interface 153, or in the form of vibration on the vibration interface 152.
And a power supply 160 for providing operational power support for each element of the control device 100 under the control of the controller 110, for example in the form of a battery and associated control circuitry.
A block diagram of the hardware configuration of the smart device 200 is illustrated in fig. 1 c. As shown in fig. 1c, the smart device 200 may include a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a memory 260, a user interface 265, a video processor 270, a display 275, an audio processor 280, an audio output interface 285, and a power supply 290.
The tuner demodulator 210 receives the broadcast television signal in a wired or wireless manner, may perform modulation and demodulation processing such as amplification, mixing, and resonance, and is configured to demodulate, from a plurality of wireless or wired broadcast television signals, an audio/video signal carried in a frequency of a television channel selected by a user, and additional information (e.g., EPG data).
The tuner demodulator 210 responds to the television channel frequency selected by the user and the television signal carried by that frequency, under the control of the controller 250.
The tuner demodulator 210 can receive a television signal in various ways according to the broadcasting system of the television signal, such as: terrestrial broadcasting, cable broadcasting, satellite broadcasting, internet broadcasting, or the like; and according to different modulation types, a digital modulation mode or an analog modulation mode can be adopted; and can demodulate the analog signal and the digital signal according to the different kinds of the received television signals.
In other exemplary embodiments, the tuning demodulator 210 may also be in an external device, such as an external set-top box. In this way, the set-top box outputs a television signal after modulation and demodulation, and inputs the television signal into the smart device 200 through the external device interface 240.
The communicator 220 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the smart device 200 may transmit content data to an external device connected via the communicator 220, or browse and download content data from an external device connected via the communicator 220. The communicator 220 may include a network communication protocol module or a near field communication protocol module, such as a WIFI module 221, a Bluetooth communication protocol module 222, and a wired Ethernet communication protocol module 223, so that the communicator 220 may, under the control of the controller 250, receive control signals from the control device 100 in the form of WiFi signals, Bluetooth signals, radio frequency signals, and the like.
The detector 230 is a component of the smart device 200 for collecting signals of an external environment or interaction with the outside. The detector 230 may include a sound collector 231, such as a microphone, which may be used to receive the sound of the user, such as a voice signal of a control instruction of the user controlling the smart device 200; alternatively, ambient sounds for identifying the type of ambient scene may be collected, enabling the smart device 200 to adapt to ambient noise.
In some other exemplary embodiments, the detector 230 may further include an image collector 232, such as a camera or video camera, which may be configured to collect external environment scenes so as to adaptively change the display parameters of the smart device 200, and to acquire a plurality of first images within a set period; the first images may contain attributes of the user or the user's interaction gestures, so as to realize the interaction function between the smart device and the user.
In some other exemplary embodiments, the detector 230 may further include a light receiver for collecting the ambient light intensity to adapt to the display parameter variation of the smart device 200.
In some other exemplary embodiments, the detector 230 may further include a temperature sensor, such as by sensing an ambient temperature, and the smart device 200 may adaptively adjust a display color temperature of the image. For example, when the temperature is higher, the smart device 200 may be adjusted to display a color temperature of the image that is cooler; when the temperature is lower, the smart device 200 may be adjusted to display a warmer color temperature of the image.
The external device interface 240 is a component for providing the controller 250 to control data transmission between the smart device 200 and an external device. The external device interface 240 may be connected to an external apparatus such as a set-top box, a game device, a notebook computer, etc. in a wired/wireless manner, and may receive data such as a video signal (e.g., moving image), an audio signal (e.g., music), additional information (e.g., EPG), etc. of the external apparatus.
The external device interface 240 may include: a High Definition Multimedia Interface (HDMI) terminal 241, a Composite Video Blanking Sync (CVBS) terminal 242, an analog or digital Component terminal 243, a Universal Serial Bus (USB) terminal 244, a Component terminal (not shown), a red, green, blue (RGB) terminal (not shown), and the like.
The controller 250 controls the operation of the smart device 200 and responds to the user's operations by running various software control programs (e.g., an operating system and various application programs) stored in the memory 260. The controller 250 may further perform the following processing for each first image: each time one first image M is read, intercept a second image N from the first image M according to a preset interception rule, where the occupation ratio of the user's hand image on the second image N is higher than its occupation ratio on the first image M; recognize the second image N, and determine at least one gesture recognition area on the second image N and the corresponding gesture recognition results; and if the same control gesture occurring more than a set threshold number of times exists in the same gesture recognition area, control the smart device to execute the corresponding operation based on that control gesture.
As shown in FIG. 1c, the controller 250 includes a Random Access Memory (RAM) 251, a Read Only Memory (ROM) 252, a graphics processor 253, a CPU processor 254, a communication interface 255, and a communication bus 256. The RAM 251, ROM 252, graphics processor 253, and CPU processor 254 are connected to one another via the communication interface 255 over the communication bus 256.
The ROM252 stores various system boot instructions. If the smart device 200 starts to boot up when receiving the power-on signal, the CPU processor 254 executes the system boot instruction in the ROM252 and copies the operating system stored in the memory 260 to the RAM251 to start running the boot operating system. After the start of the operating system is completed, the CPU processor 254 copies the various application programs in the memory 260 to the RAM251 and then starts running and starting the various application programs.
And a graphic processor 253 for generating various graphic objects such as icons, operation menus, and user input instruction display graphics, etc. The graphic processor 253 may include an operator for performing an operation by receiving various interactive instructions input by a user, and further displaying various objects according to display attributes; and a renderer for generating various objects based on the operator and displaying the rendered result on the display 275.
A CPU processor 254 for executing operating system and application program instructions stored in memory 260. And according to the received user input instruction, processing of various application programs, data and contents is executed so as to finally display and play various audio-video contents.
In some example embodiments, the CPU processor 254 may comprise a plurality of processors, including one main processor and one or more sub-processors. The main processor performs initialization operations of the smart device 200 in the preloading mode and/or screen display operations in the normal mode; the sub-processor(s) perform operations in states such as the smart device standby mode.
The communication interface 255 may include a first interface to an nth interface. These interfaces may be network interfaces that are connected to external devices via a network.
The controller 250 may control the overall operation of the smart device 200. For example: in response to receiving a user input command for selecting a GUI object displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user input command.
Where the object may be any one of the selectable objects, such as a hyperlink or an icon. The operation related to the selected object is, for example, an operation of displaying a link to a hyperlink page, document, image, or the like, or an operation of executing a program corresponding to the object. The user input command for selecting the GUI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the smart device 200 or a voice command corresponding to a voice spoken by the user.
The memory 260 is used for storing various types of data, software programs, or applications that drive and control the operation of the smart device 200. The memory 260 may include volatile and/or nonvolatile memory. And the term "memory" includes the memory 260, the RAM251 and the ROM252 of the controller 250, or a memory card in the smart device 200.
In some embodiments, the memory 260 is specifically used for storing an operating program for driving the controller 250 in the smart device 200; storing various application programs built in the smart device 200 and downloaded by a user from an external device; data such as visual effect images for configuring various GUIs provided by the display 275, various objects related to the GUIs, and selectors for selecting GUI objects are stored.
In some embodiments, memory 260 is specifically configured to store drivers for tuner demodulator 210, communicator 220, detector 230, external device interface 240, video processor 270, display 275, audio processor 280, etc., and related data, such as external data (e.g., audio-visual data) received from the external device interface or user data (e.g., key information, voice information, touch information, etc.) received by the user interface.
In some embodiments, memory 260 specifically stores software and/or programs representing an Operating System (OS), which may include, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. Illustratively, the kernel may control or manage system resources, as well as functions implemented by other programs (e.g., the middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow middleware, APIs, or applications to access the controller to enable control or management of system resources.
A block diagram of the architectural configuration of the operating system in the memory of the smart device 200 is illustrated in fig. 1 d. The operating system architecture comprises an application layer, a middleware layer and a kernel layer from top to bottom.
The application layer: the application programs built into the system and non-system-level application programs both belong to the application layer, which is responsible for direct interaction with the user. The application layer may include a plurality of applications, such as a setup application, a post application, a media center application, and the like. These applications may be implemented as Web applications that execute based on a WebKit engine, and in particular may be developed and executed based on HTML5, Cascading Style Sheets (CSS), and JavaScript.
Here, HTML, which is called HyperText Markup Language (HyperText Markup Language), is a standard Markup Language for creating web pages, and describes the web pages by Markup tags, where the HTML tags are used to describe characters, graphics, animation, sound, tables, links, etc., and a browser reads an HTML document, interprets the content of the tags in the document, and displays the content in the form of web pages.
CSS, known as Cascading Style Sheets (Cascading Style Sheets), is a computer language used to represent the Style of HTML documents, and may be used to define Style structures, such as fonts, colors, locations, etc. The CSS style can be directly stored in the HTML webpage or a separate style file, so that the style in the webpage can be controlled.
JavaScript, a language applied to Web page programming, can be inserted into an HTML page and interpreted and executed by a browser. The interaction logic of the Web application is realized by JavaScript. The JavaScript can package a JavaScript extension interface through the browser to realize communication with the kernel layer.
The middleware layer may provide some standardized interfaces to support the operation of various environments and systems. For example, the middleware layer may be implemented as multimedia and hypermedia information coding experts group (MHEG) middleware related to data broadcasting, DLNA middleware which is middleware related to communication with an external device, middleware which provides a browser environment in which each application program in the smart device operates, and the like.
The kernel layer provides core system services, such as: file management, memory management, process management, network management, system security authority management and the like. The kernel layer may be implemented as a kernel based on various operating systems, for example, a kernel based on the Linux operating system.
The kernel layer also provides communication between system software and hardware, providing device driver services for various hardware, such as: a display driver for the display, a camera driver for the camera, a key driver for the remote controller, a WiFi driver for the WIFI module, an audio driver for the audio output interface, and a power management driver for the Power Management (PM) module.
A user interface 265 receives various user interactions. Specifically, it is used to transmit an input signal of a user to the controller 250 or transmit an output signal from the controller 250 to the user. For example, the remote controller 100A may transmit an input signal, such as a power switch signal, a channel selection signal, a volume adjustment signal, etc., input by the user to the user interface 265, and then the input signal is transferred to the controller 250 through the user interface 265; alternatively, the remote controller 100A may receive an output signal such as audio, video, or data output from the user interface 265 via the controller 250, and display the received output signal or output the received output signal in audio or vibration form.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user interface 265 receives the user input commands through the GUI. Specifically, the user interface 265 may receive user input commands for controlling the position of a selector in the GUI to select different objects or items.
Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user interface 265 receives the user input command by recognizing the sound or gesture through a sensor.

The video processor 270 is configured to receive an external video signal, and perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis according to the standard codec protocol of the input signal, so as to obtain a video signal that is directly displayed or played on the display 275.
Illustratively, the video processor 270 includes a demultiplexing module, a video decoding module, an image synthesizing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module is configured to demultiplex the input audio/video data stream: for example, for an input MPEG-2 stream (a compression standard for digital storage media moving images and audio), the demultiplexing module demultiplexes it into a video signal and an audio signal.
And the video decoding module is used for processing the video signal after demultiplexing, including decoding, scaling and the like.
And the image synthesis module is used for carrying out superposition mixing processing on the GUI signal input by the user or generated by the user and the video image after the zooming processing by the graphic generator so as to generate an image signal for display.
The frame rate conversion module is configured to convert the frame rate of the input video, for example converting a 60 Hz input video to a frame rate of 120 Hz or 240 Hz, typically by means of a frame interpolation method.
And a display formatting module for converting the signal output by the frame rate conversion module into a signal conforming to a display format of a display, such as converting the format of the signal output by the frame rate conversion module to output an RGB data signal.
A display 275 for receiving the image signal from the video processor 270 and displaying video content, images, and the menu manipulation interface. The displayed video content may come from the broadcast signal received by the tuner demodulator 210, or from video content input through the communicator 220 or the external device interface 240. The display 275 also displays the user manipulation interface (UI) generated in the smart device 200 and used to control the smart device 200.
And, the display 275 may include a display screen assembly for presenting a picture and a driving assembly for driving the display of an image. Alternatively, a projection device and projection screen may be included, provided display 275 is a projection display.
The audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform audio data processing such as noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played by the speaker 286.
Illustratively, audio processor 280 may support various audio formats. Such as MPEG-2, MPEG-4, Advanced Audio Coding (AAC), high efficiency AAC (HE-AAC), and the like.
The audio output interface 285 is used to receive the audio signal output by the audio processor 280 under the control of the controller 250. The audio output interface 285 may include a speaker 286, or an external sound output terminal 287, such as an earphone output terminal, for outputting to a sound-producing device of an external apparatus.
In other exemplary embodiments, video processor 270 may comprise one or more chips. Audio processor 280 may also comprise one or more chips.
And, in other exemplary embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated with the controller 250 in one or more chips.
And a power supply 290 for providing power supply support for the smart device 200 by the power input from the external power source under the control of the controller 250. The power supply 290 may be a built-in power supply circuit installed inside the smart device 200, or may be a power supply installed outside the smart device 200.
In the prior art, dynamic gesture images of a user need to be collected and recognized, and the smart device is controlled to execute corresponding operations according to the recognition results. However, controlling the smart device in this way causes the following problems: the farther the user is from the camera, the smaller the proportion of the user's hand image in the acquired image, which greatly increases the difficulty of hand detection; the recognition process of the neural network takes a long time because multiple continuous images must be recognized, so the real-time response requirement cannot be met; and the user must complete a dynamic gesture before a response is produced, so the user experience is poor. To solve the foregoing problems, an embodiment of the present application provides a new method for controlling a smart device, which is shown in fig. 2 and includes the following steps:
s201: and receiving a plurality of first images acquired by an image acquisition device in a set period.
When the smart device is powered on, the image collector is invoked to photograph the current scene at a preset collection frequency within a set period, obtaining a plurality of continuous first images. Because the image collector performs image collection according to the set period and the preset collection frequency, it may collect a pure background image containing no user, or an image containing one or more users. For example, the camera captures 20 images in 1 second, of which the first 5 are pure background images containing no user and the 6th to 20th images contain two users.
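For illustration, this acquisition step can be sketched as follows, assuming an OpenCV-style camera interface; the device index, period, and frequency values are assumptions, not taken from the patent:

```python
import time

import cv2


def collect_first_images(period_s: float = 1.0, fps: int = 20, camera: int = 0):
    """Capture the first images for one set period.

    period_s, fps, and camera are illustrative values, not from the patent.
    Captured frames may contain no user, one user, or several users.
    """
    cap = cv2.VideoCapture(camera)
    frames = []
    deadline = time.monotonic() + period_s
    while time.monotonic() < deadline and len(frames) < int(period_s * fps):
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```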
S202: a first image M is read.
S203: according to a preset interception rule, intercept a second image N from the first image M, where the occupation ratio of the user's hand image on the second image N is higher than that on the first image M.
In the embodiment of the application, the user's hand image needs to be detected. The number of pixels occupied by the hand image in the collected first image is generally small, and the farther the user is from the camera, the smaller the proportion of the hand image in the first image, which greatly increases the difficulty of hand detection. Therefore, before detecting the hand image, the embodiment of the application preprocesses the first image so as to intercept a second image in which the hand image has a higher occupation ratio, thereby reducing the detection difficulty. Specifically, the steps of intercepting the second image N from the first image M are as follows:
a1: and determining the size information and the corner coordinate information of the hand image according to the field angle of the image collector, the set field angle threshold value and the size information of the first image M.
In the image collector, with the lens of the image collector as the vertex, the angle formed by the two edges of the maximum range through which the object image of the measured object can pass is called the field angle; fig. 3a shows a schematic diagram of the field angle.
Referring to fig. 3b, when the line passing through the center of the lens (i.e., the optical axis of the lens) is parallel to the ground, the angle formed by the two edges of the maximum range through which the object image of the measured object can pass is called the field angle in the X direction (i.e., V_X); referring to fig. 3c, when the optical axis of the lens is perpendicular to the ground, the angle formed by the two edges of the maximum range through which the object image of the measured object can pass is called the field angle in the Y direction (i.e., V_Y).
If V_X and V_Y are both smaller than the field angle threshold, indicating that the size of the first image M is small and the proportion of the hand image in the first image M is relatively high, the whole first image M is input into the trained hand detection network as the second image N, and at least one gesture recognition area contained in the second image N is detected.
If V_X is larger than the field angle threshold, formula (1) is used to calculate the width to be intercepted of the Region Of Interest (ROI) containing the hand image, where V_X denotes the field angle in the X direction, T_V denotes the field angle threshold, L_X denotes the width to be intercepted of the ROI, and W denotes the width of the first image M.
L_X = (T_V / V_X) × W (1)
If V_Y is larger than the field angle threshold, formula (2) is used to calculate the height to be intercepted of the ROI containing the hand image, where V_Y denotes the field angle in the Y direction, T_V denotes the field angle threshold, L_Y denotes the height to be intercepted of the ROI, and H denotes the height of the first image M.
L_Y = (T_V / V_Y) × H (2)
Further, in the embodiment of the present application, the upper-left corner of the ROI is determined as the corner point, and the corner point coordinate information of the ROI is calculated using formula (3). The final ROI size information and corner point coordinate information may be expressed as (x, y, w, h), where (x, y) denotes the corner point coordinate information of the ROI, w is equal to L_X and denotes the width to be intercepted of the ROI, and h is equal to L_Y and denotes the height to be intercepted of the ROI.
x = (W - L_X) / 2, y = (H - L_Y) / 2 (3)
A2: intercept the second image N from the first image M according to the size information and the corner point coordinate information of the hand image.
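Steps A1-A2 can be sketched as below. Because the patent's formula images are not reproduced here, the ratio-form expressions used for formulas (1)-(3) and the centered ROI are assumptions consistent with the variable definitions above:

```python
import numpy as np


def intercept_second_image(first_image: np.ndarray, v_x: float, v_y: float,
                           t_v: float) -> np.ndarray:
    """Crop the second image N out of the first image M (steps A1-A2).

    v_x, v_y: field angles of the image collector; t_v: field angle threshold.
    The ratio-form formulas below are an assumed reading of formulas (1)-(3).
    """
    h_img, w_img = first_image.shape[:2]
    if v_x < t_v and v_y < t_v:
        return first_image  # the whole first image M serves as the second image N
    l_x = int(w_img * t_v / v_x) if v_x > t_v else w_img  # formula (1), assumed
    l_y = int(h_img * t_v / v_y) if v_y > t_v else h_img  # formula (2), assumed
    x = (w_img - l_x) // 2  # formula (3), assumed centered ROI
    y = (h_img - l_y) // 2
    return first_image[y:y + l_y, x:x + l_x]
```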
S204: recognize the second image N, and determine at least one gesture recognition area on the second image N and the corresponding gesture recognition results.
B1: input the second image N into the trained hand detection network to obtain the number of gesture recognition areas contained in the second image N, the coordinate information of each gesture recognition area on the second image N, and the probability value that each gesture recognition area contains a hand image (i.e., the confidence of the gesture recognition area).
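A minimal sketch of step B1, assuming a hand_detector callable that returns per-region boxes and hand probabilities; this interface and the 0.5 confidence threshold are hypothetical, not from the patent:

```python
def detect_gesture_regions(second_image, hand_detector, conf_threshold=0.5):
    """Run the trained hand detection network on the second image N.

    hand_detector is a hypothetical callable assumed to return parallel lists
    of (x, y, w, h) boxes and per-box hand probabilities.
    """
    boxes, probs = hand_detector(second_image)
    # Each surviving entry is one gesture recognition area: its coordinates
    # on the second image N plus its confidence.
    return [(box, p) for box, p in zip(boxes, probs) if p >= conf_threshold]
```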
In order to ensure the accuracy of hand detection, the embodiment of the application adopts a multi-scale SSD algorithm and designs an end-to-end hand detection network. The design and training process of the hand detection network is as follows:
the hand detection network is composed of an input layer, a convolution layer, a pooling layer and an output layer, wherein the convolution layer is used for extracting useful features such as horizontal, vertical, edge or diagonal features from the second image N transmitted by the input layer; the pooling layer is used to increase the receptive field of the extracted features, which is the size of the region corresponding to one pixel back to the first image M, and to reduce the optimization difficulty.
In order to rapidly reduce the size of the second image N, the embodiment of the present application configures larger sampling strides for the convolutional and pooling layers. For example, if the sampling stride of convolutional layer 1 is 4 and the sampling strides of convolutional layer 2, pooling layer 1, and pooling layer 2 are all 2, the second image N is reduced by a factor of 32 after passing through the two convolutional layers and two pooling layers. Furthermore, the embodiment of the present application adopts an Inception module comprising a plurality of convolution branches, where the Inception module is composed of pooling layers and convolutional layers with different structures; this increases the width of the network, increases the network's adaptability to scale, and effectively improves the diversity of receptive fields.
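Such a multi-branch module can be sketched as an Inception-style block; the branch layout and channel counts below are illustrative, not the patent's exact design:

```python
import torch
import torch.nn as nn


class InceptionBlock(nn.Module):
    """Parallel convolution/pooling branches concatenated on the channel axis.

    Widens the network and diversifies receptive fields; the channel counts
    here are illustrative assumptions.
    """

    def __init__(self, in_ch: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 32, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),
            nn.Conv2d(32, 64, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)
```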
Each sample image is labeled: a rectangular box containing the hand image on the sample image is called a positive rectangular box, a rectangular box containing only pure background is called a negative rectangular box, and the coordinate information corresponding to the positive and negative rectangular boxes is determined.
the method comprises the steps that each time a labeled sample image is read by a hand detection network, coordinate information and category information of a prediction rectangular frame output by the hand detection network are obtained, because the number of negative rectangular frames on the sample image is far higher than that of positive rectangular frames, if parameters of the hand detection network are readjusted based on category error values between all actual rectangular frames and prediction rectangular frames corresponding to the actual rectangular frames, the trained hand detection network is enabled to be biased to output a negative rectangular frame result, and in order to solve the problem, in the embodiment of the application, only all positive rectangular frames and a plurality of negative rectangular frames which are difficult to detect are selected, a classification loss function is called, category error values between the negative rectangular frames and the corresponding prediction rectangular frames are calculated, and category error values between all positive rectangular frames and the corresponding prediction rectangular frames are calculated; calling a position loss function, and only calculating a position error value between the coordinate position of the predicted rectangular frame and the coordinate position of the corresponding normal rectangular frame; finally, based on the category error value and the position error value, readjusting parameters of the hand detection network;
The above training process is repeated until the set number of iterations is reached, all the sample images have been read, or the error value falls below the set error threshold; the trained hand detection network is then output.
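This training objective, namely hard negative mining for the classification term and a position term restricted to positive frames, could be sketched as follows; the tensor layout and the 3:1 negative-to-positive ratio are assumptions (the ratio is common for SSD-style detectors but is not specified in the text):

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, box_preds, labels, box_targets, neg_pos_ratio=3):
    """SSD-style loss with hard negative mining.

    cls_logits:  (A, 2) float tensor of class scores per candidate frame
    box_preds:   (A, 4) float tensor of predicted frame coordinates
    labels:      (A,)   long tensor, 1 for positive frames, 0 for negative
    box_targets: (A, 4) float tensor of labeled frame coordinates
    """
    pos = labels == 1
    # per-frame classification loss, no reduction yet
    cls_loss = F.cross_entropy(cls_logits, labels, reduction="none")
    # keep all positives plus only the hardest negatives (largest loss),
    # capped at neg_pos_ratio times the number of positive frames
    num_neg = min(int(pos.sum()) * neg_pos_ratio, int((~pos).sum()))
    hard_neg_loss, _ = cls_loss[~pos].topk(num_neg)
    category_error = cls_loss[pos].sum() + hard_neg_loss.sum()
    # position error only between predicted frames and positive frames
    position_error = F.smooth_l1_loss(
        box_preds[pos], box_targets[pos], reduction="sum")
    return (category_error + position_error) / max(int(pos.sum()), 1)
```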
B2: input the image corresponding to each gesture recognition area into the trained gesture classification network to obtain the corresponding gesture recognition result.
The gesture classification network consists of a base network, a fully connected layer and a SoftMax classifier; an image is input into the base network, and the SoftMax classifier finally outputs the gesture category of the image and the confidence of that gesture category (namely the gesture recognition result). The base network may be any convolutional neural network (CNN), such as a Visual Geometry Group network (VGGNet), a residual network (ResNet) or AlexNet.
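A minimal sketch of such a classifier, assuming ResNet-18 as the base network (the text permits any of the listed CNNs) and an illustrative number of gesture categories:

```python
import torch.nn as nn
import torchvision.models as models

class GestureClassifier(nn.Module):
    """Base network + fully connected layer + SoftMax classifier."""
    def __init__(self, num_gestures=9):
        super().__init__()
        # torchvision >= 0.13; use pretrained=False on older versions
        resnet = models.resnet18(weights=None)
        self.base = nn.Sequential(*list(resnet.children())[:-1])  # drop final FC
        self.fc = nn.Linear(512, num_gestures)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, region_image):
        features = self.base(region_image).flatten(1)   # (batch, 512)
        probs = self.softmax(self.fc(features))
        # gesture category and its confidence, i.e. the gesture recognition result
        return probs.argmax(dim=1), probs.max(dim=1).values
```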
Gesture categories are divided into invalid gestures, wake-up gestures and control gestures. The wake-up gesture of the embodiment of the present application is shown in fig. 4a; it is defined so that the smart device executes a corresponding operation only after this gesture has been performed, which avoids accidental operation, solves the problem of determining the gesture control area in a multi-user interaction scene, and ensures that only one user at a time is allowed to control the smart device in such a scene. In the embodiment of the present application, the control gesture shown in fig. 4b represents turning up the volume, fig. 4c turning down the volume, fig. 4d rewinding the video, fig. 4e fast-forwarding the video, fig. 4f confirming, fig. 4g canceling, and fig. 4h muting/ending. Any other gesture that does not belong to the preset gestures is defined as an invalid gesture.
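For illustration, the preset gesture categories of figs. 4a-4h could be encoded as follows; the enum names are ours, chosen to match the operations described above:

```python
from enum import Enum

class Gesture(Enum):
    INVALID = 0       # any gesture outside the preset set
    WAKE = 1          # fig. 4a
    VOLUME_UP = 2     # fig. 4b
    VOLUME_DOWN = 3   # fig. 4c
    REWIND = 4        # fig. 4d
    FAST_FORWARD = 5  # fig. 4e
    CONFIRM = 6       # fig. 4f
    CANCEL = 7        # fig. 4g
    MUTE = 8          # fig. 4h, mute/end
```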
S205: judge whether all the first images have been read; if so, execute step S206; otherwise, return to step S202.
S206: if the same control gesture appears more than a set number threshold of times in the same gesture recognition area, control the smart device to execute the corresponding operation based on that control gesture.
The embodiment of the present application adopts static gesture recognition, that is, gesture recognition is performed on a single image and a gesture recognition result is output for it. Because the image collector collects a plurality of first images within a set period, if the smart device executed a corresponding operation for the gesture recognition result of every first image, it would execute the same operation many times within a very short time, which is not conducive to controlling the smart device. To solve this problem, the smart device of the embodiment of the present application comprehensively determines the operation to be finally executed from the gesture recognition results of multiple frames.
First, each time a first image is read, step S203 and step S204 are executed to obtain the gesture recognition result of each gesture recognition area contained in that image; that is, when all the first images have been read, at least one gesture recognition area and the set of gesture recognition results corresponding to it have been obtained. On this basis, if M consecutive wake-up gestures exist in the same gesture recognition area, that gesture recognition area is determined as the gesture control area, where M is a positive integer.
Secondly, if N consecutive first control gestures exist in the gesture control area after the M consecutive wake-up gestures, the smart device is controlled to execute the corresponding operation based on the first control gesture, where N is a positive integer.
For example, if gesture recognition area 1 contains 20 gesture recognition results, of which the 3rd to 10th are all wake-up gestures and the 11th to 15th are all the control gesture shown in fig. 4b, the smart device turns up the volume of the video being played according to the indication of that control gesture.
Further, after the smart device is controlled to execute the corresponding operation based on the first control gesture, if N consecutive second control gestures exist in the gesture control area after the N consecutive first control gestures, the smart device is controlled to execute the corresponding operation based on the second control gesture, where N is a positive integer; the first control gesture and the second control gesture may be the same control gesture or different control gestures.
Continuing the above example, after the smart device has turned up the volume of the video being played, if the 16th to 20th results are all the control gesture shown in fig. 4e, the smart device fast-forwards the currently played video according to the indication of that control gesture.
Further, after the smart device is controlled to execute the corresponding operation based on the first control gesture, if no N consecutive second control gestures exist in the gesture control area after the N consecutive first control gestures, then when M consecutive wake-up gestures are determined to exist in another gesture recognition area, the control focus is transferred from the current gesture control area to that other gesture recognition area, and the other gesture recognition area is determined as the new gesture control area, so that the smart device executes new operations according to the set of gesture recognition results of the new gesture control area.
Further, after the smart device is controlled to execute the corresponding operation based on the first control gesture, if no N consecutive second control gestures exist in the gesture control area after the N consecutive first control gestures and no M consecutive wake-up gestures exist in any other gesture recognition area, the control focus is not transferred until a control gesture meeting the requirements appears in the gesture control area, whereupon the corresponding operation is executed according to that control gesture. However, after the smart device is restarted, no valid gesture control area exists by default, and the gesture control area needs to be determined anew from a wake-up gesture meeting the requirements.
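Reusing the Gesture enum from the sketch above, the multi-frame decision logic for a single gesture recognition area could be sketched as follows; the values M = 8 and N = 5 are chosen only to match the worked example, and the transfer of the control focus between areas is omitted for brevity:

```python
def gesture_runs(results):
    """Collapse the ordered per-frame results of one area into
    [gesture, run_length] pairs, e.g. [[WAKE, 8], [VOLUME_UP, 5]]."""
    runs = []
    for g in results:
        if runs and runs[-1][0] is g:
            runs[-1][1] += 1
        else:
            runs.append([g, 1])
    return runs

def decide(results, m=8, n=5):
    """Return, in order, the control gestures to execute: a run of at least
    m consecutive wake-up gestures makes the area the gesture control area,
    after which every run of at least n identical control gestures triggers
    the corresponding operation once."""
    armed, actions = False, []
    for gesture, length in gesture_runs(results):
        if not armed:
            armed = gesture is Gesture.WAKE and length >= m
        elif gesture not in (Gesture.WAKE, Gesture.INVALID) and length >= n:
            actions.append(gesture)
    return actions

# The worked example above: results 3-10 are wake-up gestures, 11-15 the
# fig. 4b gesture, and 16-20 the fig. 4e gesture.
frames = ([Gesture.INVALID] * 2 + [Gesture.WAKE] * 8
          + [Gesture.VOLUME_UP] * 5 + [Gesture.FAST_FORWARD] * 5)
assert decide(frames) == [Gesture.VOLUME_UP, Gesture.FAST_FORWARD]
```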
In some possible embodiments, the aspects of the control method of the smart device provided in the present application may also be implemented in the form of a program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to perform the steps of the control method of the smart device according to the various exemplary embodiments of the present application described above in this specification; for example, the computer device may perform the steps shown in fig. 2.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for controlling a smart device of the embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with a command execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
While the preferred embodiments of the present application have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all alterations and modifications that fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A smart device, comprising:
a display configured to display a screen;
the image collector is configured to collect a plurality of first images within a set period and transmit the plurality of first images to a controller;
the controller is configured to receive the plurality of first images collected by the image collector within the set period;
respectively executing the following processing for each first image, wherein each time one first image is read, a second image is intercepted from the first image according to a preset interception rule, wherein the occupation ratio of a user hand image on the second image is higher than that of the user hand image on the first image; recognizing the second image, and determining at least one gesture recognition area on the second image and a corresponding gesture recognition result;
and if the same control gesture exceeding a set number threshold exists in the same gesture recognition area, controlling the intelligent equipment to execute corresponding operation based on the same control gesture.
2. The smart device of claim 1, wherein the controller is configured to:
determining the size information and the corner point coordinate information of the hand part image according to the field angle of the image collector, a set field angle threshold value and the size information of the first image;
and intercepting the second image from the first image according to the size information of the hand image and the corner point coordinate information.
3. The smart device of claim 2, wherein the controller is further configured to:
and if the field angle of the image collector in the X direction and the field angle of the image collector in the Y direction are both lower than the field angle threshold value, determining the first image as the second image.
4. The smart device of claim 1, wherein the controller is configured to:
if M continuous awakening gestures exist in the same gesture recognition area, determining the same gesture recognition area as a gesture control area;
if the gesture control area has N continuous first control gestures after the M continuous awakening gestures, controlling the intelligent device to execute corresponding operations based on the first control gestures;
wherein M, N are all positive integers.
5. The smart device of claim 4, wherein the controller is configured to:
if the gesture control area has N continuous second control gestures after the N continuous first control gestures, controlling the intelligent device to execute corresponding operations based on the second control gestures; wherein N is a positive integer.
6. A method of controlling a smart device, comprising:
receiving a plurality of first images collected by an image collector within a set period;
respectively executing the following processing for each first image, wherein each time one first image is read, a second image is intercepted from the first image according to a preset interception rule, wherein the occupation ratio of a user hand image on the second image is higher than that of the user hand image on the first image; recognizing the second image, and determining at least one gesture recognition area on the second image and a corresponding gesture recognition result;
and if the same control gesture exceeding a set number threshold exists in the same gesture recognition area, controlling the intelligent equipment to execute corresponding operation based on the same control gesture.
7. The method of claim 6, wherein the step of cropping a second image from the first image according to a predetermined cropping rule comprises:
determining the size information and the corner point coordinate information of the hand part image according to the field angle of the image collector, a set field angle threshold value and the size information of the first image;
and intercepting the second image from the first image according to the size information of the hand image and the corner point coordinate information.
8. The method of claim 7, further comprising, prior to determining the size information and corner coordinate information for the hand image:
and if the field angle of the image collector in the X direction and the field angle of the image collector in the Y direction are both lower than the field angle threshold value, determining the first image as the second image.
9. The method of claim 6, wherein if the same control gesture exceeding a set number threshold exists in the same gesture recognition area, controlling the smart device to perform a corresponding operation based on the same control gesture comprises:
if M continuous awakening gestures exist in the same gesture recognition area, determining the same gesture recognition area as a gesture control area;
if the gesture control area has N continuous first control gestures after the M continuous awakening gestures, controlling the intelligent device to execute corresponding operations based on the first control gestures;
wherein M, N are all positive integers.
10. The method of claim 9, after controlling the smart device to perform the respective operation based on the first control gesture, further comprising:
if the gesture control area has N continuous second control gestures after the N continuous first control gestures, controlling the intelligent device to execute corresponding operations based on the second control gestures; wherein N is a positive integer.
CN202010582730.2A 2020-06-23 2020-06-23 Control method of intelligent equipment and intelligent equipment Active CN113473198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010582730.2A CN113473198B (en) 2020-06-23 2020-06-23 Control method of intelligent equipment and intelligent equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010582730.2A CN113473198B (en) 2020-06-23 2020-06-23 Control method of intelligent equipment and intelligent equipment

Publications (2)

Publication Number Publication Date
CN113473198A (en) 2021-10-01
CN113473198B (en) 2023-09-05

Family

ID=77868187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010582730.2A Active CN113473198B (en) 2020-06-23 2020-06-23 Control method of intelligent equipment and intelligent equipment

Country Status (1)

Country Link
CN (1) CN113473198B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110111384A1 (en) * 2009-11-06 2011-05-12 International Business Machines Corporation Method and system for controlling skill acquisition interfaces
US20120069168A1 (en) * 2010-09-17 2012-03-22 Sony Corporation Gesture recognition system for tv control
US20120119985A1 (en) * 2010-11-12 2012-05-17 Kang Mingoo Method for user gesture recognition in multimedia device and multimedia device thereof
CN102801409A (en) * 2011-05-23 2012-11-28 郎济东 Gesture-recognition-based intelligent switch
US20130222232A1 (en) * 2012-02-24 2013-08-29 Pantech Co., Ltd. Gesture recognition device and method thereof
US20150293595A1 (en) * 2012-10-23 2015-10-15 Lg Electronics Inc. Image display device and method for controlling same
US20170285760A1 (en) * 2014-08-28 2017-10-05 Lg Electronics Inc. Apparatus for projecting image and method for operating same
CN106354252A (en) * 2016-08-18 2017-01-25 电子科技大学 Continuous character gesture track recognizing method based on STDW
CN109991859A (en) * 2017-12-29 2019-07-09 青岛有屋科技有限公司 A kind of gesture instruction control method and intelligent home control system
US20190369742A1 (en) * 2018-05-31 2019-12-05 Clipo, Inc. System and method for simulating an interactive immersive reality on an electronic device
CN108921101A (en) * 2018-07-04 2018-11-30 百度在线网络技术(北京)有限公司 Processing method, equipment and readable storage medium storing program for executing based on gesture identification control instruction
US20200012351A1 (en) * 2018-07-04 2020-01-09 Baidu Online Network Technology (Beijing) Co., Ltd. Method, device and readable storage medium for processing control instruction based on gesture recognition
CN110688914A (en) * 2019-09-09 2020-01-14 苏州臻迪智能科技有限公司 Gesture recognition method, intelligent device, storage medium and electronic device
CN110716648A (en) * 2019-10-22 2020-01-21 上海商汤智能科技有限公司 Gesture control method and device

Also Published As

Publication number Publication date
CN113473198B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN111314789B (en) Display device and channel positioning method
CN111182345B (en) Display method and display equipment of control
CN111601134B (en) Time display method in display equipment and display equipment
CN111045557A (en) Moving method of focus object and display device
CN111629249B (en) Method for playing startup picture and display device
CN111654743B (en) Audio playing method and display device
CN111246309A (en) Method for displaying channel list in display device and display device
CN111556350B (en) Intelligent terminal and man-machine interaction method
CN113556593B (en) Display device and screen projection method
CN112055256A (en) Image processing method and display device for panoramic image
CN111417027A (en) Method for switching small window playing of full-screen playing of webpage video and display equipment
CN111104020A (en) User interface setting method, storage medium and display device
CN113347413A (en) Window position detection method and display device
CN113163228B (en) Media asset playing type marking method and server
CN111464869B (en) Motion position detection method, screen brightness adjustment method and intelligent device
CN111277911B (en) Image processing method of panoramic video, display device and server
CN111541929A (en) Multimedia data display method and display equipment
CN113115092A (en) Display device and detail page display method
CN113556590B (en) Method for detecting effective resolution of screen-projected video stream and display equipment
CN113115093B (en) Display device and detail page display method
CN112004127B (en) Signal state display method and display equipment
CN111405329B (en) Display device and control method for EPG user interface display
CN113115081B (en) Display device, server and media asset recommendation method
CN113473198B (en) Control method of intelligent equipment and intelligent equipment
CN112565915A (en) Display apparatus and display method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 266555, No. 218, Bay Road, Qingdao economic and Technological Development Zone, Shandong

Patentee after: Hisense Group Holding Co.,Ltd.

Address before: 266555, No. 218, Bay Road, Qingdao economic and Technological Development Zone, Shandong

Patentee before: QINGDAO HISENSE ELECTRONIC INDUSTRY HOLDING Co.,Ltd.