CN111741225A - Human-computer interaction device, method and computer-readable storage medium - Google Patents
- Publication number
- CN111741225A CN111741225A CN202010786060.6A CN202010786060A CN111741225A CN 111741225 A CN111741225 A CN 111741225A CN 202010786060 A CN202010786060 A CN 202010786060A CN 111741225 A CN111741225 A CN 111741225A
- Authority
- CN
- China
- Prior art keywords
- human
- image
- computer interaction
- camera
- assembly
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/695—Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present application relates to the field of human-computer interaction technologies, and in particular, to a human-computer interaction device, a human-computer interaction method, and a computer-readable storage medium. The human-computer interaction device provided by the embodiments of the application comprises a processing assembly and a camera assembly, the processing assembly being connected with the camera assembly. The processing assembly is used for collecting voice information from the environment where the human-computer interaction device is located, determining the voice direction corresponding to the voice information, generating a rotation control instruction according to the voice direction, and sending the rotation control instruction to the camera assembly. The camera assembly is used for executing a rotation action according to the rotation control instruction, so as to rotate its shooting surface to a first target position corresponding to the voice direction and collect a scene image at the first target position. The human-computer interaction device, human-computer interaction method, and computer-readable storage medium can improve the degree of automation of human-computer interaction devices.
Description
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to a human-computer interaction device, a human-computer interaction method, and a computer-readable storage medium.
Background
Human-computer interaction is the study of interaction between a system and its target users, where the system may be any of various machines (human-computer interaction devices) or computerized systems and software. In the prior art, a human-computer interaction device is generally a fixedly arranged machine, for example a conference audio-visual device, a banking business handling device, or a station ticket booking device. When using such an existing device, the target user needs to be fixed at a certain position relative to the device, for example a position facing the camera assembly of the device, before normal interaction between the target user and the device can take place. Therefore, in the prior art, the degree of automation of human-computer interaction devices is relatively low.
Disclosure of Invention
An object of the present application is to provide a human-computer interaction device, a human-computer interaction method, and a computer-readable storage medium, so as to solve the above problems.
In a first aspect, the human-computer interaction device provided by the application comprises a processing assembly and a camera assembly, wherein the processing assembly is connected with the camera assembly;
the processing assembly is used for acquiring voice information of the environment where the human-computer interaction device is located, determining a voice direction corresponding to the voice information, generating a rotation control instruction according to the voice direction, and sending the rotation control instruction to the camera assembly;
the camera assembly is used for executing a rotation action according to the rotation control instruction, so as to rotate a shooting surface of the camera assembly to a first target position corresponding to the voice direction and collect a scene image at the first target position.
With reference to the first aspect, an embodiment of the present application further provides a first optional implementation manner of the first aspect, where the processing component includes a voice acquisition device and a processing device, the voice acquisition device is connected to the processing device, and the processing device is further connected to the camera component;
the voice collecting device is used for collecting voice information of the environment where the human-computer interaction equipment is located, determining a voice direction corresponding to the voice information and sending the voice direction to the processing device;
the processing device is used for generating a rotation control instruction according to the voice direction and sending the rotation control instruction to the camera shooting assembly.
With reference to the first aspect, an embodiment of the present application further provides a second optional implementation manner of the first aspect, where the human-computer interaction device further includes a first display device, and the first display device is connected to the processing component;
the camera shooting assembly is also used for sending the scene image to the processing assembly;
the processing component is further used for determining a first person image from the scene image, generating a target display image according to the first person image and sending the target display image to the first display device;
the first display device is used for displaying a target display image.
In the above embodiment, the human-computer interaction device further includes a first display device, the first display device is connected to the processing component, the camera component is further configured to send the scene image to the processing component, the processing component is further configured to determine a first person image from the scene image, generate a target display image according to the first person image, and send the target display image to the first display device, and the first display device is configured to display the target display image, so that interactivity between the target user and the human-computer interaction device is enhanced.
With reference to the second optional implementation manner of the first aspect, an embodiment of the present application further provides a third optional implementation manner of the first aspect, where the processing component is configured to determine a first person image and first expression information of the first person image from the scene image, simulate a second person image according to the first expression information to serve as a target display image, and send the target display image to the first display device.
With reference to the first aspect, an embodiment of the present application further provides a fourth optional implementation manner of the first aspect, and the processing component is further connected to a second display device;
the processing component is further used for determining a first person image from the scene image and sending the first person image to the second display device;
the second display device is used for displaying the first human image.
In the above embodiment, the processing component is further connected to the second display device, and the processing component is further configured to determine the first person image from the scene image, and send the first person image to the second display device, and the second display device is configured to display the first person image, so that the human-computer interaction device can be applied to the teleconference system, and the applicable range of the human-computer interaction device is increased.
With reference to the fourth optional implementation manner of the first aspect, an embodiment of the present application further provides a fifth optional implementation manner of the first aspect, where the processing component is configured to, after determining the first person image from the scene image, obtain person identity information corresponding to the first person image, and send the first person image and the person identity information to the second display device;
the second display device is used for displaying the first person image and the person identity information.
With reference to the second, third, fourth, or fifth optional implementation manner of the first aspect, an embodiment of the present application further provides a sixth optional implementation manner of the first aspect, where after the processing component determines the first person image from the scene image, it is further configured to determine a target face from the first person image, determine position information of the target face in the scene image, generate a fine adjustment instruction according to the position information, and send the fine adjustment instruction to the camera assembly;
the camera assembly is used for executing a fine adjustment action according to the fine adjustment instruction, so as to adjust the shooting surface of the camera assembly to a second target position corresponding to the first person image.
With reference to the sixth optional implementation manner of the first aspect, an embodiment of the present application further provides a seventh optional implementation manner of the first aspect, where the camera assembly includes a first rotation control component and a camera, and the first rotation control component and the camera are respectively connected to the processing assembly;
the first rotation control component is used for receiving the fine adjustment instruction and executing a fine adjustment action according to it, so as to drive the camera to rotate in a first direction and thereby adjust the shooting surface of the camera to a second target position corresponding to the first person image, the first direction being a vertical direction.
With reference to the first aspect, an embodiment of the present application further provides an eighth optional implementation manner of the first aspect, where the camera assembly includes a second rotation control component and a camera, and the second rotation control component and the camera are respectively connected to the processing assembly;
the second rotation control component is used for receiving the rotation control instruction and executing a rotation action according to it, so as to drive the camera to rotate in a second direction and thereby rotate the shooting surface of the camera to the first target position corresponding to the voice direction, the second direction being a horizontal direction.
With reference to the first aspect, an embodiment of the present application further provides a ninth optional implementation manner of the first aspect, and the human-computer interaction device further includes a storage case and a lifting control assembly, where the lifting control assembly is connected to the processing assembly and the camera assembly respectively;
the processing component is further used for sending a power-on control instruction or a power-off control instruction to the lifting control assembly when such an instruction is received;
the lifting control assembly is used for controlling the camera assembly to rise from the storage case until it is exposed outside the storage case when a power-on control instruction is received, and for controlling the camera assembly to retract from its exposed position back into the storage case when a power-off control instruction is received.
With reference to the first aspect, an embodiment of the present application further provides a tenth optional implementation manner of the first aspect, where the human-computer interaction device further includes a projection host, and the projection host is connected to the processing component;
the camera shooting assembly is also used for sending the scene image to the processing assembly;
the processing component is also used for determining a projection region characteristic image from the scene image and performing color segmentation on the projection region characteristic image through a linear filtering difference method to obtain a characteristic region;
the processing component is also used for matching the characteristic points according to the characteristic areas and carrying out trapezoidal correction on the projection picture of the projection host according to the characteristic point matching result.
With reference to the tenth optional implementation manner of the first aspect, an embodiment of the present application further provides an eleventh optional implementation manner of the first aspect, where the linear filtering difference method may be represented by the following calculation formula:
where f is the differential value of the linear filtering, c is the differential increment, o is the filtering range, L is the light intensity value, and i is the index value of the pixel value scanned by the line.
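The formula itself is not reproduced in this text; only the variable list survives. Purely as an illustrative guess consistent with those variables (a scaled difference of line-scanned light intensity values taken over the filtering range), one possible reading is sketched below. This is not the patent's actual formula:

```python
def linear_filter_difference(L, c, o):
    """Illustrative guess at the linear filtering difference: for each
    line-scanned pixel index i, the differential value f is a scaled
    (by the differential increment c) intensity difference over the
    filtering range o. NOT the patent's formula, which is not
    reproduced in this text."""
    return [c * (L[min(i + o, len(L) - 1)] - L[i]) for i in range(len(L))]

# A large differential value marks the color boundary between regions.
print(linear_filter_difference([10, 10, 40, 40], c=1.0, o=1))  # → [0.0, 30.0, 0.0, 0.0]
```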
In a second aspect, the human-computer interaction method provided in the embodiment of the present application is applied to the human-computer interaction device provided in the first aspect or any optional implementation manner of the first aspect, and includes:
acquiring voice information of the environment where the human-computer interaction equipment is located through the processing assembly, determining a voice direction corresponding to the voice information, generating a rotation control instruction according to the voice direction, and sending the rotation control instruction to the camera assembly;
the camera shooting assembly executes a rotating action according to the rotating control instruction so as to rotate a shooting surface of the camera shooting assembly to a first target position corresponding to the voice direction and acquire a scene image at the first target position.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed, the human-computer interaction method provided in the second aspect is implemented.
The human-computer interaction device provided by the embodiments of the application comprises a processing component and a camera component. The processing component is used for acquiring voice information of the environment where the device is located and determining the voice direction corresponding to the voice information, so as to generate a rotation control instruction according to the voice direction and send it to the camera component. The camera component is used for executing a rotation action according to the rotation control instruction, so as to rotate its shooting surface to a first target position corresponding to the voice direction and acquire a scene image at that position. Since the voice information can be sent by the target user, the camera component can automatically rotate to the position where the target user is located, namely the first target position. Normal interaction between the target user and the human-computer interaction device is thereby realized, and the degree of automation of the device is improved.
The human-computer interaction method and the computer-readable storage medium provided by the present application have the same beneficial effects as the human-computer interaction device provided in the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments will be briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can also obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic structural block diagram of a human-computer interaction device according to an embodiment of the present disclosure.
Fig. 2 is another schematic structural block diagram of a human-computer interaction device according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of a human-computer interaction device according to an embodiment of the present application.
Fig. 4 is a block diagram of another schematic structure of a human-computer interaction device according to an embodiment of the present disclosure.
Fig. 5 is a block diagram of another schematic structure of a human-computer interaction device according to an embodiment of the present disclosure.
Fig. 6 is another schematic structural diagram of a human-computer interaction device according to an embodiment of the present application.
Fig. 7 is a flowchart illustrating steps of a human-computer interaction method according to an embodiment of the present disclosure.
Reference numerals: 100-a human-computer interaction device; 110-a processing component; 111-a voice acquisition device; 112-a processing device; 120-a camera assembly; 121-a first rotation control member; 122-a camera; 123-a second rotation control member; 130-a base; 140-a first display device; 150-projection host.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Further, it is noted that in the description of the present application, like reference numerals and letters refer to like items in the following drawings, and thus, once an item is defined in one drawing, it is not necessary to further define and explain it in the following drawings.
Referring to fig. 1, a schematic structural block diagram of a human-computer interaction device 100 according to an embodiment of the present disclosure is shown, where the human-computer interaction device 100 according to the embodiment of the present disclosure includes a processing component 110 and a camera component 120, and the processing component 110 is connected to the camera component 120. The processing component 110 is configured to collect voice information of an environment where the human-computer interaction device 100 is located, determine a voice direction corresponding to the voice information, generate a rotation control instruction according to the voice direction, and send the rotation control instruction to the camera component 120, and the camera component 120 is configured to execute a rotation action according to the rotation control instruction, so as to rotate a shooting surface of the camera component 120 to a first target position corresponding to the voice direction, and collect a scene image at the first target position.
It can be understood that, in the embodiment of the present application, the voice information may be sent by the target user, so that the camera module 120 can automatically rotate to the position where the target user is located, that is, the first target position, and then the captured scene image at the first target position is the captured character image of the target user, so as to implement a normal interaction behavior between the target user and the human-computer interaction device 100, so as to improve the automation degree of the human-computer interaction device 100.
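The patent does not specify the form of the rotation control instruction. As a minimal sketch, assuming the instruction simply carries a signed horizontal angle that turns the shooting surface toward the voice direction along the shortest arc, it might be generated like this (the function name and command format are hypothetical):

```python
def make_rotation_instruction(current_pan_deg, voice_direction_deg):
    """Build a pan command that rotates the shooting surface from its
    current heading toward the voice direction along the shortest arc
    (positive = clockwise, negative = counter-clockwise)."""
    delta = (voice_direction_deg - current_pan_deg + 180) % 360 - 180
    return {"axis": "horizontal", "degrees": delta}

# Camera at 10 degrees, voice from 350 degrees: turn 20 degrees counter-clockwise.
print(make_rotation_instruction(10, 350))  # → {'axis': 'horizontal', 'degrees': -20}
```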
Referring to fig. 2, in the embodiment of the present application, as an optional implementation manner, for the processing component 110, the processing component may include a voice acquisition device 111 and a processing device 112, where the voice acquisition device 111 is connected to the processing device 112, and the processing device 112 is further connected to the camera component 120. The voice collecting device 111 is configured to collect voice information of an environment where the human-computer interaction device 100 is located, and determine a voice direction corresponding to the voice information, so as to send the voice direction to the processing device 112, and the processing device 112 is configured to generate a rotation control instruction according to the voice direction, and send the rotation control instruction to the camera module 120.
Referring to fig. 3, in terms of mechanical structure design, the human-computer interaction device provided in the embodiment of the present application may include a base 130, based on which, in the embodiment of the present application, the camera assembly 120 may be disposed above the base 130 and can rotate relative to the base 130, the voice collecting device 111 may be disposed on the base 130, and the processing device 112 may be disposed inside the base 130. Further, in this embodiment of the application, the voice collecting device 111 may include a plurality of sound collectors and a microprocessor, the plurality of sound collectors are disposed at the periphery of the base 130 in an array manner and are respectively connected to the microprocessor, based on this, the plurality of sound collectors collect the voice information of the environment where the human-computer interaction device 100 is located, and send the respective collected voice information to the microprocessor, the microprocessor may determine the voice direction corresponding to the voice information according to the strength of the voice information, and send the voice direction to the processing device 112.
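The strength-based direction determination performed by the microprocessor can be sketched as follows. The patent only states that the direction is determined according to the strength of the voice information collected by the arrayed sound collectors; the level-weighted circular mean used below, and the function name, are illustrative assumptions:

```python
import math

def estimate_voice_direction(levels):
    """Estimate the voice direction (degrees) from the signal strength
    reported by sound collectors arrayed around the base.

    `levels` maps each collector's mounting angle in degrees to its
    measured level; the estimate is the level-weighted circular mean,
    so the result leans toward the loudest collectors.
    """
    x = sum(lv * math.cos(math.radians(a)) for a, lv in levels.items())
    y = sum(lv * math.sin(math.radians(a)) for a, lv in levels.items())
    return math.degrees(math.atan2(y, x)) % 360

# Four collectors at 0/90/180/270 degrees; the one at 90 degrees is loudest.
print(estimate_voice_direction({0: 0.2, 90: 0.9, 180: 0.2, 270: 0.1}))  # → 90.0
```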
In the embodiment of the present Application, the microprocessor may be an Integrated Circuit chip having Signal processing capability, or may be a general-purpose Processor, for example, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the disclosed logic block diagram in the embodiment of the present Application. Also, in the embodiment of the present application, the processing device 112 may be an integrated circuit chip having signal processing capability, and the processing device 112 may also be a general-purpose processor, for example, a DSP, an ASIC, a discrete gate or transistor logic device, or a discrete hardware component, which may implement or execute the logic block disclosed in the embodiment of the present application.
It should be noted that, in the embodiment of the present application, the microprocessor and the processing device 112 may be two independent integrated circuit chips or two independent general-purpose processors, but they may also be the same integrated circuit chip or general-purpose processor, and the embodiment of the present application is not limited thereto.
Further, please refer to fig. 4, in an embodiment of the present application, the human-computer interaction device 100 may further include a first display device 140, and the first display device 140 is connected to the processing component 110. Based on this, in the embodiment of the present application, the camera component 120 is further configured to send the scene image to the processing component 110, the processing component 110 is further configured to determine a first person image from the scene image, to generate a target display image according to the first person image, and send the target display image to the first display device 140, where the first display device 140 is configured to display the target display image.
As an optional implementation manner, in this embodiment of the application, the processing component 110 is specifically configured to determine the first person image and the first expression information of the first person image from the scene image, and to simulate a second person image according to the first expression information to serve as a target display image, which is sent to the first display device 140. In actual implementation, after the first expression information is obtained, second expression information corresponding to the first expression information may be obtained from a preset expression information repository, and a second person image including the second expression information is simulated as the target display image and sent to the first display device 140. It should be noted that the correspondence policy between the first expression information and the second expression information may be determined by user setting, and specifically includes a first correspondence policy and a second correspondence policy: the first correspondence policy instructs the device to acquire, from the preset expression information repository, an expression that is the same as or similar to the first expression information as the second expression information, while the second correspondence policy instructs it to acquire an expression that is opposite to the first expression information. After receiving a selection instruction triggered by the user, the processing component 110 determines the correspondence policy corresponding to the selection instruction as the target policy, and acquires the second expression information corresponding to the first expression information from the preset expression information repository according to the target policy.
For example, in this embodiment of the application, if the target policy is the first corresponding policy, when it is determined that the first expression information of the first person image is "surprise", the second expression information obtained from the expression information repository may also be "surprise", and for example, in this embodiment of the application, if the target policy is the second corresponding policy, when it is determined that the first expression information of the first person image is "sadness", the second expression information obtained from the expression information repository may be "happy", so as to enhance interactivity between the target user and the human-computer interaction device 100.
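The two correspondence policies can be illustrated with a small lookup table; the repository contents, policy labels, and function name below are hypothetical:

```python
# Hypothetical expression repository: for each recognised first expression,
# a same/similar entry (first correspondence policy) and an opposite entry
# (second correspondence policy).
EXPRESSION_REPOSITORY = {
    "surprise": {"same": "surprise", "opposite": "calm"},
    "sadness": {"same": "sadness", "opposite": "happy"},
    "happy": {"same": "happy", "opposite": "sadness"},
}

def select_second_expression(first_expression, target_policy):
    """Pick the second expression under the user-selected target policy:
    'first' -> same or similar expression, 'second' -> opposite expression."""
    key = "same" if target_policy == "first" else "opposite"
    return EXPRESSION_REPOSITORY[first_expression][key]

print(select_second_expression("sadness", "second"))  # → happy
```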
In order to expand the applicable range of the human-computer interaction device 100 provided in the embodiment of the present application, the processing component 110 is further connected to a second display device (not shown in the figure), and the processing component 110 is further configured to determine a first person image from the scene image and send the first person image to the second display device, and the second display device is configured to display the first person image, so that the human-computer interaction device 100 can be applied to a teleconference system, thereby improving the applicable range of the human-computer interaction device 100, and when the human-computer interaction device 100 is applied to the teleconference system, the human-computer interaction device 100 can be disposed at a first conference site where a target user is located, and a second display can be disposed at a second conference site where other conference participants are located. In addition, in the embodiment of the present application, the processing component 110 and the second display device may be interconnected through remote wireless communication.
In order to facilitate other parties in obtaining the identity of the target user, in this embodiment of the application, the processing component 110 is specifically configured to determine the first person image from the scene image, obtain the person identity information corresponding to the first person image, and send the first person image and the person identity information to the second display device, and the second display device is specifically configured to display both. In actual implementation, after determining the first person image from the scene image, the processing component 110 extracts the corresponding person identity information from a preset information database and sends the first person image and the person identity information to the second display device, where the person identity information may include a name, an employer, a job title, work experience, and the like. It should be noted that, in the embodiment of the present application, if the person identity information corresponding to the first person image is not stored in the preset information database, a missing-information prompt is generated and displayed to remind the target user to enter his or her identity information into the information database.
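The identity lookup with its missing-information prompt might be sketched as follows; the database layout, field names, and sample record are assumptions, not the patent's schema:

```python
def lookup_identity(person_id, info_database):
    """Return identity information for a recognised person, or a
    missing-information prompt when the preset database has no
    matching record."""
    record = info_database.get(person_id)
    if record is None:
        return {"missing": True,
                "prompt": "Identity not found; please enter your information."}
    return {"missing": False, **record}

# Hypothetical preset information database.
db = {"u1": {"name": "Zhang", "employer": "Example Co.", "title": "Engineer"}}
print(lookup_identity("u1", db)["name"])  # → Zhang
```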
It should be noted that, in the embodiment of the present application, after the human-computer interaction device 100 receives a power-on control instruction and completes the corresponding boot-up operation, the processing component 110 monitors whether a mode adjustment instruction is received. If a first mode adjustment instruction for setting the operating mode to the interactive mode is received, the processing component 110 performs the steps of determining a first person image from the scene image, generating a target display image from the first person image, and sending the target display image to the first display device 140, so that the first display device 140 displays the target display image. If a second mode adjustment instruction for setting the operating mode to the conference mode is received, the processing component 110 performs the steps of determining a first person image from the scene image and sending the first person image to the second display device, so that the second display device displays the first person image.
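The two-mode dispatch described above can be summarized in a small sketch; the command strings and pipeline-step names are illustrative assumptions, not terms from the patent.

```python
def handle_mode_command(command):
    """Dispatch sketch for the two mode-adjustment instructions above;
    returns the ordered pipeline the processing component would run."""
    if command == "interactive":
        # interactive mode: person image -> target display image -> first display
        return ["determine_person_image", "generate_target_display", "send_to_first_display"]
    if command == "conference":
        # conference mode: person image -> remote second display
        return ["determine_person_image", "send_to_second_display"]
    raise ValueError(f"unknown mode command: {command}")
```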
Further, to ensure that the shooting surface of the camera assembly 120 is directed toward the target user, in the embodiment of the present application, after determining the first person image from the scene image, the processing component 110 is further configured to determine a target face from the first person image, determine position information of the target face in the scene image, generate a fine adjustment instruction according to the position information, and send the fine adjustment instruction to the camera assembly 120. The camera assembly 120 is configured to perform a fine adjustment action according to the fine adjustment instruction, so as to adjust its shooting surface to a second target position corresponding to the first person image. For example, after the processing component 110 determines the first person image from the scene image, determines the target face from the first person image, and determines the position information of the target face in the scene image: if the target face is located in the left part of the scene image, a fine adjustment instruction for controlling the shooting surface of the camera assembly 120 to rotate toward the left is generated; if the target face is located in the right part, an instruction to rotate toward the right is generated; if the target face is located in the upper part, an instruction to rotate upward is generated; and if the target face is located in the lower part, an instruction to rotate downward is generated. In each case, the amplitude of the fine adjustment action may be determined according to the specific position of the target face in the scene image.
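The mapping from face position to fine adjustment instruction can be sketched as follows. It is a sketch under assumptions: the dead-zone threshold and proportional-amplitude rule are illustrative choices, since the patent only says the amplitude "may be determined according to the specific position" of the face.

```python
def fine_adjust_instruction(face_cx, face_cy, img_w, img_h, dead_zone=0.05):
    """Map the target face's center position in the scene image to pan/tilt
    directions, with amplitude proportional to the offset from center.
    A face left of center yields a leftward pan; a face in the upper part
    yields an upward tilt, matching the example in the text."""
    dx = face_cx / img_w - 0.5   # > 0: face is right of center
    dy = face_cy / img_h - 0.5   # > 0: face is below center
    pan = "right" if dx > dead_zone else "left" if dx < -dead_zone else None
    tilt = "down" if dy > dead_zone else "up" if dy < -dead_zone else None
    return {"pan": pan, "tilt": tilt,
            "pan_amplitude": abs(dx), "tilt_amplitude": abs(dy)}
```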
Based on the above description, please refer to fig. 5. In terms of mechanical structure, the camera assembly 120 may include a first rotation control component 121 and a camera 122, where the first rotation control component 121 and the camera 122 are respectively connected to the processing component 110. The first rotation control component 121 is configured to receive the fine adjustment instruction and perform the fine adjustment action accordingly, so as to drive the camera 122 to rotate in a first direction and adjust the shooting surface of the camera 122 to the second target position corresponding to the first person image, where the first direction is a vertical direction. The camera assembly 120 further includes a second rotation control component 123, likewise connected to the processing component 110, which is configured to receive the rotation control instruction and execute the rotation action accordingly, so as to drive the camera 122 to rotate in a second direction and rotate the shooting surface of the camera 122 to the first target position corresponding to the voice direction, where the second direction is a horizontal direction. It is understood that, in the embodiment of the present application, the first (vertical) direction is the up-down direction, and the second (horizontal) direction is the left-right direction.
Further, in the embodiment of the present application, the human-computer interaction device 100 further includes a storage case (not shown in the figure) and a lifting control component (not shown in the figure), where the lifting control component is respectively connected to the processing component 110 and the camera assembly 120. On this basis, the processing component 110 is further configured to forward a power-on control instruction or a shutdown control instruction to the lifting control component upon receiving it. The lifting control component is configured to control the camera assembly 120 to rise from the storage case so as to be exposed outside it when the power-on control instruction is received, and to control the camera assembly 120 to retract into the storage case from the exposed position when the shutdown control instruction is received, so as to protect the camera assembly 120 and thereby prolong the service life of the human-computer interaction device 100.
Please refer to fig. 6. To further expand the application range of the human-computer interaction device 100, in the embodiment of the present application the device further includes a projection host 150 connected to the processing component 110. The camera assembly 120 may be disposed on the projection host 150, for example directly above or directly below it, and fixedly connected to the projection host 150, specifically through the base 130. On this basis, the camera assembly 120 is further configured to send the scene image to the processing component 110, and the processing component 110 is configured to determine a projection region feature image from the scene image and decode it based on a De Bruijn pseudo-random-sequence structured-light image decoding method, where decoding may be understood as stripe division, fringe center point extraction, and color segmentation. Taking color segmentation as an example, in the embodiment of the present application, the projection region feature image may be color-segmented by a linear filtering difference method to obtain a feature region; feature point matching is then performed according to the feature region, and trapezoidal (keystone) correction is performed on the projection picture of the projection host 150 according to the feature point matching result. Specifically, the feature point matching may include matching the feature region against a standard square image feature region to obtain the degree of distortion of the feature region, and using that degree of distortion as the feature point matching result for the trapezoidal correction of the projection picture of the projection host 150.
In the embodiment of the present application, the linear filtering difference method may be represented by the following calculation formula:
where f is the differential value of the linear filtering, c is the differential increment, o is the filtering range, L is the light intensity value, and i is the index of the pixel value along the scan line.
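The formula itself is not reproduced in this text, so the sketch below is only one plausible reading of the variable definitions above: f as the average intensity change of step c within filtering range o around scan index i. It is an assumption for illustration, not the patent's formula.

```python
def linear_filter_diff(L, i, o=2, c=1):
    """Hypothetical reading of the linear filtering difference: average the
    light-intensity differences of increment c over the filtering range o
    around scan index i; the returned value plays the role of f."""
    n = len(L)
    lo = max(i - o, 0)
    hi = min(i + o, n - 1 - c)   # keep j + c inside the scan line
    diffs = [L[j + c] - L[j] for j in range(lo, hi + 1)]
    return sum(diffs) / len(diffs)
```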
Further, in the embodiment of the present application, the processing component 110 may also obtain an optimal projection focal length of the projection host 150 through a voting algorithm. In practical implementation, the processing component 110 is configured to control the camera assembly 120 to acquire a plurality of scene images at different projection focal lengths, determine a plurality of projection region feature images from those scene images, and then obtain a sharpness characterization coefficient for each projection region feature image, so as to determine the projection focal length corresponding to the maximum sharpness characterization coefficient, which is used as the target focal length.
In the embodiment of the present application, the voting algorithm can be represented by the following calculation formula:
and if the p (x, j) corresponding pixel point j reaches the clearest point at the focal length x, and the pixel point j is a characteristic pixel point in any projection region characteristic image.
Based on the same inventive concept as the human-computer interaction device, the embodiment of the present application further provides a human-computer interaction method applied to the human-computer interaction device; please refer to fig. 7, which is a schematic flow diagram of the human-computer interaction method provided by the embodiment of the present application. It should be noted that the human-computer interaction method is not limited by the sequence shown in fig. 7 and described below; the specific flow and steps of the human-computer interaction method are described below with reference to fig. 7.
Step S100: acquiring, by the processing component, voice information of the environment where the human-computer interaction device is located, determining a voice direction corresponding to the voice information, generating a rotation control instruction according to the voice direction, and sending the rotation control instruction to the camera assembly.
Step S200: executing, by the camera assembly, a rotation action according to the rotation control instruction, so as to rotate the shooting surface of the camera assembly to a first target position corresponding to the voice direction, and acquiring a scene image at the first target position.
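Steps S100 and S200 can be summarized in a small sketch: given a detected voice direction, compute the signed horizontal rotation that turns the shooting surface toward it. The azimuth representation and instruction format are illustrative assumptions, not terms from the patent.

```python
def rotation_instruction(voice_azimuth_deg, camera_azimuth_deg):
    """Return the rotation control instruction that turns the camera's
    shooting surface from its current azimuth to the voice direction,
    taking the shorter way around the circle."""
    delta = (voice_azimuth_deg - camera_azimuth_deg + 180) % 360 - 180
    direction = "right" if delta > 0 else "left" if delta < 0 else "hold"
    return {"direction": direction, "degrees": abs(delta)}
```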
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed, the human-computer interaction method provided in the foregoing method embodiment is implemented.
In summary, the human-computer interaction device provided by the embodiment of the present application includes a processing component and a camera assembly. The processing component is configured to collect voice information of the environment where the human-computer interaction device is located, determine a voice direction corresponding to the voice information, generate a rotation control instruction according to the voice direction, and send the rotation control instruction to the camera assembly. The camera assembly is configured to execute a rotation action according to the rotation control instruction, so as to rotate its shooting surface to a first target position corresponding to the voice direction and collect a scene image at the first target position. Because the voice information can be sent by the target user, the camera assembly can automatically rotate to the position of the target user, namely the first target position, thereby enabling normal interaction between the target user and the human-computer interaction device and improving the degree of automation of the human-computer interaction device.
In addition, the human-computer interaction method and the computer-readable storage medium provided in the embodiment of the present application have the same beneficial effects as the human-computer interaction device, which are not repeated here.
In the description of the present application, it should be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "disposed" should be interpreted broadly: a connection may be a fixed mechanical connection, a detachable connection, or an integral connection; it may be an electrical connection; or it may be a communication connection, where the communication connection may be wired or wireless. Furthermore, a connection may be direct, indirect through an intermediate medium, or an internal communication between two elements.
Furthermore, in the description of the present application, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
The above description is only a few examples of the present application and is not intended to limit the present application, and those skilled in the art will appreciate that various modifications and variations can be made in the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (14)
1. A human-computer interaction device, comprising a processing component and a camera assembly, wherein the processing component is connected with the camera assembly;
the processing component is configured to acquire voice information of the environment where the human-computer interaction device is located, determine a voice direction corresponding to the voice information, generate a rotation control instruction according to the voice direction, and send the rotation control instruction to the camera assembly;
the camera assembly is configured to execute a rotation action according to the rotation control instruction, so as to rotate a shooting surface of the camera assembly to a first target position corresponding to the voice direction and collect a scene image at the first target position.
2. The human-computer interaction device of claim 1, wherein the processing component comprises a voice acquisition device and a processing device, the voice acquisition device is connected with the processing device, and the processing device is further connected with the camera assembly;
the voice acquisition device is configured to acquire the voice information of the environment where the human-computer interaction device is located, determine the voice direction corresponding to the voice information, and send the voice direction to the processing device;
the processing device is configured to generate the rotation control instruction according to the voice direction and send the rotation control instruction to the camera assembly.
3. The human-computer interaction device of claim 1, further comprising a first display device, the first display device being connected to the processing component;
the camera assembly is further configured to send the scene image to the processing component;
the processing component is further configured to determine a first person image from the scene image, generate a target display image according to the first person image, and send the target display image to the first display device;
the first display device is configured to display the target display image.
4. The human-computer interaction device of claim 3, wherein the processing component is configured to determine the first person image and first facial expression information of the first person image from the scene image, simulate a second person image as the target display image according to the first facial expression information, and send the target display image to the first display device.
5. The human-computer interaction device of claim 1, wherein the processing component is further connected to a second display device;
the processing component is further configured to determine a first person image from the scene image and send the first person image to the second display device;
the second display device is configured to display the first person image.
6. The human-computer interaction device of claim 5, wherein the processing component is configured to obtain person identity information corresponding to the first person image after the first person image is determined from the scene image, and send the first person image and the person identity information to the second display device;
the second display device is configured to display the first person image and the person identity information.
7. The human-computer interaction device according to any one of claims 3 to 6, wherein after determining the first person image from the scene image, the processing component is further configured to determine a target face from the first person image, determine position information of the target face in the scene image, generate a fine adjustment instruction according to the position information, and send the fine adjustment instruction to the camera assembly;
the camera assembly is configured to execute a fine adjustment action according to the fine adjustment instruction, so as to adjust the shooting surface of the camera assembly to a second target position corresponding to the first person image.
8. The human-computer interaction device of claim 7, wherein the camera assembly comprises a first rotation control component and a camera, and the first rotation control component and the camera are respectively connected with the processing component;
the first rotation control component is configured to receive the fine adjustment instruction and execute the fine adjustment action according to the fine adjustment instruction, so as to drive the camera to rotate in a first direction and adjust the shooting surface of the camera to the second target position corresponding to the first person image, wherein the first direction is a vertical direction.
9. The human-computer interaction device of claim 1, wherein the camera assembly comprises a second rotation control component and a camera, and the second rotation control component and the camera are respectively connected with the processing component;
the second rotation control component is configured to receive the rotation control instruction and execute the rotation action according to the rotation control instruction, so as to drive the camera to rotate in a second direction and rotate the shooting surface of the camera to the first target position corresponding to the voice direction, wherein the second direction is a horizontal direction.
10. The human-computer interaction device of claim 1, further comprising a storage case and a lifting control component, wherein the lifting control component is respectively connected with the processing component and the camera assembly;
the processing component is further configured to, upon receiving a power-on control instruction or a shutdown control instruction, send the power-on control instruction or the shutdown control instruction to the lifting control component;
the lifting control component is configured to control the camera assembly to rise from the storage case so as to be exposed outside the storage case when the power-on control instruction is received, and to control the camera assembly to retract into the storage case from the position exposed outside the storage case when the shutdown control instruction is received.
11. The human-computer interaction device of claim 1, further comprising a projection host, the projection host being connected to the processing component;
the camera assembly is further configured to send the scene image to the processing component;
the processing component is further configured to determine a projection region feature image from the scene image and perform color segmentation on the projection region feature image through a linear filtering difference method to obtain a feature region;
the processing component is further configured to perform feature point matching according to the feature region and perform trapezoidal correction on the projection picture of the projection host according to the feature point matching result.
12. A human-computer interaction device according to claim 11, wherein the linear filtering difference method is represented by the following calculation formula:
where f is the differential value of the linear filtering, c is the differential increment, o is the filtering range, L is the light intensity value, and i is the index of the pixel value along the scan line.
13. A human-computer interaction method applied to the human-computer interaction device of any one of claims 1 to 12, the method comprising:
acquiring, by the processing component, voice information of the environment where the human-computer interaction device is located, determining a voice direction corresponding to the voice information, generating a rotation control instruction according to the voice direction, and sending the rotation control instruction to the camera assembly; and
executing, by the camera assembly, a rotation action according to the rotation control instruction, so as to rotate the shooting surface of the camera assembly to a first target position corresponding to the voice direction and acquire a scene image at the first target position.
14. A computer-readable storage medium, having stored thereon a computer program which, when executed, implements the human-computer interaction method of claim 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010786060.6A CN111741225A (en) | 2020-08-07 | 2020-08-07 | Human-computer interaction device, method and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111741225A true CN111741225A (en) | 2020-10-02 |
Family
ID=72658114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010786060.6A Pending CN111741225A (en) | 2020-08-07 | 2020-08-07 | Human-computer interaction device, method and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111741225A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113043999A (en) * | 2021-04-19 | 2021-06-29 | 遥相科技发展(北京)有限公司 | Wiper control method based on automobile data recorder, electronic equipment and computer storage medium |
CN114845056A (en) * | 2022-04-29 | 2022-08-02 | 清华大学 | Auxiliary photographing robot |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130016124A1 (en) * | 2011-07-14 | 2013-01-17 | Samsung Electronics Co., Ltd. | Method, apparatus, and system for processing virtual world |
CN103292741A (en) * | 2013-05-29 | 2013-09-11 | 哈尔滨工程大学 | Structured light vision measurement method for 3D surface profiles of objects on the basis of K-means color clustering |
CN105093986A (en) * | 2015-07-23 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Humanoid robot control method based on artificial intelligence, system and the humanoid robot |
CN105676572A (en) * | 2016-04-19 | 2016-06-15 | 深圳市神州云海智能科技有限公司 | Projection correction method and device for projector equipped on mobile robot |
CN106547884A (en) * | 2016-11-03 | 2017-03-29 | 深圳量旌科技有限公司 | Behavior pattern learning system of substitute robot |
CN109308466A (en) * | 2018-09-18 | 2019-02-05 | 宁波众鑫网络科技股份有限公司 | The method that a kind of pair of interactive language carries out Emotion identification |
CN110519580A (en) * | 2019-10-11 | 2019-11-29 | 成都极米科技股份有限公司 | A kind of projector automatic focusing method, device, equipment and readable storage medium storing program for executing |
CN110738273A (en) * | 2019-10-23 | 2020-01-31 | 成都极米科技股份有限公司 | Image feature point matching method, device, equipment and storage medium |
CN110855892A (en) * | 2019-11-27 | 2020-02-28 | 华人运通(江苏)技术有限公司 | Photographing method, photographing system and computer-readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4529837B2 (en) | Imaging apparatus, image correction method, and program | |
CN107026973B (en) | Image processing device, image processing method and photographic auxiliary equipment | |
US8345106B2 (en) | Camera-based scanning | |
US9547791B2 (en) | Image processing system, image processing apparatus, image processing method, and program | |
JP4352980B2 (en) | Enlarged display device and enlarged image control device | |
KR102124617B1 (en) | Method for composing image and an electronic device thereof | |
JP2007201948A (en) | Imaging apparatus, image processing method and program | |
WO2008012905A1 (en) | Authentication device and method of displaying image for authentication | |
CN109670427A (en) | A kind of processing method of image information, device and storage medium | |
JP6448674B2 (en) | A portable information processing apparatus having a camera function for performing guide display for capturing an image capable of character recognition, a display control method thereof, and a program | |
CN111741225A (en) | Human-computer interaction device, method and computer-readable storage medium | |
CN109691080A (en) | Shoot image method, device and terminal | |
CN103034042A (en) | Panoramic shooting method and device | |
JP4348028B2 (en) | Image processing method, image processing apparatus, imaging apparatus, and computer program | |
US10819894B2 (en) | Human machine interface system and method of providing guidance and instruction for iris recognition on mobile terminal | |
CN114007053B (en) | Image generation method, image generation system, and recording medium | |
JP6283329B2 (en) | Augmented Reality Object Recognition Device | |
WO2018196854A1 (en) | Photographing method, photographing apparatus and mobile terminal | |
KR20230017774A (en) | Information processing device, information processing method, and program | |
JP2018125658A (en) | Portable information processing device having camera function, display control method thereof, and program | |
JP2010217962A (en) | Program for camera-equipped mobile terminal device, and camera-equipped mobile terminal device | |
CN112529770B (en) | Image processing method, device, electronic equipment and readable storage medium | |
CN114092323A (en) | Image processing method, image processing device, storage medium and electronic equipment | |
JP2006190106A (en) | Pattern detection program and pattern detection apparatus | |
JP2018191094A (en) | Document reader, method of controlling document reader, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201002 |
RJ01 | Rejection of invention patent application after publication |