CN110673722A - Human-computer interaction mode display method, device and equipment - Google Patents

Human-computer interaction mode display method, device and equipment

Info

Publication number
CN110673722A
Authority
CN
China
Prior art keywords
human-computer interaction
scene
user
interaction mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910838028.5A
Other languages
Chinese (zh)
Inventor
王淳
杜志军
周明才
周大江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910838028.5A priority Critical patent/CN110673722A/en
Publication of CN110673722A publication Critical patent/CN110673722A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of this specification disclose a human-computer interaction mode display method, apparatus, and device. The method comprises: deriving scene features of a human-computer interaction scene with an environment understanding model and inputting the scene features into a decision model; and predicting and displaying, based on the influence of the scene features on different human-computer interaction modes learned in advance by the decision model, a human-computer interaction mode suited to the human-computer interaction scene, through which the user can perform human-computer interaction.

Description

Human-computer interaction mode display method, device and equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for displaying a human-computer interaction mode.
Background
Human-computer interaction refers to the process in which a person and a device exchange information in a certain dialogue language and through a certain interaction mode in order to complete a given task. Existing offline devices generally adopt a single, fixed human-computer interaction mode.
A more flexible and reliable solution is therefore needed.
Disclosure of Invention
The embodiments of this specification provide a human-computer interaction mode display method to address the limitations of a fixed human-computer interaction mode.
An embodiment of the present specification further provides a human-computer interaction mode display method, including:
the method comprises the steps of obtaining a plurality of scene characteristics of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user and equipment perform human-computer interaction, and the scene characteristics are used for representing scene understanding of the human-computer interaction scene;
inputting the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is obtained by training based on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has the features with the same dimensionality as the scene features;
and displaying a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
An embodiment of the present specification further provides a human-computer interaction mode display device, including:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a plurality of scene characteristics of a human-computer interaction scene, the human-computer interaction scene comprises a scene in which a user and equipment perform human-computer interaction, and the scene characteristics are used for representing scene understanding of the human-computer interaction scene;
the decision module is used for inputting the scene features into a decision model so as to predict a first human-computer interaction mode corresponding to the scene features, the decision model is obtained based on historical sample data and corresponding human-computer interaction mode labels through training, and the historical sample data has the features with the same dimensionality as the scene features;
and the display module is used for displaying the human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
An embodiment of the present specification further provides an electronic device, including:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the steps of the human-computer interaction mode display method described above.
The embodiments of this specification further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the human-computer interaction mode display method described above are implemented.
In one embodiment of this specification, a decision model dynamically decides a human-computer interaction mode suited to the human-computer interaction scene in which the user is located, based on the scene features of that scene and the interaction characteristics of the different human-computer interaction modes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of it, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
fig. 1 is a schematic diagram of an application scenario provided in the present specification;
fig. 2 is a schematic flowchart of a human-computer interaction mode display method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating the steps of modal processing provided in one embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating steps provided in an embodiment of the present disclosure for collecting environmental information;
FIG. 5 is a flowchart illustrating a step of parsing scene features according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a human-computer interaction mode display device according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person skilled in the art without making any inventive step based on the embodiments in this description belong to the protection scope of this document.
An application scenario of the present specification is exemplarily described below with reference to fig. 1.
The first application scenario comprises: a user 101 and a terminal device 102 having a human-computer interaction function, wherein:
a user 101 operates a terminal device 102 to enter an interactive interface so as to trigger the terminal device 102 to provide a man-machine interaction service;
the terminal device 102 collects and analyzes the scene information of the human-computer interaction scene, decides from the human-computer interaction mode set, based on the analysis result, a mode suitable for the scene, and provides the service of that interaction mode to the user 101.
The second application scenario includes: a user 101, a terminal device 102 with a man-machine interaction function, and a server (not shown in the figure), wherein:
a user 101 operates a terminal device 102 to enter an interactive interface so as to trigger the terminal device 102 to provide a man-machine interaction service;
the terminal equipment 102 collects scene information of a human-computer interaction scene and sends the scene information to the server;
the server analyzes the scene information and, based on the analysis result, decides from the human-computer interaction mode set a mode suitable for the human-computer interaction scene; and instructs the terminal device 102 to provide the service of that interaction mode to the user 101.
The scene information of the human-computer interaction scene refers to the set of information composed of the physical environment information related to the human-computer interaction between the user 101 and the terminal device 102, the behavior information of the user 101 and of other surrounding users, and the like. The human-computer interaction mode set refers to the set of interaction modes from which the user 101 can choose to interact with the terminal device 102, including keyboard input, biometric recognition, and the like. The terminal device 102 refers to a device that can provide human-computer interaction services for a user to handle related business; it may be a computer device fixed at a specified location, for example a self-service terminal deployed in a hospital or a bank, or a mobile computer device such as a mobile phone, a notebook, a tablet computer, or a POS machine.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a human-computer interaction mode display method provided in an embodiment of the present specification, and referring to fig. 2, the method may specifically include the following steps:
Step 202, acquiring a plurality of scene characteristics of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user and equipment perform human-computer interaction, and the scene characteristics are used for representing scene understanding of the human-computer interaction scene;
the device refers to a device that performs human-computer interaction with the user, for example: terminal device 102 in fig. 1; the feature dimensions of the scene features are set based on interactive features of a human-computer interaction mode, for example: the interactive characteristics of the voice interactive mode comprise: a quiet environment is required, and the correspondingly set characteristic dimensions at least comprise: noise characteristics in a human-computer interaction scene; another example is: the interactive characteristics of the face recognition interactive mode comprise: and if the face of the user is required to be complete, the correspondingly set feature dimensions at least comprise: whether the face is covered.
It should be noted that, one implementation manner of step 202 may be:
step S1, acquiring physical environment information and behavior environment information in the human-computer interaction scene;
wherein the behavioral environment information includes: behavior information of the user and other users within the preset range of the user; the physical environment information includes: natural physical environment information and artificial physical environment information, the natural physical environment information including: natural sound environment, vibration environment, electromagnetic environment, radiation environment, light environment, thermal environment, etc., and the artificial physical environment information includes: artificial noise environments, artificial vibration environments, artificial electromagnetic environments, artificial radiation environments, artificial light environments, artificial heat environments, and the like.
Note that, the implementation manner of step S1 may be:
acquiring sensor data and multimedia data of the equipment, wherein the sensor data comprises environment information of the human-computer interaction scene sensed by the equipment, and the multimedia data comprises multimedia information in the human-computer interaction scene acquired by the equipment; and obtaining physical environment information and behavior environment information in the human-computer interaction scene based on the sensor data and the multimedia data. Referring to fig. 4, specific examples may be:
example 1, an environmental image or a video of a preset peripheral area of the terminal device 102 is acquired by an image acquisition device installed in the terminal device 102 or by calling an image acquisition device of a third party.
The image capturing device includes, but is not limited to, one or more of a grayscale camera, a color camera, an infrared camera, a binocular stereo camera, a depth perception camera, and the like.
Example 2, audio information of a preset peripheral area of the terminal device 102 is acquired through an audio acquisition device installed in the terminal device 102 or an audio acquisition device of a third party is called; the audio information includes: ambient sounds and human speech.
The audio capture device includes, but is not limited to, one or more microphones, microphone arrays, and the like.
Example 3, data in a preset peripheral area of the terminal device 102 is sensed by sensors installed in the terminal device 102, or by calling third-party sensors, to obtain physical environment information other than the data collected in Examples 1 and 2.
Wherein, the physical signal acquisition device includes but is not limited to one or more barometers, thermometers, anemometers, PM2.5 detectors, etc.; accordingly, the collected physical environment signals include, but are not limited to, air pressure information, temperature information, wind speed information, PM2.5, and other sensor data.
Example 4, for other types of environment information, a corresponding information acquisition device may also be flexibly configured for the terminal device 102 according to specific scene requirements, business requirements, and the like.
The sensor data and the multimedia data collected in Examples 1 to 4 above are comprehensively analyzed and organized to obtain the physical environment information and the behavior environment information of the human-computer interaction scene, for example: images and videos containing the user and the surrounding users are organized into behavior environment information, and the physical signals sensed by the sensors are organized into physical environment information.
Based on this, in the embodiments of the present specification, a plurality of types of information acquisition devices are configured for the terminal device 102 to acquire a plurality of types of environment information in a human-computer interaction scene, so that scene information that can embody the human-computer interaction scene in detail is obtained, and sufficient data support is provided for subsequent decisions.
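As a non-limiting illustration of Examples 1 to 4 and the consolidation described above, the collection step can be sketched as follows; the capture methods (capture_frames, capture_audio, read_sensors) are hypothetical placeholders for device-specific or third-party acquisition APIs.

```python
# Sketch only: gather raw sensor and multimedia data, then sort it into
# physical-environment and behavioral-environment information.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SceneInformation:
    physical_env: Dict[str, Any] = field(default_factory=dict)   # e.g. noise level, temperature
    behavioral_env: List[Any] = field(default_factory=list)      # e.g. frames showing user behavior


def collect_scene_information(device) -> SceneInformation:
    info = SceneInformation()
    frames = device.capture_frames(seconds=2)    # hypothetical camera API (Example 1)
    audio = device.capture_audio(seconds=2)      # hypothetical microphone API (Example 2)
    sensors = device.read_sensors()              # hypothetical physical sensors (Example 3)
    # Consolidation: images/video of the user and nearby users -> behavioral environment;
    # ambient audio and physical signals -> physical environment.
    info.behavioral_env.extend(frames)
    info.physical_env["ambient_audio"] = audio
    info.physical_env.update(sensors)
    return info
```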
Step S2, inputting the physical environment information and the behavior environment information into an environment understanding model to obtain a plurality of scene characteristics of the man-machine interaction scene, wherein the environment understanding model is used for analyzing the physical environment information and the behavior environment information to obtain the scene characteristics. Referring to fig. 3, the implementation of step S2 may be:
step 302, generating single-mode feature vectors corresponding to data of a plurality of preset data modes in the physical environment information and the behavior environment information; wherein the preset data modality at least comprises: text, pictures, audio-visual, etc.;
step 304, fusing the single-mode feature vectors to obtain multi-mode feature vectors; the modal fusion refers to a process of performing cross fusion on different single-modal data such as texts, pictures, videos and audios;
for step 302 and step 304, specific examples thereof may be:
First, single-modal feature vectors are generated for data of different modalities by different algorithms; then, the independent single-modal feature vectors are fused into one multi-modal feature vector by a feature fusion algorithm. Both the single-modal feature generation algorithms and the feature fusion algorithm can be implemented with machine learning methods such as neural networks and graph models.
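A minimal sketch of steps 302 and 304, assuming numpy is available: the per-modality encoders below are trivial stand-ins for the neural-network or graph-model encoders mentioned above, and the fusion is a simple concatenate-and-project step.

```python
import numpy as np


def encode_text(text: str, dim: int = 32) -> np.ndarray:
    # Placeholder text embedding; a real system would use a learned encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(dim)


def encode_image(image: np.ndarray, dim: int = 32) -> np.ndarray:
    # Placeholder image embedding: flatten, resize, and normalize.
    flat = image.astype(float).ravel()
    return np.resize(flat, dim) / (np.linalg.norm(flat) + 1e-8)


def fuse(single_modal_vectors: list, out_dim: int = 64) -> np.ndarray:
    # Simplest fusion: concatenate the single-modal vectors, then project them
    # into a shared multi-modal subspace with a fixed random projection.
    concat = np.concatenate(single_modal_vectors)
    projection = np.random.default_rng(0).standard_normal((out_dim, concat.size))
    return projection @ concat
```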
Step 306, the multi-modal feature vector is input into the environment understanding model, and the environment understanding model understands the scene features of the human-computer interaction scene using one or more understanding algorithms. For example, image data of a single modality can be recognized and processed to understand scene features such as 'whether it is raining where the user is' and 'whether people are queuing behind the user'; alternatively, image data of one modality and voice data of another modality can be used together to understand the scene feature 'whether the person currently speaking is the user'.
The scene features include at least the physical environment, the interaction between the physical environment and the user, the interaction between the user and other users, and other types of scene features.
To obtain a plurality of different types of scene features and thus provide sufficient data support for subsequent decision-making, referring to fig. 5, one implementation of step 306 may be:
determining user state features in the human-computer interaction scene based on the behavior environment information; determining environmental state characteristics in the human-computer interaction scene based on the physical environment information; and obtaining a plurality of scene characteristics of the man-machine interaction scene based on the user state characteristics and the environment state characteristics.
Specifically, behavior environment information such as images and videos is recognized using algorithms such as image recognition and face recognition to obtain the user state features of the user and of other users within a preset range around the user, for example: 'whether the face is covered', 'whether both hands are occupied', and 'whether someone is peeping from behind'. Audio information, physical environment signals, and the like are recognized using algorithms such as noise recognition and image recognition to obtain the physical state features of the environment in which the user is located, for example: 'whether the noise is too loud' and 'whether it is raining'.
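For illustration, the parsing of step 306 described above may look roughly as follows; the detector outputs (face_occlusion_score, people_behind_count, noise_db, rain_detected) and the thresholds are assumptions made only for this example.

```python
from typing import Dict


def understand_scene(physical_env: Dict, behavioral_env: Dict) -> Dict[str, bool]:
    # User state features derived from the behavior environment information.
    user_state = {
        "face_occluded": behavioral_env.get("face_occlusion_score", 0.0) > 0.5,
        "hands_occupied": behavioral_env.get("hands_occupied", False),
        "someone_behind": behavioral_env.get("people_behind_count", 0) > 0,
    }
    # Environment state features derived from the physical environment information.
    environment_state = {
        "too_noisy": physical_env.get("noise_db", 0.0) > 70.0,
        "raining": physical_env.get("rain_detected", False),
    }
    # Scene features = user state features + environment state features.
    return {**user_state, **environment_state}
```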
Further, because the information formats, information quality, and the like acquired by different acquisition devices differ, and in order to avoid data-compatibility problems in subsequent processing, the method further comprises, before the data is input into the environment understanding model, a data regularization step, in which data of each modality is regularized based on preset regularization rules to obtain standardized data. The regularization rules include at least:
correcting distorted images, and enhancing the sharpness of images whose sharpness is insufficient, so as to improve image quality; down-sampling the video frame rate to reduce the amount of data to be processed subsequently; and unifying the data structure and the language of text data to improve subsequent data-processing efficiency.
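A minimal sketch of such regularization rules, with target sizes and frame rates chosen arbitrarily for the example; real distortion correction and sharpness enhancement would use dedicated image-processing routines.

```python
import numpy as np


def normalize_image(image: np.ndarray, target_size=(224, 224)) -> np.ndarray:
    # Stand-in for distortion correction / sharpness enhancement: rescale intensities
    # and pad/crop to a fixed size so downstream models see a uniform format.
    image = (image - image.min()) / (np.ptp(image) + 1e-8)
    out = np.zeros(target_size)
    h = min(image.shape[0], target_size[0])
    w = min(image.shape[1], target_size[1])
    out[:h, :w] = image[:h, :w]
    return out


def downsample_video(frames: list, source_fps: int, target_fps: int = 5) -> list:
    # Down-sample the frame rate to reduce the data volume to be processed later.
    step = max(1, source_fps // target_fps)
    return frames[::step]


def normalize_text(text: str) -> str:
    # Unify the structure of text data; a real system would also unify the language.
    return " ".join(text.strip().lower().split())
```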
Based on this, the embodiments of this specification perform modality fusion on the single-modality physical environment information and behavior environment information of the human-computer interaction scene, projecting information of different modalities such as language, vision, and hearing into a common, mutually associated subspace. This realizes a collaborative multi-modal representation at the knowledge level and supports multi-modal knowledge acquisition, so that the environment understanding model can understand the environment data of the human-computer interaction scene accurately and fully. In addition, in the embodiments of this specification, the physical environment information and the behavior environment information are regularized into standardized data, which reduces the difficulty of scene understanding for the environment understanding model.
Step 204, inputting the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is obtained by training based on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has the features with the same dimensionality as the scene features;
it will be appreciated that prior to step 204, the method further comprises: a model training step, wherein the model training step specifically comprises:
First, historical sample data and the corresponding human-computer interaction mode labels are collected. The historical sample data may include historical samples corresponding to a sample user group, where each historical sample contains a plurality of training features corresponding to the physical environment information and behavior environment information of the human-computer interaction scene in which the sample user performed a human-computer interaction behavior, and the training features have the same or similar dimensions as the scene features; for example, both the training features and the scene features may include whether it is raining, whether someone is behind, whether the noise exceeds a threshold, and the like. The human-computer interaction mode label indicates the human-computer interaction mode actually selected by the sample user. The decision model can evaluate the influence of the scene features on each human-computer interaction mode, obtain a score for each mode, and display the highest-scoring mode to the user.
The historical sample data may, for example, include scene features such as 'the user is wearing sunglasses', 'the user is holding a child', 'the surroundings are very quiet', and 'no other user is behind', and the corresponding human-computer interaction mode label may, for example, be the voice interaction mode.
Secondly, training a decision model based on historical sample data and corresponding human-computer interaction mode labels, and learning influence data of scene characteristics of a human-computer interaction scene on human-computer interaction modes with different interaction characteristics by the decision model, wherein the influence data at least comprises: the influence weight of the scene characteristics of each dimension on different man-machine interaction modes and the influence weight of the scene characteristics of a plurality of dimensions on the same man-machine interaction mode. Specific examples can be:
example 1 weight of impact of same scene characteristics on different human-computer interaction modes
The scene feature 'the user is wearing sunglasses' has a small influence weight on the voice interaction mode and the fingerprint interaction mode, but it may prevent the device from accurately scanning the user's face and thus degrade the interaction, so its influence weight on the face recognition interaction mode is large; alternatively,
the influence weight of the scene characteristics of 'peeping behind the user' on the voice interaction mode and the fingerprint interaction mode is small, but the leakage of the user password can be caused, and the safety of human-computer interaction is influenced, so that the influence weight on the human-computer interaction mode input by the keyboard is large;
example 2 impact weight of multiple scene features on the same Voice interaction mode
For a voice man-machine interaction mode, the influence weight corresponding to 'a user wearing sunglasses' and 'whether a person is behind' is small, but the influence weight corresponding to 'surrounding noise' which may influence the device to collect voice signals sent by the user is large.
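The training step can be sketched with an off-the-shelf multi-class classifier; scikit-learn's LogisticRegression is only one possible choice, and the feature names are example dimensions rather than ones fixed by this specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["raining", "someone_behind", "too_noisy", "face_occluded", "hands_occupied"]


def train_decision_model(samples: list, mode_labels: list) -> LogisticRegression:
    # Each historical sample has the same feature dimensions as the live scene features;
    # its label is the interaction mode the sample user actually selected.
    X = np.array([[float(s.get(f, False)) for f in FEATURES] for s in samples])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, mode_labels)
    # model.coef_ now plays the role of the learned influence weights of each
    # feature dimension on each human-computer interaction mode.
    return model
```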
Based on the model training step, one implementation of step 204 may be:
determining the human-computer interaction modes selectable by the user, where the selectable modes are determined by the service type handled by the user and the device capability; and inputting the scene features and the selectable human-computer interaction modes into the decision model, which decides the first human-computer interaction mode from among them. The device capability indicates the human-computer interaction modes the device can support. This implementation may be exemplified as follows:
Assume the service handled by the user is a login service. The login service may support keyboard login, fingerprint login, voice login, and so on, but the device only supports keyboard login and face-scan login, so the selectable human-computer interaction modes are keyboard login and face-scan login. These two modes are input into the decision model together with the scene features; the decision model scores each of them based on the influence weights of the scene features and selects the mode best suited to the human-computer interaction scene (i.e., the highest-scoring mode) as the first human-computer interaction mode.
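A sketch of this implementation under the same assumptions as the training sketch above: the service and device capability tables are illustrative values, and any model exposing classes_ and predict_proba (such as the classifier above) can play the role of the decision model.

```python
FEATURES = ["raining", "someone_behind", "too_noisy", "face_occluded", "hands_occupied"]
SERVICE_MODES = {"login": {"keyboard", "fingerprint", "voice", "face_scan"}}   # example values
DEVICE_MODES = {"kiosk-01": {"keyboard", "face_scan"}}                          # example values


def decide_first_mode(model, scene_features: dict, service: str, device_id: str) -> str:
    # Selectable modes = modes supported by the handled service AND by the device.
    candidates = SERVICE_MODES[service] & DEVICE_MODES[device_id]
    x = [[float(scene_features.get(f, False)) for f in FEATURES]]
    scores = dict(zip(model.classes_, model.predict_proba(x)[0]))
    # The highest-scoring candidate is shown as the first human-computer interaction mode.
    return max(candidates, key=lambda m: scores.get(m, 0.0))
```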
On the basis of the previous implementation, another implementation of step 204 may be:
First, preference data of the user for the selectable human-computer interaction modes is determined, where the selectable modes are determined by the service type handled by the user and the device capability. The preference data represents the user's tendency weight for selecting each human-computer interaction mode: the more often the user has used a given mode, the larger the corresponding tendency weight.
Then, the preference data is also input into the decision model, and the decision model decides the first human-computer interaction mode in combination with the preference data. In this way, after the selectable human-computer interaction modes are scored based on the scene features, the tendency weight of each selectable mode can additionally be taken into account, and the score of each selectable mode can be evaluated comprehensively.
If the preference data of the user to the selectable human-computer interaction mode is not detected, the preference data of the equipment service group to the selectable human-computer interaction mode is used as the preference data of the user to the selectable human-computer interaction mode, and the equipment service group comprises users of the equipment history service, namely a user set which has performed human-computer interaction with the equipment within a preset history time range and has selected the human-computer interaction mode.
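For illustration, the preference handling and the fallback to the device service group could be combined with the model scores as follows; the blending weight alpha is an assumption, not a value given by this specification.

```python
from collections import Counter


def preference_weights(user_history: list, group_history: list) -> dict:
    # Tendency weights from usage counts; fall back to the device service group
    # when no preference data exists for this user.
    history = user_history or group_history
    counts = Counter(history)
    total = sum(counts.values()) or 1
    return {mode: n / total for mode, n in counts.items()}


def decide_with_preference(model_scores: dict, prefs: dict, alpha: float = 0.7) -> str:
    # Blend the objective scene-feature score with the subjective preference weight.
    blended = {m: alpha * s + (1 - alpha) * prefs.get(m, 0.0) for m, s in model_scores.items()}
    return max(blended, key=blended.get)
```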
Based on this, the embodiment of the specification utilizes the decision model to learn the influence of the scene characteristics of different human-computer interaction scenes on the human-computer interaction modes with different interaction characteristics, so as to dynamically decide the optimal human-computer interaction mode suitable for the current human-computer interaction scene, thereby effectively improving the convenience and the safety of human-computer interaction of the user; in addition, the embodiment of the specification also introduces preference data of the user to different human-computer interaction modes, comprehensively considers objective influence of scene characteristics and subjective preference of the user, and decides the optimal human-computer interaction mode. In addition, the embodiment of the specification also introduces preference data of a device service group to fill up the blank of the preference data of a 'new user' as correctly as possible and provide data support for model decision.
And step 206, displaying a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
The human-computer interaction interface is the interface of the input/output devices through which the person and the device establish contact and exchange information, and includes: a keyboard, a face-scanning frame, an adapter, a broadcaster, and the like.
Through the man-machine interaction interface, a user can input information required for transacting related services so as to complete the related services. Taking the login service as an example, a user can input an account and a password through a human-computer interaction interface of a virtual keyboard displayed by equipment to complete the login service; or, the user can place the face in the face scanning frame through a human-computer interaction interface of the face scanning frame displayed by the equipment, so that the equipment can perform face recognition verification to complete the login service.
In addition, while the human-computer interaction interface corresponding to the first human-computer interaction mode is displayed, human-computer interaction interfaces of other human-computer interaction modes can be displayed for the user, and the other human-computer interaction modes comprise human-computer interaction modes which are selectable by the user and are not the first human-computer interaction mode.
Specifically, on one hand, the device can prominently display the human-computer interaction interface corresponding to the first human-computer interaction mode in modes of amplification display, labeling and the like, and on the other hand, can synchronously display the human-computer interaction interfaces of other human-computer interaction modes for the user to select the human-computer interaction mode by himself.
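The display step can be illustrated with a small sketch that assembles a display configuration: the first mode is highlighted and the remaining selectable modes stay available for the user to choose. The field names are assumptions for illustration only.

```python
def build_display_config(first_mode: str, selectable_modes: list) -> dict:
    # The first mode is shown prominently; the other selectable modes are still offered.
    return {
        "primary": {"mode": first_mode, "highlight": True},   # e.g. enlarged or labeled
        "alternatives": [m for m in selectable_modes if m != first_mode],
    }
```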
Further, the method also comprises the following steps: a model iteration training step, which may specifically be:
determining a second man-machine interaction mode selected by the user; comparing the first human-computer interaction mode with the second human-computer interaction mode to generate decision reward and punishment data, wherein the decision reward and punishment data are used for representing reward and punishment coefficients of decisions made on the decision model; and iteratively training the decision model based on the decision reward and punishment data. The second human-computer interaction mode is a human-computer interaction mode selected by a user from the first human-computer interaction mode and the other human-computer interaction modes displayed by equipment; the decision reward and penalty data includes: decision penalty data for characterizing penalty coefficients for decisions made to the decision model, and decision reward data for characterizing reward coefficients for decisions made to the decision model.
The model iterative training step may specifically be exemplified by:
example 1, assuming that a first human-computer interaction mode decided by a decision model is different from a second human-computer interaction mode actually selected by a user, decision penalty data is generated, and adaptive penalty iterative update is performed on the decision model by using the decision penalty data so as to adapt to a current human-computer interaction scene and a user group.
Example 2, assuming that a first human-computer interaction mode decided by a decision model is the same as a second human-computer interaction mode actually selected by a user, decision reward data is generated, and adaptive reward iteration updating is performed on the decision model by using the decision reward data so as to be more adaptive to a current human-computer interaction scene and a user group.
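A minimal sketch of the feedback mechanism in Examples 1 and 2: a simple refit on the accumulated samples stands in for whatever reward/penalty update a real deployment would use, and the reward coefficient values are assumptions.

```python
def feedback_update(model, history_X: list, history_y: list,
                    scene_vector: list, first_mode: str, second_mode: str):
    # Reward when the decided mode matches the user's actual choice, penalty otherwise.
    reward = 1.0 if first_mode == second_mode else -1.0
    # In both cases the user's actual choice is appended as a new training sample,
    # so penalized decisions are corrected and rewarded decisions are reinforced.
    history_X.append(scene_vector)
    history_y.append(second_mode)
    model.fit(history_X, history_y)   # iterative retraining on the accumulated history
    return model, reward
```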
Based on this, the first human-computer interaction mode and other selectable human-computer interaction modes are displayed to the user together for the user to select by himself, so that the situation that the user is forced to select the first human-computer interaction mode is avoided; by setting a feedback mechanism, comparing the man-machine interaction mode actually selected by the user with the man-machine interaction mode decided by the decision model, and performing iterative training on the decision model according to the comparison result, so that the decision model can adapt to the man-machine interaction scene where the user is located, and the decision precision of the decision model can be effectively improved.
In summary, in the embodiments of the present description, the influence of the multiple scene features on different human-computer interaction modes is evaluated by using the multiple scene features of the human-computer interaction scene where the user is located and the interaction characteristics of the different human-computer interaction modes, so that a human-computer interaction mode suitable for the human-computer interaction scene is dynamically determined, and the convenience and the security of the human-computer interaction behavior performed by the user are effectively improved.
In addition, for the sake of simplicity, the above method embodiments are described as a series of action combinations, but those skilled in the art should understand that the embodiments are not limited by the described order of actions, since some steps may be performed in other orders or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the actions involved are not necessarily all required by the specification.
Fig. 6 is a schematic structural diagram of a human-computer interaction mode display device provided in an embodiment of the present specification, and referring to fig. 6, the device may specifically include: an obtaining module 602, a decision module 604, and a presentation module 606, wherein:
an obtaining module 602, configured to obtain multiple scene features of a human-computer interaction scene, where the human-computer interaction scene includes a scene in which a user performs human-computer interaction with a device, and the multiple scene features are used to represent scene understanding of the human-computer interaction scene;
a decision module 604, configured to input the multiple scene features into a decision model to predict a first human-computer interaction manner corresponding to the multiple scene features, where the decision model is obtained by training based on historical sample data and corresponding human-computer interaction manner labels, and the historical sample data has features with dimensions the same as those of the multiple scene features;
and a display module 606 for displaying the human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
Optionally, the obtaining module 602 includes:
the acquiring unit acquires physical environment information and behavior environment information in the human-computer interaction scene, wherein the behavior environment information comprises: behavior information of the user and other users within the preset range of the user;
the first input unit is used for inputting the physical environment information and the behavior environment information into an environment understanding model to obtain a plurality of scene characteristics of the man-machine interaction scene, and the environment understanding model is used for analyzing the physical environment information and the behavior environment information to obtain the scene characteristics.
Optionally, the obtaining unit includes:
the acquisition subunit acquires sensor data and multimedia data of the equipment, wherein the sensor data comprises environment information of the human-computer interaction scene sensed by the equipment, and the multimedia data comprises multimedia information in the human-computer interaction scene acquired by the equipment;
and the processing subunit is used for obtaining physical environment information and behavior environment information in the human-computer interaction scene based on the sensor data and the multimedia data.
Optionally, the input unit includes:
the single-mode subunit generates single-mode feature vectors corresponding to data of a plurality of preset data modes in the physical environment information and the behavior environment information;
and the multi-mode subunit fuses the single-mode feature vectors to obtain multi-mode feature vectors and inputs the multi-mode feature vectors into the environment understanding model.
Optionally, the decision module 604 includes:
the determining unit is used for determining the selectable human-computer interaction mode of the user, and the selectable human-computer interaction mode is determined by the service type transacted by the user and the equipment capability;
and the second input unit is used for inputting the scene characteristics and the selectable human-computer interaction modes into a decision model, and the decision model is used for deciding the first human-computer interaction mode.
Optionally, the apparatus further comprises:
the first preference processing module is used for determining preference data of the user on a selectable human-computer interaction mode, wherein the selectable human-computer interaction mode is determined by the service type transacted by the user and the equipment capability; and inputting the preference data into the decision-making model, and combining the preference data with the decision-making model to decide a first human-computer interaction mode.
Optionally, the apparatus further comprises:
and the second preference processing module is used for taking the preference data of the equipment service group to the selectable human-computer interaction mode as the preference data of the user to the selectable human-computer interaction mode if the preference data of the user to the selectable human-computer interaction mode is not detected, wherein the equipment service group comprises the users of the equipment historical service.
Optionally, the display module 606 displays human-computer interaction interfaces of other human-computer interaction modes to the user, where the other human-computer interaction modes include human-computer interaction modes that are selectable by the user and are different from the first human-computer interaction mode.
Optionally, the apparatus further comprises:
the model iteration module is used for determining a second human-computer interaction mode selected by the user; comparing the first human-computer interaction mode with the second human-computer interaction mode to generate decision reward and punishment data, wherein the decision reward and punishment data are used for representing reward and punishment coefficients of decisions made on the decision model; and iteratively training the decision model based on the decision reward and punishment data.
In summary, in the embodiments of the present description, the influence of the multiple scene features on different human-computer interaction modes is evaluated by using the multiple scene features of the human-computer interaction scene where the user is located and the interaction characteristics of the different human-computer interaction modes, so that a human-computer interaction mode suitable for the human-computer interaction scene is dynamically determined, and the convenience and the security of the human-computer interaction behavior performed by the user are effectively improved.
In addition, as for the device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment. It should be noted that, in the respective components of the apparatus of the present specification, the components therein are logically divided according to the functions to be implemented thereof, but the present specification is not limited thereto, and the respective components may be newly divided or combined as necessary.
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure, and referring to fig. 7, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory to the memory and then runs the computer program to form the human-computer interaction mode display device on the logic level. Of course, besides the software implementation, the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.
The network interface, the processor and the memory may be interconnected by a bus system. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.
The memory is used for storing programs. In particular, the program may include program code comprising computer operating instructions. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. The memory may include random-access memory (RAM) and may also include non-volatile memory, such as at least one disk memory.
The processor is used for executing the program stored in the memory and specifically executing:
the method comprises the steps of obtaining a plurality of scene characteristics of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user and equipment perform human-computer interaction, and the scene characteristics are used for representing scene understanding of the human-computer interaction scene;
inputting the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is obtained by training based on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has the features with the same dimensionality as the scene features;
and displaying a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
The method executed by the human-computer interaction mode display apparatus or manager (Master) node according to the embodiment shown in fig. 6 of the present specification can be applied to a processor, or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in the embodiments of the present specification may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules in a decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The man-machine interaction mode display device can also execute the methods of figures 2-3 and realize the method executed by the manager node.
Based on the same innovation, the embodiment of the present specification further provides a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and when the one or more programs are executed by an electronic device including a plurality of application programs, the electronic device is caused to execute the human-computer interaction mode presentation method provided by the corresponding embodiment in fig. 2 to 3.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (12)

1. A man-machine interaction mode display method comprises the following steps:
the method comprises the steps of obtaining a plurality of scene characteristics of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user and equipment perform human-computer interaction, and the scene characteristics are used for representing scene understanding of the human-computer interaction scene;
inputting the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is obtained by training based on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has the features with the same dimensionality as the scene features;
and displaying a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
2. The method of claim 1, wherein the obtaining a plurality of scene features of a human-computer interaction scene comprises:
acquiring physical environment information and behavior environment information in the human-computer interaction scene, wherein the behavior environment information comprises: behavior information of the user and other users within the preset range of the user;
and inputting the physical environment information and the behavior environment information into an environment understanding model to obtain a plurality of scene characteristics of the man-machine interaction scene, wherein the environment understanding model is used for analyzing the physical environment information and the behavior environment information to obtain the scene characteristics.
3. The method of claim 2, wherein acquiring the physical environment information and the behavioral environment information in the human-computer interaction scene comprises:
acquiring sensor data and multimedia data of the device, wherein the sensor data comprises environment information of the human-computer interaction scene sensed by the device, and the multimedia data comprises multimedia information collected by the device in the human-computer interaction scene;
and obtaining the physical environment information and the behavioral environment information in the human-computer interaction scene based on the sensor data and the multimedia data.
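An illustrative sketch (not part of the claims) of the split described in claim 3, assuming simple dictionary-shaped sensor and multimedia inputs; every field name (light_lux, noise_db, detected_faces, speech_detected) is an assumption used only for illustration.

    from typing import Dict, Tuple

    def parse_environment(sensor_data: Dict, multimedia_data: Dict) -> Tuple[Dict, Dict]:
        # Physical environment information sensed by the device.
        physical = {
            "ambient_light": sensor_data.get("light_lux"),
            "noise_level": sensor_data.get("noise_db"),
        }
        # Behavioral environment information derived from collected multimedia data.
        behavioral = {
            "nearby_people": multimedia_data.get("detected_faces", 0),
            "user_is_speaking": multimedia_data.get("speech_detected", False),
        }
        return physical, behavioral

    physical, behavioral = parse_environment(
        {"light_lux": 320, "noise_db": 62},
        {"detected_faces": 3, "speech_detected": False},
    )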
4. The method of claim 2, wherein inputting the physical environment information and the behavioral environment information into the environment understanding model comprises:
generating single-modality feature vectors corresponding to data of a plurality of preset data modalities in the physical environment information and the behavioral environment information;
and fusing the single-modality feature vectors to obtain a multi-modal feature vector, and inputting the multi-modal feature vector into the environment understanding model.
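An illustrative sketch (not part of the claims) of the fusion step in claim 4, assuming plain concatenation of per-modality vectors; real systems may use weighted or learned fusion, and the per-modality encoders below are hypothetical.

    import numpy as np

    def single_modality_vector(modality: str, data) -> np.ndarray:
        # Hypothetical per-modality encoders returning fixed-size vectors.
        if modality == "image":
            return np.asarray(data, dtype=float)[:4]
        if modality == "audio":
            return np.asarray(data, dtype=float)[:2]
        return np.zeros(1)

    def fuse_to_multimodal(per_modality: dict) -> np.ndarray:
        # Sort by modality name so the fused layout is deterministic.
        vectors = [single_modality_vector(m, d) for m, d in sorted(per_modality.items())]
        return np.concatenate(vectors)  # multi-modal feature vector fed to the model

    fused = fuse_to_multimodal({"image": [0.2, 0.4, 0.1, 0.9], "audio": [0.7, 0.3]})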
5. The method of claim 1, wherein inputting the plurality of scene features into the decision model to predict the first human-computer interaction mode comprises:
determining a selectable human-computer interaction mode of the user, wherein the selectable human-computer interaction mode is determined by the service type handled by the user and the capability of the device;
and inputting the scene features and the selectable human-computer interaction mode into the decision model, and deciding, by the decision model, the first human-computer interaction mode.
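An illustrative sketch (not part of the claims) of claim 5: the candidate set is the intersection of the modes supported by the service being handled and the modes the device can provide, and the decision is made only within that set. The lookup tables and the trivial decision rule are assumptions.

    SERVICE_MODES = {"payment": {"face", "qr_code", "touch"}, "inquiry": {"voice", "touch"}}
    DEVICE_MODES = {"kiosk_v1": {"touch", "qr_code", "voice"}}

    def selectable_modes(service_type: str, device_model: str) -> set:
        # Modes allowed by the handled service, restricted to what the device supports.
        return SERVICE_MODES.get(service_type, set()) & DEVICE_MODES.get(device_model, set())

    def decide_mode(scene_features, candidates: set) -> str:
        # Stand-in for the decision model restricted to the selectable candidates.
        return sorted(candidates)[0] if candidates else "touch"

    mode = decide_mode([0.3, 0.6], selectable_modes("payment", "kiosk_v1"))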
6. The method of claim 1, further comprising:
determining preference data of the user for a selectable human-computer interaction mode, wherein the selectable human-computer interaction mode is determined by the service type handled by the user and the capability of the device;
and inputting the preference data into the decision model, and deciding the first human-computer interaction mode by the decision model in combination with the preference data.
7. The method of claim 6, further comprising:
if preference data of the user for the selectable human-computer interaction mode is not detected, using preference data of a device service group for the selectable human-computer interaction mode as the preference data of the user for the selectable human-computer interaction mode, wherein the device service group comprises users historically served by the device.
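An illustrative sketch (not part of the claims) of the fallback in claims 6-7: use the individual user's preference data when it exists, otherwise substitute the aggregated preference of the device service group, i.e. the users the device has historically served. The data layout is an assumption.

    def preference_for(user_id: str, user_prefs: dict, group_prefs: dict) -> dict:
        personal = user_prefs.get(user_id)
        if personal:                  # preference data detected for this user
            return personal
        return group_prefs            # fall back to the device service group preference

    prefs = preference_for(
        "user_42",
        user_prefs={},                                         # nothing recorded for this user yet
        group_prefs={"voice": 0.6, "touch": 0.3, "face": 0.1},
    )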
8. The method of claim 1, further comprising:
and displaying human-computer interaction interfaces of other human-computer interaction modes to the user, wherein the other human-computer interaction modes comprise human-computer interaction modes which are selectable by the user and are not the first human-computer interaction mode.
9. The method of claim 8, further comprising:
determining a second human-computer interaction mode selected by the user;
comparing the first human-computer interaction mode with the second human-computer interaction mode to generate decision reward and punishment data, wherein the decision reward and punishment data are used for representing reward and punishment coefficients applied to decisions made by the decision model;
and iteratively training the decision model based on the decision reward and punishment data.
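An illustrative sketch (not part of the claims) of claims 8-9, assuming the decision model is reduced to a per-mode score table: comparing the predicted first mode with the second mode the user actually selects yields a reward or penalty, which is then used to update the scores iteratively.

    def reward_punishment(first_mode: str, second_mode: str) -> float:
        # Positive reward when the user keeps the predicted mode, penalty otherwise.
        return 1.0 if first_mode == second_mode else -1.0

    def update_model(scores: dict, first_mode: str, second_mode: str, lr: float = 0.1) -> dict:
        r = reward_punishment(first_mode, second_mode)
        scores[first_mode] = scores.get(first_mode, 0.0) + lr * r
        if r < 0:  # the user preferred another mode: strengthen it for the next decision
            scores[second_mode] = scores.get(second_mode, 0.0) + lr
        return scores

    scores = update_model({"voice": 0.2, "touch": 0.5}, first_mode="voice", second_mode="touch")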
10. A human-computer interaction mode display device, comprising:
an acquisition module, configured to acquire a plurality of scene features of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user performs human-computer interaction with a device, and the scene features are used for representing scene understanding of the human-computer interaction scene;
a decision module, configured to input the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is trained on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has features with the same dimensionality as the scene features;
and a display module, configured to display a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
11. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
obtain a plurality of scene features of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user performs human-computer interaction with a device, and the scene features are used for representing scene understanding of the human-computer interaction scene;
input the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is trained on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has features with the same dimensionality as the scene features;
and display a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the following operations:
obtaining a plurality of scene features of a human-computer interaction scene, wherein the human-computer interaction scene comprises a scene in which a user performs human-computer interaction with a device, and the scene features are used for representing scene understanding of the human-computer interaction scene;
inputting the scene features into a decision model to predict a first human-computer interaction mode corresponding to the scene features, wherein the decision model is trained on historical sample data and corresponding human-computer interaction mode labels, and the historical sample data has features with the same dimensionality as the scene features;
and displaying a human-computer interaction interface corresponding to the first human-computer interaction mode to the user.
CN201910838028.5A 2019-09-05 2019-09-05 Human-computer interaction mode display method, device and equipment Pending CN110673722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910838028.5A CN110673722A (en) 2019-09-05 2019-09-05 Human-computer interaction mode display method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910838028.5A CN110673722A (en) 2019-09-05 2019-09-05 Human-computer interaction mode display method, device and equipment

Publications (1)

Publication Number Publication Date
CN110673722A 2020-01-10

Family

ID=69076543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910838028.5A Pending CN110673722A (en) 2019-09-05 2019-09-05 Human-computer interaction mode display method, device and equipment

Country Status (1)

Country Link
CN (1) CN110673722A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966212A (en) * 2020-06-29 2020-11-20 百度在线网络技术(北京)有限公司 Multi-mode-based interaction method and device, storage medium and smart screen device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942021A (en) * 2014-03-24 2014-07-23 华为技术有限公司 Method for presenting content, method for pushing content presenting modes and intelligent terminal
CN108958804A (en) * 2017-05-25 2018-12-07 蔚来汽车有限公司 Man-machine interaction method suitable for application scenarios related with vehicle
CN110109596A (en) * 2019-05-08 2019-08-09 芋头科技(杭州)有限公司 Recommended method, device and the controller and medium of interactive mode


Similar Documents

Publication Publication Date Title
KR102142232B1 (en) Face liveness detection method and apparatus, and electronic device
CN108805091B (en) Method and apparatus for generating a model
US11308955B2 (en) Method and apparatus for recognizing a voice
EP3477519A1 (en) Identity authentication method, terminal device, and computer-readable storage medium
US10410428B1 (en) Providing technical support in an augmented reality environment
US20210012766A1 (en) Voice conversation analysis method and apparatus using artificial intelligence
CN109800737A (en) Face recognition method and device, electronic equipment and storage medium
KR101970008B1 (en) Computer program stored in computer-readable medium and user device having translation algorithm using by deep learning neural network circuit
JP7151959B2 (en) Image alignment method and apparatus
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111523413A (en) Method and device for generating face image
US20200202068A1 (en) Computing apparatus and information input method of the computing apparatus
CN110570383B (en) Image processing method and device, electronic equipment and storage medium
US10937428B2 (en) Pose-invariant visual speech recognition using a single view input
WO2019155887A1 (en) Information processing device, information processing method, and program
CN111506183A (en) Intelligent terminal and user interaction method
CN110673722A (en) Human-computer interaction mode display method, device and equipment
KR102243275B1 (en) Method, device and computer readable storage medium for automatically generating content regarding offline object
CN116912478A (en) Object detection model construction, image classification method and electronic equipment
Carneiro et al. FaVoA: Face-Voice association favours ambiguous speaker detection
US11068065B2 (en) Non-verbal communication tracking and classification
CN113837986A (en) Method, apparatus, electronic device, and medium for recognizing tongue picture
US20170068848A1 (en) Display control apparatus, display control method, and computer program product
Hariharan et al. The Third Eye: An AI Mobile Assistant for Visually Impaired People
Gilda et al. Integration of Voice Assistance System for Visually Challenged Person

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200110