CN109101108B - Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions - Google Patents


Info

Publication number: CN109101108B
Application number: CN201810823980.3A
Authority: CN (China)
Prior art keywords: gesture; granularity; area image; image; interaction interface
Legal status: Active (the status listed is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN109101108A
Inventors: 刘群, 张刚强, 王如琪
Current and original assignee: Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201810823980.3A; granted and published as CN109101108B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/107: Static hand or arm
    • G06V 40/113: Recognition of static hand signs
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language


Abstract

The invention belongs to the field of intelligent driving and discloses a method and a system for optimizing the human-computer interaction interface of an intelligent cockpit based on three-way decisions. The method comprises: collecting gesture video in the cockpit and preprocessing it to obtain gesture images; segmenting the gesture from the background to obtain a gesture-area image; expressing the gesture-area image at multiple granularities and extracting its multi-granularity features with a convolutional neural network; computing, from coarse granularity to fine granularity, the conditional probability of classifying the gesture-area image at each granularity into each category, and completing gesture recognition sequentially with three-way decisions; converting the recognized gesture into semantics and operating the human-computer interaction interface according to the conversion result; and obtaining the optimal granularity by weighted summation and using it as the finest granularity. The method and system not only accurately recognize gestures in the cockpit and execute gesture commands, but also reduce the interaction time of the cockpit human-computer interaction interface and provide users with a more comfortable interaction experience.

Description

Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions
Technical Field
The invention belongs to the field of intelligent driving, and particularly relates to a method and a system for optimizing a human-computer interaction interface of an intelligent cabin based on three decisions.
Background
With the development of artificial intelligence and deep learning techniques, intelligent driving has attracted wide attention. Gesture recognition, one of the typical human-machine interaction modes in intelligent driving, is very important for the optimal design of the human-machine interaction (HMI) interface in a cockpit. Accurate and fast gesture recognition not only provides a more comfortable interactive experience but also improves driver safety.
Current gesture recognition methods fall into two main categories: those based on sensor devices and those based on computer vision. Although the former achieve better recognition rates, they are costly and their interaction experience cannot meet current requirements. The latter acquire gesture images more easily and include methods based on template matching, geometric feature extraction, hidden Markov models and neural networks; these still suffer from low recognition accuracy or low recognition speed and cannot satisfy the current demand for accurate, real-time gesture recognition. Low accuracy arises mainly because gesture features are not extracted well, while low speed arises mainly because the models are too complex, and existing methods cannot solve both problems at once.
Disclosure of Invention
To address these problems, the invention exploits the feature extraction capability of deep neural networks and combines a multi-granularity information expression with the idea of three-way decisions, selecting an appropriate granularity to resolve the trade-off between low gesture recognition accuracy and low recognition speed.
The invention provides a method for optimizing a man-machine interaction interface of an intelligent cabin based on three decisions, which comprises the following steps:
s1, acquiring a gesture video in the cabin, and preprocessing the gesture video to obtain a static gesture image;
s2, performing segmentation processing on the gestures and the background in the gesture image to obtain a gesture area image;
s3, performing multi-granularity expression on the gesture area image from coarse granularity to fine granularity; extracting multi-granularity characteristics of the gesture area image by using a convolutional neural network;
s4, calculating the conditional probability of classifying each granularity gesture area image into each category from coarse granularity to fine granularity, and sequentially completing gesture recognition by utilizing three decisions;
s5, performing semantic conversion on the recognized gesture, and performing corresponding operation on the human-computer interaction interface according to the gesture recognition result after the semantic conversion;
s6, obtaining the best granularity by adopting a weighted summation mode, and repeatedly executing the steps S3-S5 by taking the best granularity as the finest granularity.
Further, the gesture-area image is expressed at multiple granularities from coarse to fine; for the same gesture-area image, the multi-granularity information expression is:

A1 ⊆ A2 ⊆ … ⊆ An

where Ai denotes the information of the gesture-area image at the ith granularity, A1 the information at the coarsest granularity and An the information at the finest granularity, i.e. finer granularities contain the coarser ones; i = 1, 2, …, n, where n is the number of granularity levels.
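As a purely illustrative reading of this expression (the patent does not prescribe a concrete scheme, and all names below are hypothetical), the chain A1 ⊆ … ⊆ An can be realised as an image pyramid in which each coarser level is a block-averaged view of the same image:

```python
import numpy as np

def multi_granularity_pyramid(img, n=3):
    """Build a coarse-to-fine chain A_1, ..., A_n of a gesture-area image
    by block-averaging: A_1 is the coarsest view, A_n the original image.
    Illustrative sketch only; assumes a square-ish grayscale array."""
    levels = []
    h, w = img.shape
    for i in range(1, n + 1):
        f = 2 ** (n - i)  # downsampling factor: largest at i = 1 (coarsest)
        # average-pool over f x f blocks (crop so dimensions divide evenly)
        cropped = img[:h - h % f, :w - w % f]
        coarse = cropped.reshape(h // f, f, w // f, f).mean(axis=(1, 3))
        levels.append(coarse)
    return levels  # levels[0] coarsest ... levels[-1] finest (= original)

img = np.arange(64, dtype=float).reshape(8, 8)
A = multi_granularity_pyramid(img, n=3)
print([a.shape for a in A])  # [(2, 2), (4, 4), (8, 8)]
```

By construction every coarser level is computable from the finer ones, which is one way to realise "the fine granularity comprises the coarse granularity".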
Further, the extracting the multi-granularity features of the gesture area image by using the convolutional neural network comprises extracting the multi-granularity image features of the gesture image by using different convolutional kernels in the convolutional neural network.
Further, step S4 comprises making a three-way decision on the coarse-grained features extracted from the gesture-area image: if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted and the three-way decision is repeated until the classification category of the gesture-area image is determined.
Further, the step S6 includes obtaining a final human-computer interaction interface optimization result for each granularity by a weighted summation method, so as to determine a granularity at which the gesture has the best human-computer interaction interface optimization effect;
Result=w×Acc+(1-w)×Time
Time=T1+T2
where Result is the optimization score of the gesture-area image at a given granularity (the granularity with the best score is taken as optimal), Acc denotes gesture recognition accuracy, Time denotes the time spent in the gesture recognition process, w is a weight, T1 is the time to extract the multi-granularity features of the gesture-area image, and T2 is the time to recognize the gesture.
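As an illustrative sketch of this weighted selection, assuming Time has been normalised so that larger is better (e.g. 1 − time/max_time, a detail the patent leaves open), the best granularity could be chosen as follows; all names and numbers are hypothetical:

```python
def granularity_score(acc, time_score, w=0.7):
    """Result = w * Acc + (1 - w) * Time for one granularity.
    `time_score` is assumed already normalised so that larger is better."""
    return w * acc + (1 - w) * time_score

def best_granularity(stats, w=0.7):
    """stats: {granularity: (acc, time_score)}; returns the granularity
    with the highest Result score."""
    return max(stats, key=lambda g: granularity_score(*stats[g], w))

# hypothetical measurements: 5 granularities are a bit more accurate
# but much slower than 3 granularities
stats = {3: (0.92, 0.8), 5: (0.95, 0.2)}
print(best_granularity(stats))  # 3
```

With w = 0.7 the score of 3 granularities (0.884) beats that of 5 (0.725), matching the document's point that a slightly less accurate but much faster granularity can be the better practical choice.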
The invention provides a system for optimizing the human-computer interaction interface of an intelligent cockpit based on three-way decisions, comprising a camera and, electrically connected, a cockpit gesture acquisition module, a gesture image segmentation module, a multi-granularity feature extraction module, a three-way-decision gesture recognition module, a gesture semantic conversion module and an optimal granularity acquisition module;
the cockpit gesture acquisition module acquires a gesture video in the cockpit through a camera and converts a video frame into a series of static gesture images;
the gesture image segmentation module is used for segmenting the gesture and the background of the gesture image to obtain a gesture area image;
the gesture multi-granularity feature extraction module is used for extracting multi-granularity features of the gesture area image from coarse granularity to fine granularity;
the three-decision gesture recognition module is used for carrying out three-decision on the gesture area image in each granularity according to the extracted multi-granularity features so as to classify the gestures;
the gesture semantic conversion module is used for performing semantic conversion on the classified gestures;
the optimal granularity acquisition module is used for acquiring optimal granularity and sending the optimal granularity to the multi-granularity feature extraction module.
Further, the gesture multi-granularity feature extraction module comprises a convolutional neural network unit and extracts multi-granularity image features of the gesture-area image using different convolution kernels in that unit; the multi-granularity information expression is:

A1 ⊆ A2 ⊆ … ⊆ An

where Ai denotes the information of the gesture-area image at the ith granularity, A1 the information at the coarsest granularity and An the information at the finest granularity, i.e. finer granularities contain the coarser ones; i = 1, 2, …, n, where n is the number of granularity levels.
Further, the three-branch decision gesture recognition module performs three-branch decision on coarse-grained features of the gesture area image, if the classification category of the gesture can be determined, fine-grained feature extraction and further three-branch decision are not continued, otherwise, finer-grained features are extracted to perform three-branch decision until the classification category of the gesture area image is determined.
Further, the optimal granularity acquisition module acquires a final human-computer interaction interface optimization result of each granularity by adopting a weighted summation mode so as to determine the optimal granularity of the gesture area image;
Result=w×Acc+(1-w)×Time
Time=T1+T2
where Result is the optimization score of the gesture-area image at a given granularity (the granularity with the best score is taken as optimal), Acc denotes gesture recognition accuracy, Time denotes the time spent in the gesture recognition process, w is a weight, T1 is the time to extract the multi-granularity features of the gesture-area image, and T2 is the time to recognize the gesture.
The invention has the beneficial effects that:
the invention utilizes the thought of 'gradual calculation' in particle calculation to construct a multi-granularity information expression mode for a gesture image, utilizes a convolutional neural network to extract the characteristics of the multi-granularity gesture image, and uses a three-decision method to identify the gesture in each granularity from coarse granularity to fine granularity, then carries out corresponding semantic conversion on the identified gesture, and applies the gesture identification result to HMI interface optimization in a cabin.
The method and the device can utilize the characteristics of the acquired gestures with different granularities and combine three decision-making ideas, the gestures are recognized more accurately, and corresponding semantic operations are executed more quickly, so that the interaction time of the HMI interface of the cockpit can be reduced, and more comfortable interaction experience can be provided for users.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of multi-granularity feature extraction employed in the present invention;
FIG. 3 is a flow chart of three-way-decision gesture recognition employed in the present invention;
FIG. 4 is the HMI interface optimization design method employed by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
To better illustrate the specific implementation steps of the method, the following is illustrated by way of example in conjunction with fig. 1:
example 1
The invention comprises the following steps:
s1, acquiring a gesture video in the cabin, and preprocessing the gesture video to obtain a static gesture image;
s2, performing segmentation processing on the gestures and the background in the gesture image to obtain a gesture area image;
s3, performing multi-granularity expression on the gesture area image from coarse granularity to fine granularity; extracting multi-granularity characteristics of the gesture area image by using a convolutional neural network;
s4, calculating the conditional probability of classifying each granularity gesture area image into each category from coarse granularity to fine granularity, and sequentially completing gesture recognition by utilizing three decisions;
s5, performing semantic conversion on the recognized gesture area image, and operating the human-computer interaction interface according to the gesture recognition result after the semantic conversion;
the gesture area image is expressed in a multi-granularity mode from coarse granularity to fine granularity, and for the same gesture area image, the multi-granularity information expression mode is as follows:
Figure BDA0001742025310000051
wherein A isiInformation representing different granularities of images of gesture areas, A1Information indicating the coarse granularity of the image of the gesture area, AnInformation indicating that the gesture area image is in a fine granularity, namely the fine granularity comprises a coarse granularity; i 1,2, n, n represents the particle size.
Extracting the multi-granularity features of the gesture-area image with the convolutional neural network comprises extracting multi-granularity image features of the gesture image using different convolution kernels in the network. As shown in fig. 2, the convolutional neural network (CNN) extracts features of the gesture-area image at n granularities, from coarse-grained to fine-grained.
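A minimal sketch of the idea that different kernel sizes yield responses of different granularities, here with fixed averaging kernels standing in for the trained kernels of the patent's CNN (function names are hypothetical):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D correlation (no padding), for illustration only."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def multi_granularity_features(img, kernel_sizes=(7, 5, 3)):
    """Larger kernels produce coarser response maps, smaller kernels finer
    ones. Simple averaging kernels are used here; a trained CNN would
    learn its kernels instead."""
    return [conv2d_valid(img, np.ones((k, k)) / (k * k))
            for k in kernel_sizes]

img = np.random.default_rng(0).random((16, 16))
feats = multi_granularity_features(img)
print([f.shape for f in feats])  # [(10, 10), (12, 12), (14, 14)]
```

The list is ordered coarse to fine, mirroring the n granularity levels of fig. 2.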
Further, step S4 comprises making a three-way decision on the coarse-grained features of the gesture-area image: if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted and the three-way decision is repeated until the classification category of the gesture-area image is determined.
The flow chart of the three-branch decision is shown in fig. 3, and the input data set is used for extracting the multi-granularity features of the gesture area image, calculating the conditional probability and performing the three-branch decision.
A softmax function is selected to compute the conditional probability; the conditional probability of classifying gesture x into category j is:

p(j | x) = exp(θj·x) / Σl=1..k exp(θl·x)

where l = 1, 2, …, k, k is the total number of categories of gesture-area images, and θ is the parameter vector.
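The softmax computation above can be sketched as follows (a numerically stabilised variant; the parameter shapes and toy values are assumptions, not from the patent):

```python
import numpy as np

def softmax_probs(theta, x):
    """Conditional probabilities p(j|x) = exp(theta_j . x) / sum_l exp(theta_l . x)
    for l = 1..k. The max-shift leaves the probabilities unchanged but
    avoids overflow for large logits."""
    logits = theta @ x       # theta: (k, d) row-per-category parameter vectors
    logits -= logits.max()   # numerical stability shift
    e = np.exp(logits)
    return e / e.sum()

theta = np.array([[1.0, 0.0],   # k = 3 toy categories, d = 2 features
                  [0.0, 1.0],
                  [0.5, 0.5]])
x = np.array([2.0, 1.0])
p = softmax_probs(theta, x)
print(round(p.sum(), 6))  # 1.0
```

The resulting vector p supplies the conditional probabilities that the three-way decision thresholds are compared against.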
The three-way decision model uses a group of decision thresholds α, β and γ to divide gesture objects into a positive region (POS), a boundary region (BND) and a negative region (NEG). For the positive and negative regions, acceptance and rejection rules directly yield the gesture recognition result; for the boundary region, the decision is deferred and the three-way decision is applied again when more information becomes available at a finer granularity.
The expressions for the positive, boundary and negative domains are as follows:
POS(α,β)={x∈U|p(X|[x])≥α}
BND(α,β)={x∈U|β<p(X|[x])<α}
NEG(α,β)={x∈U|p(X|[x])≤β}
where p(X|[x]) is the conditional probability of classification and [x] is the equivalence class containing x.
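The three region definitions translate directly into a small decision function (a sketch; names are hypothetical):

```python
def three_way_decide(p, alpha, beta):
    """Assign an object to POS/BND/NEG from its conditional probability p,
    following the region definitions above. Assumes beta < alpha."""
    if p >= alpha:
        return "POS"   # accept: classify the gesture now
    if p <= beta:
        return "NEG"   # reject this category
    return "BND"       # defer: wait for finer-grained features

print(three_way_decide(0.5, alpha=0.8, beta=0.3))  # BND
```

Only objects landing in BND trigger the extraction of finer-grained features, which is what saves time on easy cases.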
The thresholds αi, βi, γi of the three-way decision are computed as follows:

αi = (λPN(i) − λBN(i)) / ((λPN(i) − λBN(i)) + (λBP(i) − λPP(i)))
βi = (λBN(i) − λNN(i)) / ((λBN(i) − λNN(i)) + (λNP(i) − λBP(i)))
γi = (λPN(i) − λNN(i)) / ((λPN(i) − λNN(i)) + (λNP(i) − λPP(i)))

where λPP(i), λBP(i) and λNP(i) denote the losses of taking the accept, delay and reject decisions, respectively, when the ith-granularity gesture x belongs to category X, and λPN(i), λBN(i) and λNN(i) denote the corresponding losses when x does not belong to category X; the loss functions at each granularity are given by experts according to experience.
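Assuming the standard three-way decision threshold formulas, the computation from a set of expert-given losses can be sketched as follows (variable names are hypothetical and the loss values are made-up examples, not from the patent):

```python
def three_way_thresholds(l_pp, l_bp, l_np_, l_pn, l_bn, l_nn):
    """alpha, beta, gamma from the six decision losses at one granularity.
    Assumes the usual ordering l_pp <= l_bp < l_np_ and l_nn <= l_bn < l_pn,
    which guarantees 0 <= beta < gamma < alpha <= 1."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta  = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np_ - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np_ - l_pp))
    return alpha, beta, gamma

# hypothetical losses: accepting a wrong gesture (l_pn = 10) is costly,
# deferring is cheap (l_bp = 2, l_bn = 3)
a, b, g = three_way_thresholds(0, 2, 8, 10, 3, 0)
print(a > g > b)  # True
```

Because deferral losses sit between the accept/reject losses, beta < gamma < alpha holds and the boundary region is non-empty.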
The multi-granularity three-way decision thresholds are set so that finer-grained decisions are made only when necessary or beneficial. This provides the basis for setting the thresholds at the different granularities: coarser granularities select a larger acceptance threshold and a smaller rejection threshold. With i = 1, 2, …, n−1 denoting the sequence from coarse to fine, the thresholds at different granularities satisfy:

0 ≤ βi < αi ≤ 1, 1 ≤ i < n,
β1 ≤ β2 ≤ … ≤ βi < αi ≤ … ≤ α2 ≤ α1
At the finest granularity i = n, the three-way decision reduces to a two-way decision, and the decision threshold is computed as:

αn = βn = γn = (λPN(n) − λNN(n)) / ((λPN(n) − λNN(n)) + (λNP(n) − λPP(n)))
the three-branch decision is a decision mode conforming to human thinking, and compared with the traditional two-branch decision, a choice of no commitment is added, namely, a third delay decision is adopted when the information is not enough to be accepted or rejected. The two-branch decision making process is quick and simple, but the three-branch decision making is more suitable when the obtained information is insufficient or the obtained information needs a certain cost. The purpose of selecting three decisions for gesture recognition is that time spent for acquiring gesture features of different granularities is different, and for HMI interface operation with high real-time requirement, it is very necessary to consider time cost. In three-branch decision gesture recognition, the key steps are extracting multi-granularity features, and calculating threshold value pairs and conditional probabilities of three-branch decisions.
Example 2
On the basis of steps S1-S5, this embodiment adds step S6: the optimal granularity is obtained by weighted summation, and steps S3-S5 are repeated with the optimal granularity as the finest granularity.
The HMI interface optimization design method is shown in fig. 4. The final human-computer interaction interface optimization result for each granularity is obtained by weighted summation so as to determine the optimal granularity of the gesture-area image; taking the optimal granularity as the finest granularity, the convolutional neural network then extracts the multi-granularity features of a new gesture and the three-way decisions are made in sequence;
Result=w×Acc+(1-w)×Time
Time=T1+T2
where Result is the optimization score of the gesture-area image at a given granularity (the granularity with the best score is taken as optimal), Acc denotes gesture recognition accuracy, Time denotes the time spent in the gesture recognition process, w is a weight, T1 is the time to extract the multi-granularity features of the gesture-area image, and T2 is the time to recognize the gesture.
Compared with Embodiment 1, this embodiment saves time and has lower computational complexity. For example, suppose that in Embodiment 1, without using the optimal granularity, extracting features at 5 granularities takes 100 units of time; if recognition with 3 granularities is known to be only slightly worse than with 5 granularities but takes only 40 units of time, then, weighing accuracy and time comprehensively, 3 granularities is more suitable for practical application than 5.
After the optimal granularity is computed, it is used as the finest granularity for subsequent gesture image processing. Features extracted at different granularities carry different amounts of information and therefore yield different recognition results, and fine-grained feature extraction takes more time than coarse-grained extraction. By weighting gesture recognition accuracy against recognition time, the most appropriate granularity can be selected for gesture feature extraction so as to meet the gesture-based HMI interface optimization design target in the cockpit.
The invention provides a system for optimizing the human-computer interaction interface of an intelligent cockpit based on three-way decisions, comprising a camera and, electrically connected, a cockpit gesture acquisition module, a gesture image segmentation module, a multi-granularity feature extraction module, a three-way-decision gesture recognition module, a gesture semantic conversion module and an optimal granularity acquisition module;
the cockpit gesture acquisition module acquires a gesture video in the cockpit through a camera and converts a video frame into a series of static gesture images;
the gesture image segmentation module is used for segmenting the gesture and the background of the gesture image to obtain a gesture area image;
the gesture multi-granularity feature extraction module is used for extracting multi-granularity features of the gesture area image from coarse granularity to fine granularity;
the three-decision gesture recognition module is used for carrying out three-decision on the gesture area image in each granularity according to the extracted multi-granularity features so as to classify the gestures;
the gesture semantic conversion module is used for performing semantic conversion on the classified gestures;
the optimal granularity acquisition module is used for acquiring optimal granularity and sending the optimal granularity to the multi-granularity feature extraction module.
Further, the gesture multi-granularity feature extraction module comprises a convolutional neural network unit and extracts multi-granularity image features of the gesture-area image using different convolution kernels in that unit; the multi-granularity information expression is:

A1 ⊆ A2 ⊆ … ⊆ An

where Ai denotes the information of the gesture-area image at the ith granularity, A1 the information at the coarsest granularity and An the information at the finest granularity, i.e. finer granularities contain the coarser ones; i = 1, 2, …, n, where n is the number of granularity levels.
Further, the three-branch decision gesture recognition module performs three-branch decision on coarse-grained features of the gesture area image, if the classification type of the gesture can be determined, fine-grained feature extraction and further three-branch decision are not continued, otherwise, more fine-grained features are extracted to perform three-branch decision until the classification type of the gesture area image is determined.
Further, the optimal granularity acquisition module acquires a final human-computer interaction interface optimization result of each granularity by adopting a weighted summation mode so as to determine the optimal granularity of the gesture area image;
Result=w×Acc+(1-w)×Time
Time=T1+T2
where Result is the optimization score of the gesture-area image at a given granularity (the granularity with the best score is taken as optimal), Acc denotes gesture recognition accuracy, Time denotes the time spent in the gesture recognition process, w is a weight, T1 is the time to extract the multi-granularity features of the gesture-area image, and T2 is the time to recognize the gesture.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be performed by hardware associated with program instructions, and that the program may be stored in a computer-readable storage medium, which may include ROM, RAM, magnetic disks, optical disks, and the like.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments and are not intended to limit the invention; any modifications, equivalents, improvements, etc. made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (9)

1. A method for optimizing a human-computer interaction interface of an intelligent cabin based on three decisions is characterized by comprising the following steps:
s1, acquiring gesture video in the cabin, and preprocessing it to obtain a series of static gesture images;
s2, performing segmentation processing on the gestures and the background in the gesture image to obtain a gesture area image;
s3, performing multi-granularity expression on the gesture area image from coarse granularity to fine granularity; extracting multi-granularity characteristics of the gesture area image by using a convolutional neural network;
s4, calculating the conditional probability of classifying each granularity gesture area image into each category from coarse granularity to fine granularity, and sequentially completing gesture recognition by utilizing three decisions;
s5, performing semantic conversion on the recognized gesture, and performing corresponding operation on the human-computer interaction interface according to the gesture recognition result after the semantic conversion;
s6, obtaining the best granularity by adopting a weighted summation mode, and repeatedly executing the steps S3-S5 by taking the best granularity as the finest granularity.
2. The method for optimizing the human-computer interaction interface of the intelligent cockpit based on three-way decisions according to claim 1, wherein the gesture-area image is expressed at multiple granularities from coarse to fine, and for the same gesture-area image the multi-granularity information expression is:

A1 ⊆ A2 ⊆ … ⊆ An

where Ai denotes the information of the gesture-area image at the ith granularity, A1 the information at the coarsest granularity and An the information at the finest granularity, i.e. finer granularities contain the coarser ones; i = 1, 2, …, n, where n is the number of granularity levels.
3. The method for optimizing the human-computer interaction interface of the intelligent cockpit based on the three-branch decision as claimed in claim 2, wherein the extracting the multi-granularity features of the image of the gesture area by using the convolutional neural network comprises extracting the multi-granularity features of the image of the gesture area by using different convolutional kernels in the convolutional neural network.
4. The method for optimizing the human-computer interaction interface of the intelligent cockpit based on the three-branch decision as claimed in claim 1, wherein the step S4 includes performing three-branch decision from coarse-grained features of the gesture area image, if the classification category of the gesture can be determined, the fine-grained feature extraction and the further three-branch decision are not continued, otherwise, the finer-grained features are extracted to perform the three-branch decision until the classification category of the gesture area image is determined.
5. The method for optimizing the human-computer interaction interface of the intelligent cockpit according to claim 1, wherein the step S6 includes obtaining the final human-computer interaction interface optimization result for each granularity by means of weighted summation, so as to determine the optimal granularity of the gesture area image;
Result=w×Acc+(1-w)×Time
Time=T1+T2
wherein Result is the optimization result at a given granularity, used to determine the optimal granularity of the gesture area image; Acc represents the gesture recognition accuracy; Time represents the time spent in the gesture recognition process; w represents the weight; T1 represents the time to extract the multi-granularity features of the gesture area image; and T2 represents the time to recognize the gesture.
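The weighted-sum selection of the optimal granularity can be sketched as below. One assumption is added that the claim does not specify: the raw elapsed time T1 + T2 is normalized into a "faster is better" score in [0, 1] before entering the sum, so that a larger Result is better at every granularity.

```python
def level_result(acc, time_score, w=0.7):
    # Literal form of the claim's formula: Result = w*Acc + (1-w)*Time.
    # Assumption: Time enters as a normalized speed score in [0, 1].
    return w * acc + (1 - w) * time_score

def optimal_granularity(levels, w=0.7):
    """levels: list of (Acc, T1, T2) tuples, one per granularity, coarse to fine.
    Returns the 1-based index of the granularity with the best Result."""
    raw_times = [t1 + t2 for _, t1, t2 in levels]    # Time = T1 + T2 per the claim
    t_max = max(raw_times)
    scores = [level_result(acc, 1.0 - t / t_max, w)  # normalization is an assumption
              for (acc, _, _), t in zip(levels, raw_times)]
    return scores.index(max(scores)) + 1
```

With this scoring, a mid-level granularity can beat the finest one when the accuracy gain of finer features no longer justifies their extraction time.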
6. A system for optimizing a human-computer interaction interface of an intelligent cabin based on three decisions is characterized by comprising a camera, a cabin gesture acquisition module, a gesture image segmentation module, a multi-granularity feature extraction module, a three-decision gesture recognition module, a gesture semantic conversion module and an optimal granularity acquisition module which are electrically connected;
the cockpit gesture acquisition module acquires a gesture video in the cockpit through a camera and converts a video frame into a series of static gesture images;
the gesture image segmentation module is used for segmenting the gesture and the background of the gesture image to obtain a gesture area image;
the gesture multi-granularity feature extraction module is used for extracting multi-granularity features of the gesture area image from coarse granularity to fine granularity;
the three-way decision gesture recognition module is used for performing a three-way decision on the gesture area image at each granularity according to the extracted multi-granularity features, so as to classify the gesture;
the gesture semantic conversion module is used for performing semantic conversion on the classified gestures;
the optimal granularity acquisition module is used for acquiring optimal granularity and sending the optimal granularity to the multi-granularity feature extraction module.
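The module chain of this system claim can be wired as a single pipeline, sketched below. All class and function names are invented for illustration; the patent does not prescribe an implementation.

```python
class CockpitGesturePipeline:
    """Minimal sketch of the claim-6 module chain for one video frame."""

    def __init__(self, segment, extractors, decide, to_command):
        self.segment = segment        # gesture image segmentation module
        self.extractors = extractors  # multi-granularity feature extraction, coarse -> fine
        self.decide = decide          # three-way decision gesture recognition module
        self.to_command = to_command  # gesture semantic conversion module

    def process_frame(self, frame):
        region = self.segment(frame)
        for extract in self.extractors:           # coarse to fine
            label = self.decide(extract(region))  # None means "defer to finer granularity"
            if label is not None:
                return self.to_command(label)
        return None  # no confident decision even at the finest granularity
```

The optimal granularity acquisition module would then trim `extractors` so that later frames stop at the level selected by the weighted-sum criterion.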
7. The system for optimizing the human-computer interaction interface of the intelligent cabin based on the three-branch decision as claimed in claim 6, wherein the gesture multi-granularity feature extraction module comprises a convolutional neural network unit, and extracts multi-granularity image features of a gesture area image by using different convolutional kernels in the convolutional neural network unit; the multi-granularity information representation mode is specifically
A1 ⊆ A2 ⊆ … ⊆ An
wherein Ai represents the information of the gesture area image at granularity i, A1 represents the information of the gesture area image at the coarsest granularity, and An represents the information of the gesture area image at the finest granularity; that is, the fine granularity contains the coarse granularity; i = 1, 2, …, n, where n represents the number of granularity levels.
8. The system for optimizing the human-computer interaction interface of the intelligent cockpit according to claim 6, wherein the three-way decision gesture recognition module performs a three-way decision starting from the coarse-grained features of the gesture area image: if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted for a further three-way decision, until the classification category of the gesture area image is determined.
9. The system for optimizing the human-computer interaction interface of the intelligent cockpit according to claim 6, wherein the optimal granularity obtaining module obtains the final human-computer interaction interface optimization result of each granularity by adopting a weighted summation mode, so as to determine the optimal granularity of the gesture area image;
Result=w×Acc+(1-w)×Time
Time=T1+T2
wherein Result is the optimization result at a given granularity, used to determine the optimal granularity of the gesture area image; Acc represents the gesture recognition accuracy; Time represents the time spent in the gesture recognition process; w represents the weight; T1 represents the time to extract the multi-granularity features of the gesture area image; and T2 represents the time to recognize the gesture.
CN201810823980.3A 2018-07-25 2018-07-25 Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions Active CN109101108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810823980.3A CN109101108B (en) 2018-07-25 2018-07-25 Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions

Publications (2)

Publication Number Publication Date
CN109101108A CN109101108A (en) 2018-12-28
CN109101108B true CN109101108B (en) 2021-06-18

Family

ID=64847467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810823980.3A Active CN109101108B (en) 2018-07-25 2018-07-25 Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions

Country Status (1)

Country Link
CN (1) CN109101108B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109820479B (en) * 2019-01-08 2021-08-27 西北大学 Fluorescence molecular tomography feasible region optimization method
CN109816022A (en) * 2019-01-29 2019-05-28 重庆市地理信息中心 A kind of image-recognizing method based on three decisions and CNN
CN110298263B (en) * 2019-06-10 2023-05-30 中南大学 Real-time accurate and contactless gesture recognition method and system based on RFID system
CN110458233B (en) * 2019-08-13 2024-02-13 腾讯云计算(北京)有限责任公司 Mixed granularity object recognition model training and recognition method, device and storage medium
CN111046732B (en) * 2019-11-11 2023-11-28 华中师范大学 Pedestrian re-recognition method based on multi-granularity semantic analysis and storage medium
CN111104339B (en) * 2019-12-31 2023-06-16 上海艺赛旗软件股份有限公司 Software interface element detection method, system, computer equipment and storage medium based on multi-granularity learning
CN111814737B (en) * 2020-07-27 2022-02-18 西北工业大学 Target intention identification method based on three sequential decisions
CN112580785B (en) * 2020-12-18 2022-04-05 河北工业大学 Neural network topological structure optimization method based on three-branch decision

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101639351B1 (en) * 2015-01-15 2016-07-13 주식회사 엔씨소프트 Wearable input system and method for recognizing motion
CN107578023A (en) * 2017-09-13 2018-01-12 华中师范大学 Man-machine interaction gesture identification method, apparatus and system
CN107958255A (en) * 2017-11-21 2018-04-24 中国科学院微电子研究所 Target detection method and device based on image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Research on Tactile Gesture Recognition for Robots Based on Three-Way Decisions"; Shi Yubin; China Master's Theses Full-text Database, Information Science and Technology; No. 04, 15 Apr 2018; pp. 9-11 of the text *

Also Published As

Publication number Publication date
CN109101108A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN109101108B (en) Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions
Zhang et al. Pedestrian detection method based on Faster R-CNN
CN104361313B (en) A kind of gesture identification method merged based on Multiple Kernel Learning heterogeneous characteristic
CN110796199B (en) Image processing method and device and electronic medical equipment
CN108829677A (en) A kind of image header automatic generation method based on multi-modal attention
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN104504383B (en) A kind of method for detecting human face based on the colour of skin and Adaboost algorithm
Sajanraj et al. Indian sign language numeral recognition using region of interest convolutional neural network
CN111160407A (en) Deep learning target detection method and system
Yasir et al. Two-handed hand gesture recognition for Bangla sign language using LDA and ANN
Patil et al. Indian sign language recognition using convolutional neural network
CN113255602A (en) Dynamic gesture recognition method based on multi-modal data
WO2024016812A1 (en) Microscopic image processing method and apparatus, computer device, and storage medium
CN104156690A (en) Gesture recognition method based on image space pyramid bag of features
Rathi et al. Development of full duplex intelligent communication system for deaf and dumb people
WO2011096010A1 (en) Pattern recognition device
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN103942572A (en) Method and device for extracting facial expression features based on bidirectional compressed data space dimension reduction
Huang et al. Receptive field fusion RetinaNet for object detection
CN114898464B (en) Lightweight accurate finger language intelligent algorithm identification method based on machine vision
CN114067359B (en) Pedestrian detection method integrating human body key points and visible part attention characteristics
CN113705489B (en) Remote sensing image fine-granularity airplane identification method based on priori regional knowledge guidance
Heer et al. An improved hand gesture recognition system based on optimized msvm and sift feature extraction algorithm
CN110688880A (en) License plate identification method based on simplified ResNet residual error network
Yuan et al. Research on vehicle detection algorithm of driver assistance system based on vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant