CN113469048A - Passenger state determining method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN113469048A
CN113469048A
Authority
CN
China
Prior art keywords
information
passenger
state
occupant
abnormal
Prior art date
Legal status
Pending
Application number
CN202110743144.6A
Other languages
Chinese (zh)
Inventor
张旭龙
王健宗
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110743144.6A
Publication of CN113469048A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a passenger state determination method and device, computer equipment and a storage medium, used in the field of artificial intelligence and relating to the field of blockchain. The method comprises the following steps: acquiring video data and audio data of an occupant in a vehicle; recognizing the video data and audio data to generate state information of the occupant; performing abnormality detection on the posture information and speech text information to determine whether abnormal information exists in the occupant's state information; if abnormal information exists, generating a question sentence from entries in a preset entry set and questioning the occupant accordingly; and acquiring the occupant's response data to the question and determining the occupant's state from the response data. Because the occupant's state is determined from both posture features and speech text features, the invention overcomes erroneous state recognition caused by a mismatch between the emotion of a single static frame and the speech emotion; the added questioning step further improves the accuracy of occupant state recognition.

Description

Passenger state determining method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a passenger state determining method, a passenger state determining device, computer equipment and a storage medium.
Background
In real life, the state of the occupants of a vehicle (including the driver and the passengers), and especially the state of the driver, is an important factor affecting driving safety. To ensure safety while the vehicle is being driven, the states of the driver and passengers need to be monitored and recognized, so that the driving state of the vehicle can be coordinately controlled according to the abnormal-state recognition results.
In the prior art, abnormal-state recognition for the driver generally uses a camera to capture a static driving image of the driver and an audio acquisition device to capture the driver's voice; an audio-visual emotion recognition network then identifies the emotional state of the voice and the static image, and whether the driver's state is abnormal is determined from the identified emotional state. However, because this method relies on a single static frame and the voice, the image information is insufficient, and the image emotion and the voice emotion easily fail to correspond: for example, when the driver's voice is cheerful but the facial expression in the static frame is impassive, the audio-visual emotion recognition network has difficulty accurately identifying the driver's emotional state, which easily makes the driver's state recognition inaccurate.
Disclosure of Invention
The invention provides a passenger state determination method, a passenger state determination device, computer equipment and a storage medium, to solve the prior-art problem that recognizing a driver's emotional state from a single static frame and voice easily makes the driver's state recognition inaccurate.
An occupant state determination method, comprising:
acquiring video data and audio data of passengers on a vehicle;
recognizing the video data and audio data of the occupant to generate state information of the occupant, the state information including posture information and voice text information of the occupant;
performing abnormality detection on the posture information and the voice text information, and determining whether abnormal information exists in the state information of the passenger;
if the state information of the passenger has abnormal information, generating a question sentence according to the entries in a preset entry set, and asking the passenger questions according to the question sentence;
and acquiring response data of the passenger to the question, and determining the state of the passenger according to the response data.
An occupant state determination device comprising:
the acquisition module is used for acquiring video data and audio data of passengers on the vehicle;
a recognition module to recognize video data and audio data of the occupant to generate state information of the occupant, the state information including posture information and voice text information of the occupant;
the detection module is used for carrying out abnormality detection on the posture information and the voice text information and determining whether abnormal information exists in the state information of the passenger;
the question module is used for generating a question sentence according to the entries in the preset entry set if the state information of the passenger has abnormal information, and asking the passenger for a question according to the question sentence;
and the determining module is used for acquiring response data of the passenger to the question and determining the state of the passenger according to the response data.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the occupant status determination method described above when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned occupant state determination method.
In one scheme provided by the passenger state determination method and device, the computer equipment and the storage medium, video data and audio data of an occupant in a vehicle are acquired and recognized to generate state information of the occupant, the state information comprising posture information and speech text information. Abnormality detection is then performed on the posture information and the speech text information to determine whether abnormal information exists in the occupant's state information. If abnormal information exists, a question sentence is generated from entries in a preset entry set, which is stored in a blockchain database, and the occupant is questioned accordingly. Finally, the occupant's response data to the question is acquired and the occupant's state is determined from it. Because the occupant's state is determined from both posture features and speech text features, the scheme overcomes erroneous state recognition caused by a mismatch between the emotion of a single static frame and the speech emotion, improving the accuracy of occupant state recognition. The added questioning step allows the occupant's response data to be obtained in time for a further state judgment, improving accuracy further still.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a method for determining occupant status according to an embodiment of the present invention;
FIG. 2 is a flow chart of an occupant status determination method in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an implementation of step S20 in FIG. 2;
FIG. 4 is a flowchart illustrating an implementation of step S21 in FIG. 3;
FIG. 5 is a flowchart illustrating an implementation of step S211 in FIG. 4;
FIG. 6 is a flowchart illustrating an implementation of step S30 in FIG. 2;
FIG. 7 is a flowchart illustrating an implementation of step S50 in FIG. 2;
FIG. 8 is a schematic flow chart of another implementation of step S50 in FIG. 2;
FIG. 9 is a schematic view of a structure of an occupant state determination device in an embodiment of the invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The occupant state determination method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1, in which a terminal device communicates with a server through a network. The server acquires video data and audio data of an occupant in the vehicle sent by the terminal device and recognizes them to generate state information of the occupant, the state information including posture information and speech text information. The server then performs abnormality detection on the posture information and the speech text information to determine whether abnormal information exists in the occupant's state information. If abnormal information exists, a question sentence is generated from entries in a preset entry set (stored in a blockchain database) and the occupant is questioned accordingly. Finally, the occupant's response data to the question is acquired and the occupant's state is determined from it. Because gesture recognition is performed on video data, more occupant postures can be extracted, and the occupant's state is determined from both posture features and speech text features, overcoming erroneous state recognition caused by a mismatch between the emotion of a single static frame and the speech emotion and improving recognition accuracy. The added questioning step means that, once abnormal information is detected, the occupant is questioned and response data is obtained in time for a further state judgment, improving accuracy further still; the occupant's state can be recognized accurately without manual participation, increasing the degree of automation of occupant state recognition and improving riding safety.
The blockchain database in this embodiment is stored in a blockchain network and is used to store data used and generated by the occupant state determination method, such as the occupant's video data, audio data, posture information, and speech text information. The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each data block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like. Deploying the database in the blockchain improves the security of data storage.
The terminal device may be a vehicle-mounted terminal device, and may also be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
Wherein a terminal device mounted on the vehicle and a server form an occupant state determination system that monitors the state of the vehicle's occupants. The terminal device comprises a voice acquisition module, a video acquisition module, a call module, and the like: the voice acquisition module acquires audio data of occupants in the vehicle, the video acquisition module acquires video data of the occupants, and the call module interacts with (questions) the occupants.
In an embodiment, as shown in fig. 2, an occupant status determining method is provided, which is described by taking the example of the method applied to the server in fig. 1, and includes the following steps:
s10: video data and audio data of occupants in a vehicle are acquired.
During the running of the vehicle, the terminal equipment monitors the behavior and words of passengers on the vehicle. The voice acquisition module on the vehicle can acquire the audio data of passengers on the vehicle, the video acquisition module on the vehicle can acquire the video data of the passengers on the vehicle, and after the video data and the audio data of the passengers on the vehicle are acquired, the video data and the audio data can be sent to the server in real time, so that the server can receive the video data and the audio data of the passengers on the vehicle.
S20: the video data and audio data of the occupant are recognized to generate occupant status information, which includes occupant posture information and speech text information.
After the server acquires the video data and the audio data of the passengers on the vehicle, the server identifies the video data and the audio data of the passengers to generate the state information of the passengers. Wherein the state information includes posture information and voice text information of the occupant. Namely, a gesture recognition module in the server performs gesture extraction on video data of the passenger to acquire a plurality of gestures in the video data to form gesture information; and the voice recognition module in the server can recognize the voice of the passenger in the audio data, convert the voice of the passenger into a text and obtain voice text information.
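Step S20 can be sketched as follows. This is a minimal illustration only: the patent does not disclose the interfaces of its posture recognition and speech recognition models, so `recognize_posture` and `speech_to_text` below are invented stubs standing in for the trained deep-learning models, and the frame/audio-chunk dictionary formats are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import List

def recognize_posture(frame) -> str:
    # Stub for the "preset posture recognition model": classify one
    # video frame into a posture label (thresholding a fake motion score).
    return "sitting_upright" if frame.get("motion", 0) < 0.5 else "leaning_forward"

def speech_to_text(audio_chunk) -> str:
    # Stub for the "preset speech recognition model": transcribe one
    # chunk of audio data into text.
    return audio_chunk.get("transcript", "")

@dataclass
class OccupantState:
    # State information = posture information + speech text information.
    postures: List[str] = field(default_factory=list)
    speech_text: str = ""

def generate_state_info(video_frames, audio_chunks) -> OccupantState:
    """Step S20: recognize the video and audio data to build state info."""
    postures = [recognize_posture(f) for f in video_frames]
    text = " ".join(speech_to_text(c) for c in audio_chunks).strip()
    return OccupantState(postures=postures, speech_text=text)

frames = [{"motion": 0.1}, {"motion": 0.9}]
chunks = [{"transcript": "please stop the car"}]
print(generate_state_info(frames, chunks))
```

In a real system the two stubs would be replaced by inference calls into the trained models described in steps S21 and S22.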
S30: and carrying out abnormity detection on the posture information and the voice text information, and determining whether the state information of the passenger has abnormal information.
After the state information of the passenger is acquired, abnormal detection is carried out on the posture information and the voice text information in the state information, and whether abnormal information exists in the state information of the passenger is determined. Namely, a decision module in the server respectively detects the gesture information and the voice text information, determines whether the gesture information has an abnormal gesture, and determines whether the voice text information has an abnormal text.
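The decision module's checks might look like the sketch below. The patent does not specify the detection rules, so the sets of abnormal postures and alarm keywords here are invented examples; a deployed system would presumably use learned detectors or a curated rule base.

```python
# Assumed rule tables (not from the patent): posture labels considered
# abnormal, and keywords whose presence in the speech text is abnormal.
ABNORMAL_POSTURES = {"slumped_over", "hands_off_wheel", "struggling"}
ALARM_KEYWORDS = {"help", "stop the car", "can't breathe"}

def detect_abnormal(postures, speech_text):
    """Step S30: return whatever abnormal information was found.

    An empty dict means no abnormal information exists in the state info.
    """
    abnormal = {}
    bad_postures = [p for p in postures if p in ABNORMAL_POSTURES]
    if bad_postures:
        abnormal["postures"] = bad_postures
    hits = [k for k in ALARM_KEYWORDS if k in speech_text.lower()]
    if hits:
        abnormal["keywords"] = hits
    return abnormal

print(detect_abnormal(["sitting_upright", "slumped_over"], "Help, stop the car!"))
```

Returning the detected evidence (rather than a bare boolean) lets the questioning step S40 tailor its follow-up to what was detected.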
S40: and if the state information of the passengers has abnormal information, generating a question according to the entries in the preset entry set, and asking the passengers for questions according to the question.
After determining whether the state information of the passenger has abnormal information or not, if the state information of the passenger has the abnormal information and indicates that the posture or the words of the passenger are abnormal, the server generates a question according to random entries in a preset entry set and sends the question to the terminal equipment, so that the terminal equipment asks the passenger according to the question.
The preset entry set is a set of information entries which are designed in advance and related to a vehicle scene, and the preset entry set is stored in the block chain database. The entries in the preset entry set comprise: the related terms of the riding scenes such as the destination, the departure place, the number of passengers, the appearance characteristics of the passengers and the like also comprise related terms of diseases, safety and the like.
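Question generation from a random entry of the preset entry set can be sketched as below. The entry names and question templates are invented for illustration, and the blockchain-stored entry set is abstracted to a plain in-memory list.

```python
import random

# Assumed contents of the "preset entry set" of riding-scene entries;
# the patent stores this set in a blockchain database.
PRESET_ENTRIES = ["destination", "departure place", "number of passengers"]

# Hypothetical question templates, one per entry.
TEMPLATES = {
    "destination": "Could you confirm where you are heading?",
    "departure place": "Could you confirm where you departed from?",
    "number of passengers": "How many passengers are in the vehicle?",
}

def generate_question(entries=PRESET_ENTRIES, rng=random):
    """Step S40: pick a random entry and build a question sentence from it."""
    entry = rng.choice(entries)
    return TEMPLATES[entry]

print(generate_question(rng=random.Random(0)))
```

The server would send the generated sentence to the terminal device's call module, which puts the question to the occupant.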
After determining whether abnormal information exists in the occupant's state information, if none exists, indicating that neither the posture nor the words of the occupant are abnormal, the server continues to acquire the occupant's next video data and audio data for state-information judgment.
S50: and acquiring response data of the occupant to the question, and determining the state of the occupant according to the response data.
After the terminal equipment asks the passengers according to the question, the voice acquisition module on the vehicle acquires audio data of the passengers on the vehicle, the video acquisition module on the vehicle acquires video data of the passengers on the vehicle, then the acquired audio data and the acquired video data are sent to the server, and the server takes the received audio data and the received video data as response data of the passengers to the question. After the response data of the passenger to the question is acquired, the server continues to perform gesture recognition and/or voice text recognition on the response data so as to determine whether the state of the passenger is abnormal or not according to the gesture and/or voice text in the response data, thereby improving the accuracy of passenger state recognition.
In this embodiment, video data and audio data of an occupant in a vehicle are acquired and recognized to generate state information of the occupant, the state information comprising posture information and speech text information. Abnormality detection is performed on the posture information and the speech text information to determine whether abnormal information exists. If abnormal information exists, a question sentence is generated from entries in a preset entry set (stored in a blockchain database) and the occupant is questioned accordingly; finally, the occupant's response data to the question is acquired and the occupant's state is determined from it. Because gesture recognition is performed on video data, more occupant posture features can be extracted, and determining the occupant's state from both posture features and speech text features overcomes erroneous state recognition caused by a mismatch between the emotion of a single static frame and the speech emotion, improving recognition accuracy. The added questioning step means that, once abnormal information is detected, the occupant is questioned and response data is obtained in time for a further state judgment, improving the accuracy of occupant state recognition further still.
In one embodiment, as shown in fig. 3, in step S20, the identifying the video data and the audio data of the occupant to generate the occupant status information specifically includes the following steps:
s21: and performing posture extraction on the video data by adopting a preset posture recognition model to obtain the posture information of the passenger.
After the video data of the passengers on the vehicle are acquired, the video data of the passengers are processed to split the video data of the passengers into a plurality of frames of images, then each frame of image is input into a preset gesture recognition model, so that the preset gesture recognition model recognizes and extracts the postures of the passengers in each frame of image, the postures of each frame of image output by the preset gesture recognition model are acquired, and the posture information of the passengers is acquired.
The preset posture recognition model is a deep learning model trained on different types of in-vehicle posture image data and their corresponding postures. Training the model on in-vehicle image data improves its posture recognition precision, which in turn improves the accuracy of the posture information.
S22: and performing text conversion on the audio data by adopting a preset speech recognition model to obtain the speech text information of the passenger.
After audio data of passengers on a vehicle are obtained, preprocessing such as denoising and filtering is carried out on the audio data of the passengers to obtain filtered voice data, and then the filtered voice data is input into a preset voice recognition model, so that the preset voice recognition model carries out text conversion on the voice data to obtain voice text information.
The preset voice recognition model is a deep learning model obtained according to voice data training under different in-vehicle scenes. The preset voice recognition model is trained by adopting voice data under different in-vehicle scenes, so that the voice recognition precision of the model is improved, and the accuracy of voice text information can be improved.
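The patent mentions "denoising and filtering" of the audio data without specifying the filter. As a placeholder for that preprocessing step, the sketch below smooths a waveform with a centered moving average; any real pipeline would use a proper audio filter instead.

```python
def moving_average_denoise(samples, window=3):
    """Illustrative denoising stand-in (assumption: the patent does not
    specify the filter): smooth the waveform with a centered moving
    average of the given window size, shrinking the window at the edges."""
    n = len(samples)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

# An alternating "noisy" signal is pulled toward its local mean.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
print(moving_average_denoise(noisy))
```

The filtered samples would then be handed to the preset speech recognition model for text conversion.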
In the embodiment, the posture of the passenger is obtained by extracting the posture of the video data by adopting a preset posture recognition model, wherein the preset posture recognition model is a deep learning model obtained according to different types of in-vehicle posture image data and corresponding posture training; and text conversion is carried out on the audio data by adopting a preset speech recognition model to obtain speech text information of passengers, the preset speech recognition model is a deep learning model obtained by training according to the speech data in different in-vehicle scenes, the step of recognizing the video data and the audio data of the passengers to generate the state information of the passengers is refined, and a basis is provided for subsequently determining whether the state of the passengers is abnormal.
In an embodiment, as shown in fig. 4, in step S21, performing gesture extraction on the video data by using a preset gesture recognition model to obtain the occupant gesture information, the method specifically includes the following steps:
s211: and extracting key frames of the video data to obtain a plurality of key frames.
After the occupant's video data is obtained, the server extracts key frames from it to obtain a plurality of key frames, a key frame being a frame with an obvious change in the video data. The server first splits the video data into multiple frames of images and then uses a key frame extraction technique based on a clustering algorithm to pick out the frames with obvious changes. An obvious change means that a frame differs markedly from the previous frame or several previous frames, where the image differences include lighting differences, person differences, and motion differences.
S212: and inputting each key frame into a preset gesture recognition model to obtain the gesture of each key frame.
After a plurality of key frames are obtained, the key frames are respectively and sequentially input into a preset gesture recognition model, and gestures corresponding to the key frames are obtained, namely, one key frame corresponds to one gesture.
S213: occupant posture information is generated from the posture of each key frame.
After the posture of each key frame is acquired, each posture is associated with the time of its key frame, generating the posture information of the occupant. The posture information may therefore include not only the occupant's posture at different times but also, to some extent, changes in the occupant's posture, so that whether the occupant's behavior in the video data is abnormal can be determined from the posture information.
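Step S213, associating each key frame's posture with its timestamp and exposing posture changes, can be sketched as follows (the timestamp pairing and change-point representation are assumptions, since the patent only describes the idea in prose):

```python
def build_posture_info(keyframe_times, keyframe_postures):
    """Pair each key frame's posture with its timestamp and record the
    points where the posture changes between consecutive key frames."""
    info = list(zip(keyframe_times, keyframe_postures))
    changes = [
        (t, prev, cur)
        for (t, cur), (_, prev) in zip(info[1:], info[:-1])
        if cur != prev
    ]
    return info, changes

info, changes = build_posture_info(
    [0.0, 1.5, 3.0], ["sitting", "sitting", "slumped_over"]
)
print(changes)  # the single posture change, with its time
```

The `changes` list is what a downstream abnormality check could inspect for sudden posture transitions.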
In the embodiment, a plurality of key frames are obtained by extracting key frames from video data, the key frames are frames with obvious changes in the video data, then each key frame is input into a preset gesture recognition model to obtain the gesture of each key frame, finally, the gesture information of a passenger is generated according to the gesture of each key frame, the specific process of extracting the gesture of the passenger by adopting the preset gesture recognition model to obtain the gesture information of the passenger is clarified, and the gesture information is generated by recognizing the gesture in the key frame by taking the frames with obvious changes in the video data as the key frames, so that on the basis of ensuring that the gesture characteristics are sufficient, the data processing amount is reduced, the gesture recognition speed is improved, and the passenger state determination efficiency is improved.
In an embodiment, as shown in fig. 5, in step S211, performing key frame extraction on the video data to obtain a plurality of key frames, specifically including the following steps:
a. Split the video data into multiple frames and perform tensor conversion on the image data of all frames to obtain the tensor of each frame.
After the video data of the passengers are acquired, the video data are divided into multi-frame images, tensor conversion is carried out on the image data of all the frames, each frame of image is converted into a tensor capable of being rapidly identified and compared, and the tensor of each frame is obtained.
b. The first frame of the video data is taken as a first key frame and taken as a reference frame.
Meanwhile, for the convenience of subsequent comparison, the first frame of the video data is taken as a first key frame and is taken as a reference frame for the subsequent comparison of the differences of the frames.
c. The tensor difference value of the reference frame and the next frame of the reference frame is determined.
After the tensors of the frames in the video data are obtained, images of frames next to the reference frame are determined in chronological order, and then a tensor difference between the tensors of the reference frame and the frames next to the reference frame is determined.
d. And determining whether the tensor difference value is larger than a preset threshold value.
After determining a tensor difference value between the tensor of the reference frame and the tensor of the next frame of the reference frame, determining whether the tensor difference value between the tensor of the reference frame and the tensor of the next frame of the reference frame is larger than a preset threshold value so as to determine whether obvious image difference exists between the reference frame and the next frame of the reference frame.
e. If the tensor difference value is larger than the preset threshold value, the next frame of the reference frame is determined to be the second key frame and is taken as the new reference frame.
After determining whether the tensor difference value is larger than the preset threshold, if it is, an obvious image difference exists between the reference frame and its next frame; that next frame is therefore determined to be the second key frame and is taken as the reference frame for continued difference comparison.
f. Repeating steps c-e until all frames are traversed to obtain a plurality of key frames.
After the second key frame is taken as the reference frame, steps c to e are repeated: the difference between every two adjacent frames is compared in sequence, and key frames are determined in sequence, until all frames of the video data have been traversed and a plurality of key frames are obtained. Because the key frames are determined by sequentially comparing the difference between every two adjacent frames, the number of key frames is increased while their validity is ensured, so that whether the posture of the occupant is abnormal can subsequently be determined from the postures in the key frames, increasing the accuracy of the judgment.
In this embodiment, the video data is split into multiple frames and tensor conversion is performed on the image data of every frame to obtain the tensor of each frame; the first frame of the video data is used as the first key frame and as the reference frame; the tensor difference value between the reference frame and the frame following it is determined and compared against a preset threshold value; if the tensor difference value is larger than the preset threshold value, the frame following the reference frame is determined to be a second key frame and becomes the new reference frame; this continues until all frames are traversed and a plurality of key frames are obtained. This clarifies the specific process of extracting key frames from the video data: differences between frames are determined through their tensors, which increases the speed of obtaining the key frames and hence the posture information, and increases the accuracy of the subsequent posture abnormality judgment.
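The key-frame selection loop of steps a to f can be sketched as follows. This is a minimal illustration, assuming NumPy arrays as the per-frame tensors and mean absolute pixel difference as one concrete choice of "tensor difference value"; the embodiment does not fix a specific difference metric.

```python
import numpy as np

def extract_key_frames(frames, threshold):
    """Return indices of key frames using tensor differences.

    frames: list of H x W x C arrays (decoded video frames).
    The first frame is always the first key frame (step b); every
    subsequent frame whose mean absolute difference from the current
    reference exceeds `threshold` becomes a new key frame and the
    new reference frame (steps c-e), until all frames are traversed
    (step f).
    """
    if not frames:
        return []
    key_indices = [0]                       # step b: first frame is a key frame
    reference = frames[0].astype(np.float32)
    for i in range(1, len(frames)):         # step f: traverse all frames
        current = frames[i].astype(np.float32)
        diff = np.mean(np.abs(current - reference))  # step c: tensor difference
        if diff > threshold:                # steps d-e: threshold comparison
            key_indices.append(i)
            reference = current             # the new key frame becomes the reference
    return key_indices
```

In practice the threshold would be tuned so that small camera noise does not create spurious key frames while real posture changes do.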
In one embodiment, as shown in fig. 6, in step S30, namely, performing abnormality detection on the posture information and the speech text information, and determining whether there is abnormal information in the state information of the occupant, the method specifically includes the following steps:
S31: detecting whether an abnormal posture exists in the posture information, and detecting whether an abnormal text exists in the voice text information.
After the posture information and the voice text information of the occupant are acquired, it is detected whether an abnormal posture exists in the posture information and whether an abnormal text exists in the voice text information.
After the posture information of the occupant is acquired, preset posture label data and preset text label data stored in advance in a blockchain database are acquired. The preset posture label data comprises each posture and its corresponding label, and the preset text label data comprises each text (keyword or key sentence) and its corresponding label; the labels indicate whether a posture or text is abnormal. The posture information of the occupant is then matched against the preset posture label data to determine whether an abnormal posture exists in it; meanwhile, the voice text information of the occupant is matched against the preset text label data to determine whether an abnormal text exists in it.
In other embodiments, the preset posture label data includes each posture and a corresponding posture label indicating whether the posture is abnormal and, if so, the type of posture abnormality. Posture abnormality types include illegal driving behaviors, unlawful acts (such as coercion or behavior that harms others), body-disease-related behaviors, and the like. Therefore, when the posture information is matched against the preset posture label data, the corresponding posture abnormality type can also be output, so that different response measures, such as raising an alarm or calling for rescue, can be taken in time to protect the safety of the occupants.
After the posture label data is acquired, each posture in the occupant's posture information is matched against the preset posture label data, and the posture label of each posture is determined, so as to determine whether each posture in the occupant's posture information is abnormal.
For example, after each key frame in the video data of the occupant is input into the preset posture recognition model, the model recognizes the posture of one key frame as "holding the steering wheel with one hand", so the posture information of the occupant includes this posture. When the posture information is matched against the preset posture label data, the posture "holding the steering wheel with one hand" is matched to its corresponding posture label; since its label in the preset posture label data is abnormal and the posture abnormality type is illegal driving behavior, it is determined that an abnormal posture exists in the posture information of the occupant, and the information "illegal driving behavior: holding the steering wheel with one hand" can be output. Conversely, if no abnormal posture is recognized in the posture information of the occupant, it is determined that no abnormal posture exists in the posture information of the occupant.
In other embodiments, the preset text label data includes each text (keyword or key sentence) and a text label corresponding to each text, the text label indicating whether the text is abnormal and, if so, the text abnormality type. Text abnormality types include alarm texts, body-disease-related texts, and the like. Therefore, when the voice text information is matched against the preset text label data, the corresponding text abnormality type can also be output, so that different response measures can be taken in time to protect the safety of the occupants.
For example, after the audio data of the occupant is input into the preset speech recognition model, the model recognizes that the text "my heart is uncomfortable" exists in the audio data, so the voice text information of the occupant includes this text. When the voice text information is matched against the preset text label data, the text "my heart is uncomfortable" is matched to its corresponding text label; since its label in the preset text label data is abnormal and the text abnormality type is body-disease-related text, it is determined that an abnormal text exists in the voice text information of the occupant, and the information "body disease abnormality: heart disease" can be output. Conversely, if no abnormal text is recognized in the voice text information of the occupant, it is determined that no abnormal text exists in the voice text information of the occupant.
In this embodiment, the postures and corresponding output information in the video data, and the texts and corresponding output information in the audio data, are merely exemplary; in other embodiments, the video data may contain other postures and the corresponding output information may be other information, and the text in the audio data may likewise be other text with other corresponding output information, which is not described again here.
S32: and if the abnormal posture exists in the posture information or the abnormal text exists in the voice text information, determining that the abnormal information exists in the state information of the passenger.
After detecting whether an abnormal posture exists in the posture information and detecting whether an abnormal text exists in the voice text information, if an abnormal posture exists in the posture information or an abnormal text exists in the voice text information, it is determined that abnormal information exists in the state information of the occupant.
S33: if the abnormal posture does not exist in the posture information and the abnormal text does not exist in the voice text information, it is determined that the abnormal information does not exist in the state information of the occupant.
After detecting whether an abnormal posture exists in the posture information and detecting whether an abnormal text exists in the voice text information, if the abnormal posture does not exist in the posture information and the abnormal text does not exist in the voice text information, it is determined that the abnormal information does not exist in the state information of the occupant.
In other embodiments, to prevent a question from being asked by mistake due to misjudgment of the posture information or the voice text information, after detecting whether an abnormal posture exists in the posture information and whether an abnormal text exists in the voice text information, abnormal information is determined to exist in the state information of the occupant only when an abnormal posture is detected in the posture information and an abnormal text is also detected in the voice text information, which improves the accuracy of occupant state recognition and the user experience.
In other embodiments, to further ensure the accuracy of occupant state recognition, when detecting whether an abnormal posture exists in the posture information, a plurality of abnormal postures must be detected before abnormal information is determined to exist in the state information of the occupant, so as to avoid misjudgment.
In this embodiment, whether an abnormal posture exists in the posture information and whether an abnormal text exists in the voice text information are detected; if an abnormal posture exists in the posture information or an abnormal text exists in the voice text information, it is determined that abnormal information exists in the state information of the occupant; if neither exists, it is determined that no abnormal information exists. This clarifies the specific process of performing abnormality detection on the posture information and the voice text information and determining whether abnormal information exists in the state information of the occupant: a question is asked of the occupant whenever an abnormality exists in either the posture information or the voice text information, and the state of the occupant is then determined from the occupant's response data, which increases the reaction speed of the state detection and avoids missed detections.
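A minimal sketch of the label matching in steps S31 to S33, using hypothetical in-memory label tables in place of the preset posture/text label data described above (which the embodiment stores in a blockchain database); all keys and abnormality types below are illustrative only:

```python
# Hypothetical label tables; real data would come from the preset
# posture/text label data stored in the blockchain database.
POSTURE_LABELS = {
    "holding the steering wheel with one hand": ("abnormal", "illegal driving behavior"),
    "both hands on the steering wheel": ("normal", None),
}
TEXT_LABELS = {
    "my heart is uncomfortable": ("abnormal", "body disease related text"),
}

def detect_abnormal_info(postures, texts):
    """Match postures and texts against the preset label data.

    Returns (has_abnormal_info, details), where details lists the
    abnormality type and the matching posture/text (steps S31-S33).
    """
    details = []
    for p in postures:                                   # step S31: posture check
        label, kind = POSTURE_LABELS.get(p, ("normal", None))
        if label == "abnormal":
            details.append((kind, p))
    for t in texts:                                      # step S31: text check
        label, kind = TEXT_LABELS.get(t, ("normal", None))
        if label == "abnormal":
            details.append((kind, t))
    return bool(details), details                        # steps S32-S33
```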
In an embodiment, in step S50, that is, acquiring response data of the occupant to the question, the method specifically includes the following steps: determining whether the question is a gesture-related question; if the question is a posture-related question, acquiring video response data of the passenger to the question; and if the question is not the gesture-related question, acquiring audio response data and video response data of the passenger to the question.
Here, the type of question may be a gesture-related question, which requests the occupant to respond with a gesture so as to detect whether the driver has a sudden illness or is in another situation in which a gesture response cannot be made.
After the terminal device asks the occupant the question, the server determines whether the question is a gesture-related question, so as to take different processing measures according to the determination result. If the question is a gesture-related question, video response data of the occupant to the question is acquired; if it is not, audio response data and video response data of the occupant to the question are acquired, so that whether the state of the occupant is abnormal can be judged from the audio response data and the video response data. This embodiment clarifies the step of acquiring the response data of the occupant to the question; taking different data acquisition measures according to the type of question reduces the amount of data to be processed and the resources occupied.
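The dispatch on question type can be sketched as follows; `capture_video` and `capture_audio` are hypothetical placeholder callables standing in for the actual data acquisition from the in-vehicle camera and microphone:

```python
def acquire_response_data(question, capture_video, capture_audio):
    """Gesture-related questions need only video response data;
    other questions need both audio and video response data.

    question: dict with a boolean "gesture_related" field (assumed
    representation; the embodiment does not specify a data format).
    """
    if question.get("gesture_related", False):
        return {"video": capture_video()}          # gesture-related question
    return {"audio": capture_audio(),              # other questions: both streams
            "video": capture_video()}
```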
In one embodiment, as shown in fig. 7, the step S50 of determining the state of the occupant according to the response data specifically includes the following steps:
S51: the type of the response data is determined.
After the occupant is asked the question and the server acquires the occupant's response data to it, the server determines the type of the response data. The types of response data include video response data and audio response data.
S52: and if the response data are video response data, determining the response posture information of the passenger according to the video response data.
After the type of the response data is determined, if the response data is video response data, the response posture information of the occupant is determined from the video response data. The video response data is first split into multiple frames; key frames may then be extracted from these frames, and the acquired key frames input into the preset posture recognition model to recognize the posture in each key frame, with the postures of the key frames used as the response posture information of the occupant. Alternatively, the frames may be input directly into the preset posture recognition model to obtain the posture of each frame as the response posture information of the occupant.
S53: it is determined whether the occupant's response posture information matches the posture entry in the question.
After the response posture information of the occupant is determined, it is determined whether the response posture information matches the posture entry in the question, and the state of the occupant is determined according to the matching result.
S54: if the occupant's response posture information does not match the posture vocabulary entry, the occupant's state is determined to be an abnormal state.
After determining whether the response posture information of the occupant matches the posture entry in the question: if no response posture matching the posture entry exists in the response posture information, it is determined that the response posture information does not match the posture entry and the state of the occupant is an abnormal state; if the response posture information includes a response posture matching the posture entry, it is determined that the response posture information matches the posture entry and the state of the occupant is a normal state.
For example, if the posture entry contained in the question is "raise your hand": if the response posture information of the occupant includes a hand-raising posture, it is determined that a response posture matching the posture entry exists and the state of the occupant is a normal state; if the response posture information does not include a hand-raising posture, it is determined that the response posture information does not match the posture entry and the state of the occupant is an abnormal state.
If, after the video response data of the occupant is acquired, it is found from the video response data that the occupant's posture shows no obvious change, that is, the occupant has made no posture response, this indicates that the occupant may have a sudden illness or be in another abnormal situation. If the abnormal label type of the response posture information is recognized as a body-disease-related behavior type, it is determined that the occupant may have a sudden illness; the movement of the vehicle then needs to be cooperatively controlled in time and medical support called urgently, so as to guarantee driving safety and occupant safety.
In this embodiment, the type of the response data is determined; if the response data is video response data, the response posture information of the occupant is determined from the video response data; it is then determined whether the response posture information matches the posture entry in the question, and if it does not, the state of the occupant is determined to be an abnormal state. This defines the specific steps of determining the state of the occupant from the video response data and improves the accuracy of occupant state identification.
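Steps S53 and S54 reduce to checking whether any recognized response posture matches the posture entry in the question; a minimal sketch, assuming postures are compared as plain strings:

```python
def judge_state_from_video(response_postures, posture_entry):
    """Return 'normal' if some recognized response posture matches the
    posture entry contained in the question, else 'abnormal' (S53-S54).

    response_postures: list of posture names recognized from the
    occupant's video response data (e.g. per key frame).
    """
    return "normal" if posture_entry in response_postures else "abnormal"
```

With the "raise your hand" example above, a response posture list containing "raise your hand" yields a normal state, while an empty or non-matching list yields an abnormal state.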
In an embodiment, as shown in fig. 8, after the step S51, that is, after the type of the response data is determined, the method specifically includes the following steps:
S55: if the response data is audio response data, the response text information of the occupant is determined according to the audio response data.
After the type of the response data is determined, if the response data is audio response data, the audio response data is input into a preset voice recognition model so as to perform text conversion on the audio response data and obtain response text information of the passenger.
S56: and determining whether the response text information contains a preset text.
After the response text information of the occupant is determined from the audio response data, it is determined whether a preset text is contained in the response text information. The preset text is a text in the preset text label data, or may be a preset sensitive keyword.
S57: and if the response text information contains a preset text, determining that the state of the passenger is an abnormal state.
After determining whether the response text information contains a preset text, if it does, the state of the occupant is determined to be an abnormal state. If the response text information contains a preset sensitive keyword, an alarm mechanism is directly triggered to raise an alarm on the related alarm platform.
In this embodiment, after the type of the response data is determined, if the response data is audio response data, the response text information of the occupant is determined from the audio response data; it is then determined whether the response text information contains a preset text, and if it does, the state of the occupant is determined to be an abnormal state. This clarifies the specific steps of determining the state of the occupant from the audio response data and improves the accuracy of occupant state identification.
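Steps S55 to S57 can be sketched as a substring check against the preset texts, with sensitive keywords additionally triggering the alarm mechanism. Both word lists below are hypothetical placeholders for the preset text label data and sensitive-keyword configuration:

```python
# Hypothetical preset data; a real deployment would load these from
# the preset text label data / sensitive-keyword configuration.
PRESET_TEXTS = ["my heart is uncomfortable"]
SENSITIVE_KEYWORDS = ["help", "robbery"]

def judge_state_from_audio(response_text):
    """Return (abnormal, trigger_alarm) for the occupant's response text.

    abnormal: True if the text contains any preset text or sensitive
    keyword (step S57); trigger_alarm: True if a sensitive keyword is
    present, which directly triggers the alarm mechanism.
    """
    text = response_text.lower()
    trigger_alarm = any(k in text for k in SENSITIVE_KEYWORDS)
    abnormal = trigger_alarm or any(t in text for t in PRESET_TEXTS)
    return abnormal, trigger_alarm
```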
In an embodiment, after determining that the state of the occupant is an abnormal state, the method specifically further includes the following steps:
A. recording the number of abnormal-state results and the number of questions asked;
B. continuing to randomly generate a next question according to the entries in the preset entry set, and asking the passenger questions according to the next question;
C. acquiring response data of the passenger to the next question to determine whether the state of the passenger is an abnormal state according to the response data of the next question;
D. if the state of the passenger is determined to be an abnormal state, repeating steps A-C until the number of questions asked reaches a preset number, and then stopping asking the passenger questions;
E. determining whether the number of abnormal-state results is greater than one half of the number of questions asked;
F. if the number of abnormal-state results is greater than one half of the number of questions asked, generating abnormal state alarm information, and sending the abnormal state of the passenger and the abnormal state alarm information to an alarm platform.
In this embodiment, after it is determined that abnormal information exists in the state information of the occupant, the questioning is repeated multiple times based on the state result identified from the response data fed back for each question, and the final abnormal state of the occupant and the abnormal state alarm information are output based on the state identification result of each question. The process of repeated questioning is as described above.
For example, after a question is asked of the occupant for the first time and the state of the occupant is determined from the response data to be abnormal, the number of abnormal-state results and the number of questions asked are recorded. A new question is then randomly generated from the entries in the preset entry set, and questioning continues so that the state of the occupant can be determined from the new response data; if the state is again determined to be abnormal, the counts are updated and further questions are generated and asked. After several such iterations, questioning stops once the number of questions asked reaches 10. Finally, a majority voting mechanism is applied to the state judgment result of each question to output the final state of the occupant; if the final state is an abnormal state, abnormal state alarm information is generated according to the actual abnormal condition, and the abnormal state of the occupant and the alarm information are sent to an alarm platform for timely handling.
In one embodiment, during the repeated questioning, the same question may be asked multiple times, and if the response data to the repeated questions is inconsistent, the state of the occupant is determined to be abnormal.
In this embodiment, after the state of the occupant is determined to be an abnormal state, an iterative judgment mechanism is added: A. recording the number of abnormal-state results and the number of questions asked; B. continuing to randomly generate the next question from the entries in the preset entry set and asking the occupant that question; C. acquiring the occupant's response data to the next question to determine whether the state of the occupant is abnormal according to that response data; D. if the state is determined to be abnormal, repeating steps A-C until the occupant has been asked a preset number of questions; E. determining whether the number of abnormal-state results is greater than one half of the number of questions asked; F. if it is, generating abnormal state alarm information and sending the abnormal state of the occupant and the alarm information to an alarm platform. The occupant is questioned repeatedly so that the state of the occupant is judged iteratively multiple times, and the information about the abnormal state is output by a voting mechanism, further ensuring the accuracy of the state judgment.
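The iterative mechanism of steps A to F can be sketched as follows; `ask_and_judge` is a placeholder callable that performs one round of questioning (steps B-C) and returns True when that round judges the state abnormal:

```python
def iterative_state_check(ask_and_judge, max_questions=10):
    """Repeat questioning while rounds keep judging the state abnormal,
    up to max_questions rounds, then decide by majority vote (step E).

    Returns (final_abnormal, abnormal_count, question_count).
    """
    abnormal_count = 0
    question_count = 0
    while question_count < max_questions:
        question_count += 1
        if ask_and_judge():               # one round of question + judgment
            abnormal_count += 1           # step A: record abnormal result
        else:
            break                         # step D repeats only while abnormal
    # steps E-F: majority vote over the rounds actually asked
    final_abnormal = abnormal_count > question_count / 2
    return final_abnormal, abnormal_count, question_count
```

Under this sketch, a round that judges the state normal ends the iteration early, which is one reading of step D; the embodiment could equally continue to the preset count regardless.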
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In one embodiment, an occupant state determination device is provided, which corresponds one-to-one to the occupant state determination method in the above-described embodiment. As shown in fig. 9, the occupant state determination device includes an acquisition module 901, an identification module 902, a detection module 903, a question module 904, and a determination module 905. The functional modules are explained in detail as follows:
an obtaining module 901, configured to obtain video data and audio data of an occupant in a vehicle;
a recognition module 902 for recognizing the video data and audio data of the occupant to generate state information of the occupant, the state information including posture information and voice text information of the occupant;
a detection module 903, configured to perform abnormality detection on the posture information and the speech text information, and determine whether there is abnormal information in the state information of the occupant;
a question module 904, configured to generate a question sentence according to the entry in the preset entry set if the state information of the passenger has abnormal information, and ask the passenger a question according to the question sentence;
a determining module 905, configured to acquire response data of the occupant to the question, and determine a state of the occupant according to the response data.
Further, the identifying module 902 is specifically configured to:
performing gesture extraction on the video data by adopting a preset gesture recognition model to obtain gesture information of the passenger, wherein the preset gesture recognition model is a deep learning model obtained according to different types of in-vehicle gesture image data and corresponding gesture training;
and performing text conversion on the audio data by adopting a preset speech recognition model to obtain speech text information of the passenger, wherein the preset speech recognition model is a deep learning model obtained by training according to speech data in different in-vehicle scenes.
Further, the identifying module 902 is further specifically configured to:
extracting key frames from the video data to obtain a plurality of key frames, wherein the key frames are frames in the video data in which an obvious change occurs;
inputting each key frame into the preset gesture recognition model to obtain the gesture of each key frame;
generating the posture information of the occupant according to the posture of each of the key frames.
Further, the identifying module 902 is further specifically configured to:
a. splitting the video data into a plurality of frames, and carrying out tensor conversion on the image data of all the frames to obtain the tensor of each frame;
b. taking a first frame of the video data as a first key frame and a reference frame;
c. determining a tensor difference value of the reference frame and a frame next to the reference frame;
d. determining whether the tensor difference value is larger than a preset threshold value;
e. if the tensor difference value is larger than a preset threshold value, determining that the next frame of the tensor difference value frame is a second key frame, and taking the second key frame as the reference frame;
f. repeating steps c-e until all frames are traversed to obtain the plurality of key frames.
Further, the detection module 903 is specifically configured to:
detecting whether abnormal postures exist in the posture information and detecting whether abnormal texts exist in the voice text information;
if an abnormal posture exists in the posture information or an abnormal text exists in the voice text information, determining that abnormal information exists in the state information of the passenger;
and if the abnormal posture does not exist in the posture information and the abnormal text does not exist in the voice text information, determining that the abnormal information does not exist in the state information of the passenger.
Further, the determining module 905 is specifically configured to:
determining a type of the response data;
if the response data are video response data, determining the response posture information of the passenger according to the video response data;
determining whether the occupant's response gesture information matches a gesture term in the question;
and if the response posture information of the passenger does not match with the posture vocabulary entry in the question, determining that the state of the passenger is an abnormal state.
Further, after determining the type of the response data, the determining module 905 is further specifically configured to:
if the response data is audio response data, determining the response text information of the passenger according to the audio response data;
determining whether the response text information contains a preset text;
and if the response text information contains a preset text, determining that the state of the passenger is an abnormal state.
For the specific definition of the occupant state determination device, reference may be made to the above definition of the occupant state determination method, which is not described herein again. The respective modules in the above occupant state determination device may be implemented in whole or in part by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing relevant data of the occupant state determination method, such as video data, audio data, posture information, voice text information, response data and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an occupant status determination method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring video data and audio data of passengers on a vehicle;
recognizing the video data and audio data of the occupant to generate state information of the occupant, the state information including posture information and voice text information of the occupant;
performing abnormality detection on the posture information and the voice text information, and determining whether abnormal information exists in the state information of the passenger;
if abnormal information exists in the state information of the passenger, generating a question from entries in a preset entry set, and asking the passenger the question;
and acquiring response data of the passenger to the question, and determining the state of the passenger according to the response data.
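The five steps the processor executes can be sketched as follows. This is a minimal Python sketch only: the callables `recognize`, `detect`, `ask_and_get_response`, and `judge`, and the question template, are hypothetical stand-ins, since the embodiment does not specify the concrete recognition models.

```python
def determine_occupant_state_pipeline(video, audio, entries, *,
                                      recognize, detect,
                                      ask_and_get_response, judge):
    """One pass of the five claimed steps. Every model-dependent step is
    injected as a callable; all names here are illustrative assumptions."""
    # step 2: recognize video/audio into (posture_info, voice_text)
    posture_info, voice_text = recognize(video, audio)
    # step 3: abnormality detection on both modalities
    if not detect(posture_info, voice_text):
        return "normal"          # no abnormal information, nothing to ask
    # step 4: build a question from the preset entry set and ask it
    question = "Please confirm: " + ", ".join(entries)
    response = ask_and_get_response(question)
    # step 5: judge the occupant's state from the response data
    return judge(response)
```

Injecting the steps as callables keeps the control flow of the method separate from any particular posture or speech model.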
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring video data and audio data of passengers on a vehicle;
recognizing the video data and audio data of the occupant to generate state information of the occupant, the state information including posture information and voice text information of the occupant;
performing abnormality detection on the posture information and the voice text information, and determining whether abnormal information exists in the state information of the passenger;
if abnormal information exists in the state information of the passenger, generating a question from entries in a preset entry set, and asking the passenger the question;
and acquiring response data of the passenger to the question, and determining the state of the passenger according to the response data.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the functions may be distributed among different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and are intended to be included within the scope of the present invention.

Claims (10)

1. An occupant state determination method, characterized by comprising:
acquiring video data and audio data of passengers on a vehicle;
recognizing the video data and audio data of the occupant to generate state information of the occupant, the state information including posture information and voice text information of the occupant;
performing abnormality detection on the posture information and the voice text information, and determining whether abnormal information exists in the state information of the passenger;
if abnormal information exists in the state information of the passenger, generating a question from entries in a preset entry set, and asking the passenger the question, wherein the preset entry set is stored in a blockchain database;
and acquiring response data of the passenger to the question, and determining the state of the passenger according to the response data.
2. The occupant state determination method of claim 1, wherein said recognizing the video data and audio data of the occupant to generate the state information of the occupant comprises:
performing posture extraction on the video data by adopting a preset posture recognition model to obtain the posture information of the occupant, wherein the preset posture recognition model is a deep learning model trained on different types of in-vehicle posture image data and corresponding posture labels;
and performing text conversion on the audio data by adopting a preset speech recognition model to obtain the voice text information of the occupant, wherein the preset speech recognition model is a deep learning model trained on speech data from different in-vehicle scenes.
3. The occupant state determination method according to claim 2, wherein said performing posture extraction on the video data by using the preset posture recognition model to obtain the posture information of the occupant comprises:
performing key frame extraction on the video data to obtain a plurality of key frames, wherein a key frame is a frame at which the video data changes significantly;
inputting each key frame into the preset posture recognition model to obtain the posture in each key frame;
and generating the posture information of the occupant according to the posture in each key frame.
4. The occupant state determination method according to claim 3, wherein said performing key frame extraction on the video data to obtain a plurality of key frames comprises:
a. splitting the video data into a plurality of frames, and carrying out tensor conversion on the image data of all the frames to obtain the tensor of each frame;
b. taking a first frame of the video data as a first key frame and a reference frame;
c. determining a tensor difference value of the reference frame and a frame next to the reference frame;
d. determining whether the tensor difference value is larger than a preset threshold value;
e. if the tensor difference value is larger than the preset threshold value, determining that the frame following the reference frame is a second key frame, and taking the second key frame as the new reference frame;
f. repeating steps c-e until all frames are traversed to obtain the plurality of key frames.
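Steps a–f above can be sketched in a few lines of NumPy. This is a minimal illustration only: the frame format, the mean-absolute-difference metric, and the threshold value of 30.0 are assumptions, since the claim fixes neither the tensor difference measure nor the threshold.

```python
import numpy as np

def extract_key_frames(frames, threshold=30.0):
    """Key-frame extraction by tensor difference, per steps a-f.

    frames: list of H x W x C uint8 arrays (the split video frames).
    threshold: preset threshold on the mean absolute pixel difference
               (illustrative value; the claim does not specify one).
    """
    if not frames:
        return []
    # a. tensor conversion: treat each frame as a float tensor
    tensors = [f.astype(np.float32) for f in frames]
    # b. the first frame is both the first key frame and the reference frame
    key_frames = [frames[0]]
    ref = tensors[0]
    # c-f. traverse the remaining frames, comparing each to the reference
    for frame, tensor in zip(frames[1:], tensors[1:]):
        diff = np.abs(tensor - ref).mean()  # c. tensor difference value
        if diff > threshold:                # d./e. change exceeds threshold
            key_frames.append(frame)        # e. this frame is a key frame
            ref = tensor                    # e. and becomes the reference
    return key_frames
```

Because the reference frame only advances when a key frame is found, gradual drift accumulates against the last key frame rather than the immediately preceding frame.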
5. The occupant state determination method according to claim 1, wherein said performing abnormality detection on the posture information and the voice text information to determine whether abnormal information exists in the state information of the occupant comprises:
detecting whether an abnormal posture exists in the posture information, and detecting whether an abnormal text exists in the voice text information;
if an abnormal posture exists in the posture information or an abnormal text exists in the voice text information, determining that abnormal information exists in the state information of the occupant;
and if no abnormal posture exists in the posture information and no abnormal text exists in the voice text information, determining that no abnormal information exists in the state information of the occupant.
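The OR-logic of this claim can be illustrated by a short sketch. The set-membership and substring tests below are hypothetical stand-ins for the posture and text detectors, which the claim leaves unspecified.

```python
def has_abnormal_information(posture_info, voice_text,
                             abnormal_postures, abnormal_phrases):
    """Decision rule of claim 5: the occupant's state information is
    abnormal if EITHER an abnormal posture OR an abnormal text is found.

    posture_info: list of posture labels extracted from the key frames.
    voice_text: transcript of the occupant's speech.
    abnormal_postures / abnormal_phrases: illustrative lookup sets.
    """
    abnormal_posture = any(p in abnormal_postures for p in posture_info)
    abnormal_text = any(phrase in voice_text for phrase in abnormal_phrases)
    return abnormal_posture or abnormal_text
```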
6. The occupant state determination method according to any one of claims 1 to 5, wherein the determining the state of the occupant from the response data includes:
determining the type of the response data;
if the response data is video response data, determining the response posture information of the occupant according to the video response data;
determining whether the response posture information of the occupant matches the posture entry in the question;
and if the response posture information of the occupant does not match the posture entry in the question, determining that the state of the occupant is an abnormal state.
7. The occupant state determination method according to claim 6, wherein after said determining the type of the response data, the method further comprises:
if the response data is audio response data, determining the response text information of the passenger according to the audio response data;
determining whether the response text information contains a preset text;
and if the response text information contains a preset text, determining that the state of the passenger is an abnormal state.
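Taken together, claims 6 and 7 dispatch on the type of the response data. A minimal sketch follows, with `extract_posture` and `transcribe` as hypothetical injected recognizers (the patent does not name the models), and exact-match / substring tests standing in for its matching steps:

```python
def judge_response(response, posture_entry, preset_texts,
                   extract_posture, transcribe):
    """Response handling per claims 6-7.

    response: dict with a 'type' key ('video' or 'audio') and a 'data'
              payload (an illustrative representation of response data).
    posture_entry: the posture entry used when generating the question.
    preset_texts: preset texts whose presence signals an abnormal state.
    """
    if response["type"] == "video":
        # claim 6: abnormal if the answered posture does not match
        # the posture entry in the question
        posture = extract_posture(response["data"])
        return "abnormal" if posture != posture_entry else "normal"
    if response["type"] == "audio":
        # claim 7: abnormal if the transcript contains a preset text
        text = transcribe(response["data"])
        return "abnormal" if any(t in text for t in preset_texts) else "normal"
    return "unknown"
```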
8. An occupant state determination device characterized by comprising:
the acquisition module is used for acquiring video data and audio data of passengers on the vehicle;
the recognition module is used for recognizing the video data and audio data of the occupant to generate the state information of the occupant, the state information including posture information and voice text information of the occupant;
the detection module is used for carrying out abnormity detection on the posture information and the voice text information and determining whether the state information of the passenger has abnormal information;
the question module is used for generating a question from entries in the preset entry set if abnormal information exists in the state information of the passenger, and asking the passenger the question;
and the determining module is used for acquiring response data of the passenger to the question and determining the state of the passenger according to the response data.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the occupant state determination method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the occupant status determination method according to any one of claims 1 to 7.
CN202110743144.6A 2021-06-30 2021-06-30 Passenger state determining method and device, computer equipment and storage medium Pending CN113469048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110743144.6A CN113469048A (en) 2021-06-30 2021-06-30 Passenger state determining method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113469048A true CN113469048A (en) 2021-10-01

Family

ID=77877022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110743144.6A Pending CN113469048A (en) 2021-06-30 2021-06-30 Passenger state determining method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113469048A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110517449A (en) * 2019-09-25 2019-11-29 合肥工业大学 A kind of intelligent air control system for net about vehicle, taxi
CN110543553A (en) * 2019-07-31 2019-12-06 平安科技(深圳)有限公司 question generation method and device, computer equipment and storage medium
CN111476114A (en) * 2020-03-20 2020-07-31 深圳追一科技有限公司 Fatigue detection method, device, terminal equipment and storage medium
CN111833907A (en) * 2020-01-08 2020-10-27 北京嘀嘀无限科技发展有限公司 Man-machine interaction method, terminal and computer readable storage medium
CN112784695A (en) * 2020-12-31 2021-05-11 南京视察者智能科技有限公司 Driver abnormal state detection method based on image and voice recognition

Non-Patent Citations (1)

Title
He Hanwu et al.: "Augmented Reality Interaction Methods and Implementation", vol. 1, 30 December 2018, Huazhong University of Science and Technology Press, pages 133-140 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination