CN115482485A - Video processing method and device, computer equipment and readable storage medium - Google Patents

Video processing method and device, computer equipment and readable storage medium

Info

Publication number
CN115482485A
Authority
CN
China
Prior art keywords
target object
key points
state
human body
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211078027.3A
Other languages
Chinese (zh)
Inventor
张伟
何得淮
蒋静文
何行知
姚佳
颜泳涛
王垒
路浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Provincial Prison Administration
West China Hospital of Sichuan University
Original Assignee
Sichuan Provincial Prison Administration
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Provincial Prison Administration and West China Hospital of Sichuan University
Priority to CN202211078027.3A
Publication of CN115482485A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The embodiment of the invention discloses a video processing method, a video processing device, computer equipment and a readable storage medium, wherein the video processing method comprises the following steps: acquiring a video file; preprocessing the video file according to a preset image normalization rule to obtain a target image sequence; respectively carrying out face detection and human body detection on the target image sequence to obtain face key points and human body key points of the target object; recognizing the emotional state and the drowsiness state of the corresponding target object based on the face key points; identifying the limb posture and the limb motion state of the corresponding target object based on the human body key points; judging whether the behavior set of the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state, and sending prompt information to a preset terminal when the behavior set of the target object comprises the preset behavior. According to the method, the behavior events and the emotional states of the target objects in the video are intelligently identified, so that hidden danger behaviors can be predicted more stably and accurately.

Description

Video processing method and device, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of intelligent security, in particular to a video processing method and device, computer equipment and a readable storage medium.
Background
Video analysis of surveillance video is a key technology in fields such as intelligent security. In existing video analysis schemes for surveillance video, a traditional image algorithm is used to identify light and shadow changes in the image picture, and target detection and target tracking are carried out according to those changes; identifying abnormal behavior of a target object with such a traditional image recognition algorithm suffers from defects such as weak anti-interference capability and a high false alarm rate.
In addition, existing video monitoring schemes lack a means of identifying the emotional state of the target object; whether the target is in a dangerous behavioral state is judged only according to a behavior template of the target object, which easily leads to misidentification.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a video processing method, an apparatus, a computer device, and a readable storage medium, and the specific scheme is as follows:
in a first aspect, an embodiment of the present application provides a video processing method, including:
acquiring a video file;
preprocessing the video file according to a preset image normalization rule to obtain a target image sequence;
respectively carrying out face detection and human body detection on the target image sequence to obtain face key points and human body key points of a target object;
identifying an emotional state and a drowsiness state of a corresponding target object based on the face key points;
identifying limb postures and limb motion states of corresponding target objects based on the human body key points;
judging whether the behavior set of the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state, and sending prompt information to a preset terminal when the behavior set of the target object comprises the preset behavior.
According to a specific implementation manner of the embodiment of the present application, the performing face detection and human body detection on the target image sequence respectively to obtain a face key point and a human body key point of the target object includes:
detecting and intercepting a face area of each frame of image in the target image sequence through a haar feature recognition model; identifying face key points in the face region based on a face key point detection network to obtain face key points of the target object;
detecting and intercepting a human body region of each frame of image in the target image sequence through a Faster RCNN model; and identifying the human body key points of the human body area through a high-resolution network to obtain the human body key points of the target object.
According to a specific implementation manner of the embodiment of the present application, the method further includes:
recognizing the face area based on a preset fine-grained head pose estimation network to obtain the head pose of the target object;
the determining whether the behavior set of the target object includes a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state further includes:
judging whether the behavior set of the target object comprises preset behaviors according to the emotional state, the drowsiness state, the head posture, the limb posture and the limb movement state.
According to a specific implementation manner of the embodiment of the present application, the identifying an emotional state of the corresponding target object based on the face key point includes:
dividing a first number of face key points into a second number of face key areas, wherein the first number is larger than the second number;
and identifying the key human face area based on a preset annular neural network model, and obtaining the emotional state corresponding to the target object.
According to a specific implementation manner of the embodiment of the present application, the identifying a drowsiness state of a corresponding target object based on the face key points includes:
calculating the eye closing rate of the corresponding target object based on the human face key points of the eye area and a preset eye closing rate calculation formula;
if the eye closing rate is larger than a first threshold value, identifying that the target object is in a deep sleep state;
if the eye closing rate is smaller than the first threshold and larger than a second threshold, identifying the target object as having a drowsiness state, wherein the first threshold is larger than the second threshold;
and if the eye closing rate is smaller than the second threshold value, identifying that the target object is in a waking state.
According to a specific implementation manner of the embodiment of the present application, the preprocessing the video file according to a preset image normalization rule to obtain a target image sequence includes:
reading the video file according to the time sequence to obtain an initial image sequence with a plurality of frames;
and carrying out RGB image normalization processing on each frame of image to obtain the target image sequence.
According to a specific implementation manner of the embodiment of the present application, the identifying the limb posture and the limb movement state of the corresponding target object based on the human body key points includes:
recognizing the limb posture of the corresponding target object according to the position coordinates of the key points of the human body;
and calculating the motion amplitude and the motion speed of each limb part of the corresponding target object according to the change condition of the position coordinates of the key points of the human body in the continuous multi-frame images so as to obtain the limb motion state of the target object.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
the acquisition module is used for acquiring a video file;
the preprocessing module is used for preprocessing the video file according to a preset image normalization rule to obtain a target image sequence;
the key point detection module is used for respectively carrying out face detection and human body detection on the target image sequence so as to obtain face key points and human body key points of a target object;
the face recognition module is used for recognizing the emotion state and the drowsiness state of the corresponding target object based on the face key points;
the human body identification module is used for identifying the limb posture and the limb motion state of the corresponding target object based on the human body key points;
and the behavior recognition module is used for judging whether the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state and sending prompt information to a preset terminal when the target object comprises the preset behavior.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the computer program, when running on the processor, executes the video processing method according to any one of the first aspect and the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program runs on a processor, the computer program performs the video processing method according to any one of the first aspect and the embodiments of the first aspect.
The embodiment of the application provides a video processing method, a video processing device, computer equipment and a readable storage medium, and the method comprises the following steps: acquiring a video file; preprocessing the video file according to a preset image normalization rule to obtain a target image sequence; respectively carrying out face detection and human body detection on the target image sequence to obtain face key points and human body key points of a target object; identifying an emotional state and a drowsiness state of a corresponding target object based on the face key points; identifying limb postures and limb motion states of corresponding target objects based on the human body key points; judging whether the behavior set of the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state, and sending prompt information to a preset terminal when the behavior set of the target object comprises the preset behavior. According to the method, the behavior events and the emotional states of the target objects in the video are intelligently identified, so that hidden danger behaviors can be predicted more stably and accurately.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
Fig. 1 is a schematic method flow diagram illustrating a video processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an identification effect of a face key point in a video processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an identification effect of a human body key point in a video processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a head pose recognition effect of a video processing method according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating recognition effects of a human body region and a human face region in a video processing method according to an embodiment of the present application;
fig. 6 shows a device module schematic diagram of a video processing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as terms defined in a commonly used dictionary) will be construed to have the same meaning as the contextual meaning in the related art and will not be construed to have an idealized or overly formal meaning unless expressly so defined in various embodiments of the present invention.
Referring to fig. 1, a schematic method flow diagram of a video processing method provided in an embodiment of the present application is shown, where as shown in fig. 1, the video processing method includes:
step S101, acquiring a video file;
in a specific embodiment, the video file may be acquired by any image pickup device used for shooting video. The image pickup device may shoot the video file in real time and then send it to the video processing apparatus provided in this embodiment, or the stored video file may be transferred to the video processing apparatus after shooting has gone on for a period of time; the way of acquiring the video file may be adapted to the actual application scenario.
In this embodiment, the image capturing apparatus may be set in an area where video monitoring is required, and specifically, the image capturing apparatus is a monitoring camera.
For example, the image pickup apparatus may be provided in public areas such as hospitals, prisons, schools, and the like, for monitoring abnormal behaviors of the target object.
Step S102, preprocessing the video file according to a preset image normalization rule to obtain a target image sequence;
in a specific embodiment, after the video file is obtained, the video file is decoded to obtain a video stream with a time axis.
After the multi-frame images in the video file are extracted, preset image normalization processing is carried out on each frame of image to obtain multiple frames in a unified RGB format, and the target image sequence is formed from these frames.
Specifically, the decoding mode of the video file may be adapted to the actual application scenario, and the target image sequence is arranged in time order along the time axis.
According to a specific implementation manner of the embodiment of the present application, the step of preprocessing the video file according to a preset image normalization rule to obtain a target image sequence includes:
reading the video file according to the time sequence to obtain an initial image sequence with a plurality of frames;
and carrying out RGB image normalization processing on each frame of image to obtain the target image sequence.
In a specific embodiment, the images in the video file are extracted frame by frame according to the time axis of the video file to obtain an initial image sequence with multiple frames. Specifically, the number of frames of the initial image sequence depends on the duration of the video file: the longer the video file, the more frames the initial image sequence contains.
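By way of illustration only, the frame extraction step can be sketched as follows; OpenCV is not named in this application, and cv2.VideoCapture is used here merely as one possible video decoder:

```python
import cv2  # OpenCV, used here only as an illustrative video decoder


def read_initial_image_sequence(video_path):
    """Read a video file in time-axis order and return its frames as the initial image sequence."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()  # frames come back in time order
        if not ok:
            break  # end of file: the longer the video, the more frames are collected
        frames.append(frame)
    capture.release()
    return frames
```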
After the initial image sequence is obtained, converting each frame image into an RGB space, and normalizing color data in three color channels of Red (Red), green (Green) and Blue (Blue) in the RGB space.
For example, suppose the mean values of the red, green and blue channels are 0.6016, 0.4808 and 0.4286, respectively, and the corresponding variances are 0.2950, 0.2709 and 0.2624. If the gray values of the red, green and blue channels of an input image at some pixel of the first frame are 0.7333, 0.5176 and 0.4902, then after RGB image normalization the values of that pixel in the three channels are -0.5106, -0.0666 and 0.6084, respectively.
In a specific implementation process, after a frame of image is extracted, RGB image normalization processing may be performed on the current frame of image, which is not limited herein.
According to this embodiment, performing RGB image normalization on each frame of the initial image sequence effectively reduces the influence of lighting conditions on the image recognition process and thereby improves the robustness of the video processing method.
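As a minimal sketch of the normalization rule (treating the per-channel statistics quoted above as a mean and a standard deviation for each channel is an assumption, since the application does not state this explicitly), the processing of one frame could look like this:

```python
import cv2
import numpy as np

# Per-channel statistics taken from the example above; treating them as
# (mean, standard deviation) per RGB channel is an assumption.
RGB_MEAN = np.array([0.6016, 0.4808, 0.4286], dtype=np.float32)
RGB_STD = np.array([0.2950, 0.2709, 0.2624], dtype=np.float32)


def normalize_frame(frame_bgr):
    """Convert one decoded frame to RGB and apply per-channel normalization."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return (rgb - RGB_MEAN) / RGB_STD  # broadcasts over an H x W x 3 array


def build_target_image_sequence(frames):
    """Apply the normalization rule to every frame of the initial image sequence."""
    return [normalize_frame(frame) for frame in frames]
```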
Step S103, respectively carrying out face detection and human body detection on the target image sequence to obtain face key points and human body key points of a target object;
in a specific embodiment, face detection and human body detection are performed on each frame of image in the target image sequence to obtain face key points and human body key points of a human body appearing in the image.
The target object is a human body or a human face in the image.
In an actual implementation process, a frame of image may include a plurality of target objects, may include no target object, or may include only part of a target object.
When a frame of image comprises a plurality of target objects, face detection and human body detection are respectively carried out on each target object, and face key points and human body key points corresponding to each target object are obtained.
When a frame of image includes only part of a target object, face detection and human body detection are still carried out on that target object, and key points that cannot be identified are marked as unavailable points.
When the target object is not included in one frame of image, the current frame of image is skipped, and whether the target object is included in the next frame of image is detected.
In this embodiment, the detected key points of the face of the target object may be as shown in fig. 2, and the detected key points of the body of the target object may be as shown in fig. 3.
According to a specific implementation manner of the embodiment of the present application, the step of performing face detection and human body detection on the target image sequence to obtain face key points and human body key points of the target object respectively includes:
detecting and intercepting a face area of each frame of image in the target image sequence through a haar feature recognition model; identifying face key points in the face region based on a face key point detection network to obtain face key points of the target object;
detecting and intercepting a human body region of each frame of image in the target image sequence through a Faster RCNN model; and identifying the human body key points of the human body area through a high-resolution network to obtain the human body key points of the target object.
In a specific embodiment, the Haar-like feature recognition model is obtained by model training on a large number of face training data sets.
When each frame image in the target image sequence is obtained through processing, the current frame image is sent to the haar feature recognition model so that the faces in the current frame image are recognized by the haar feature recognition model.
And after the face region identified by the haar feature identification model is intercepted from the current frame image, face key point detection is further carried out on the face region based on a face key point detection network (PFLD-Net) so as to obtain the face key points of the corresponding target object.
In this embodiment, the number of face key points identified based on the face key point detection network is 98, as shown in fig. 2.
It should be noted that the number of the face key points can be adaptively modified according to the training mode of the face key point detection network, and adaptively replaced according to the actual application scenario.
In practical applications, each face key point has corresponding coordinates. For example, in the face key point layout of fig. 2, the 1st key point coordinate is [0.05529, 0.35596], the 10th is [0.21497, 0.23040], the 30th is [0.40970, 0.41727], the 50th is [0.5657, 0.56448], the 70th is [0.65954, 0.65997], the 90th is [0.8084543, 0.80811], and the 98th is [0.077, 0.44416].
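For illustration, the face detection and key point extraction stage could be sketched as below. OpenCV's bundled frontal-face Haar cascade stands in for the embodiment's own haar feature recognition model, and the 98-point landmark network is represented by a placeholder callable, since neither model's weights are part of this text:

```python
import cv2
import numpy as np

# OpenCV's stock frontal-face cascade, used here as a stand-in for the
# embodiment's haar feature recognition model trained on its own data set.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def detect_face_regions(frame_bgr):
    """Return cropped face regions and their bounding boxes for one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [(frame_bgr[y:y + h, x:x + w], (x, y, w, h)) for (x, y, w, h) in boxes]


def detect_face_keypoints(face_crop, landmark_net):
    """landmark_net is a placeholder for the 98-point face key point detection
    network; it is assumed to map a face crop to an array of shape (98, 2)
    whose coordinates are normalized to [0, 1], as in the example above."""
    return np.asarray(landmark_net(face_crop)).reshape(98, 2)
```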
In a specific embodiment, the Faster RCNN model is obtained by performing neural network training based on a large number of human body image training sets.
And when each frame image in the target image sequence is obtained through processing, the current frame image is sent to the Faster RCNN model so that the human bodies in the current frame image are identified by the Faster RCNN model.
And after the human body region identified by the Faster RCNN model is intercepted from the current frame image, further carrying out human body key point identification on the human body region based on a high-resolution network (Hr-Net) so as to obtain human body key points corresponding to the target object.
In this embodiment, the number of the human key points identified based on the high resolution network is 17, as shown in fig. 3. Specifically, the key points of the human body include 0 nose, 1 left eye, 2 right eye, 3 left ear, 4 right ear, 5 left shoulder, 6 right shoulder, 7 left elbow, 8 right elbow, 9 left wrist, 10 right wrist, 11 left hip, 12 right hip, 13 left knee, 14 right knee, 15 left ankle and 16 right ankle.
It should be understood that the human body key points may also be adaptively replaced according to the actual application scenario, which is not limited herein.
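The human body branch could be sketched as follows. torchvision's pretrained Keypoint R-CNN is used purely as a stand-in that outputs the same 17 COCO key points listed above; the embodiment itself chains a Faster RCNN person detector with a separate high-resolution network, whose weights are not provided here (a recent torchvision that accepts weights="DEFAULT" is assumed):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Stand-in model that returns the 17 COCO key points (nose, eyes, ears,
# shoulders, elbows, wrists, hips, knees, ankles) per detected person.
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()


@torch.no_grad()
def detect_human_keypoints(frame_rgb, score_threshold=0.8):
    """Return one (17, 3) array [x, y, visibility] per detected person."""
    outputs = model([to_tensor(frame_rgb)])[0]
    keep = outputs["scores"] >= score_threshold
    return [keypoints.numpy() for keypoints in outputs["keypoints"][keep]]
```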
According to a specific implementation of the embodiments of the present application, the method further includes:
recognizing the face area based on a preset fine-grained head pose estimation network to obtain the head pose of a target object;
in a specific embodiment, after the corresponding face region is identified, the three-axis angles of the head in three-dimensional space can be further estimated based on a fine-grained head pose estimation network (Hope-Net) to obtain head pose data.
Specifically, three-axis angles of the head in three-dimensional space are shown in fig. 4, including a yaw angle, a roll angle, and a pitch angle.
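The application only names the fine-grained head pose estimation network; the sketch below shows how such a network's per-axis classification output is typically turned into continuous yaw, pitch and roll angles, assuming the 66-bin, 3-degree formulation of the public Hopenet implementation, which this application does not confirm:

```python
import numpy as np


def bins_to_angle(logits, bin_width=3.0, angle_min=-99.0):
    """Convert one axis' bin logits to a continuous angle in degrees
    (66 bins of 3 degrees spanning -99..+99 are an assumption)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    expectation = float((probs * np.arange(len(logits))).sum())
    return expectation * bin_width + angle_min


def head_pose_from_network(yaw_logits, pitch_logits, roll_logits):
    """Return (yaw, pitch, roll) in degrees from the three per-axis outputs."""
    return (bins_to_angle(yaw_logits),
            bins_to_angle(pitch_logits),
            bins_to_angle(roll_logits))
```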
Step S104, recognizing the emotional state and the drowsiness state of the corresponding target object based on the face key points;
in a specific embodiment, the face key points of the target object are analyzed with a preset neural network model and a drowsiness state calculation model to obtain the emotional state and the drowsiness state of the corresponding target object.
In practical applications, the drowsiness state may also be regarded as a mental state, indicating how close the target object is to the sleeping state.
According to a specific implementation manner of the embodiment of the application, the step of identifying the emotional state of the corresponding target object based on the face key points includes:
dividing a first number of face key points into a second number of face key areas, wherein the first number is larger than the second number;
and identifying the key human face area based on a preset annular neural network model, and obtaining the emotional state corresponding to the target object.
In a specific embodiment, after 98 face key points are obtained as shown in fig. 2, the face is divided into 20 face key areas according to the actual positions of the face key points in the face area.
In this embodiment, the key regions of the human face include left corner of the eye, left eyebrow, left eye socket, left cheek, left mouth corner, left half mouth, left lower jaw, eyebrow, nose, nasolabial sulcus, chin, right eye corner, right eyebrow, right eye socket, right cheek, right mouth corner, right half mouth, and right lower jaw.
Each face key area corresponds to a key facial motion area.
It should be understood that the specific values of the first quantity and the second quantity may also be set adaptively according to the actual application scenario.
After the second number of face key areas are divided, each face key area corresponding to the target object is identified by a preset annular convolutional neural network model (ACNN), obtained in advance by neural network training on a large number of image training sets representing various types of emotional states, so that the emotional state of the target object can be accurately identified.
In the present embodiment, the emotional state is classified into four types: calm, happy, sad and other. It should be appreciated that the other type may include emotional states such as anger, irritability, and the like.
After the face key areas of the target object are sent to the annular convolutional neural network model, the model outputs probability values for the emotional state types. For example, when the recognized probability values are calm 0.6941, happy 0.0073, sad 0.2629 and other 0.0357, the emotional state of the target object is determined to be calm.
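A trivial sketch of this final decision step, using the four class probabilities from the example above (the network itself is not reproduced here):

```python
import numpy as np

EMOTION_LABELS = ("calm", "happy", "sad", "other")


def classify_emotion(probabilities):
    """Pick the emotional state with the highest probability, e.g.
    [0.6941, 0.0073, 0.2629, 0.0357] yields "calm"."""
    probabilities = np.asarray(probabilities, dtype=np.float32)
    return EMOTION_LABELS[int(probabilities.argmax())]
```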
According to a specific implementation manner of the embodiment of the present application, the step of identifying the drowsy state of the corresponding target object based on the face key points includes:
calculating the eye closing rate of the corresponding target object based on the human face key points of the eye area and a preset eye closing rate calculation formula;
if the eye closing rate is larger than a first threshold value, identifying that the target object is in a deep sleep state;
if the eye closing rate is smaller than the first threshold and larger than a second threshold, identifying the target object as having a drowsiness state, wherein the first threshold is larger than the second threshold;
and if the eye closing rate is smaller than the second threshold value, identifying that the target object is in a waking state.
In a specific embodiment, the formula for calculating the eye closing rate is:
(The three formulas are given as images in the original publication and are not reproduced here.)
where op denotes the degree of eye openness, p_l_top and p_r_top represent the upper coordinate points of the left and right eyes, p_l_down and p_r_down represent the lower coordinate points of the left and right eyes, width_l and width_r denote the widths of the left and right eyes, c indicates whether the eye is open, t indicates the calculation time window, and r indicates the eye closing rate computed over that window.
Specifically, the calculation time window is the length of time over which one eye closing rate value is calculated. In this embodiment, the calculation time window may be 30 s, or other values may be taken, with the specific value chosen according to the actual application scenario.
As shown in fig. 2, the coordinate points p_l_top and p_r_top can be obtained from the face key points 61, 62, 63, 69, 70 and 71, and the coordinate points p_l_down and p_r_down can be obtained from the face key points 67, 66, 65, 75, 74 and 73.
In particular embodiments, the distance between the face key points 62 and 66 may be calculated to determine the degree of closure of the right eye of the target object, and the distance between the face key points 70 and 74 may be calculated to determine the degree of closure of the left eye of the target object.
The current eye closing rate of the target object is the average value of the left eye closing degree and the right eye closing degree of the target object.
Specifically, when calculating the current eye closing rate of the target user, image frames covering a period of time need to be acquired to improve the accuracy of the eye closing rate calculation.
In this embodiment, the first threshold may be set to 0.8, and the second threshold may be set to 0.126.
When the eye closing rate is greater than 0.8, the target object is in a deep sleep state; when the eye closing rate is less than 0.8 and greater than 0.126, the target object is in a drowsy state; when the eye closing rate is less than 0.126, the target object is in a waking state.
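Since the eye closing rate formulas themselves are only given as images, the sketch below shows one plausible realization of the drowsiness decision: per-frame eye openness from the key point distances named above, a per-frame open/closed decision, and the window-level eye closing rate compared with the two thresholds. The eye-corner indices 60/64 and 68/72 and the per-frame openness threshold are assumptions, not taken from the application:

```python
import numpy as np

FIRST_THRESHOLD = 0.8     # above this eye closing rate: deep sleep state
SECOND_THRESHOLD = 0.126  # below this eye closing rate: waking state


def eye_openness(landmarks):
    """Average normalized eye opening for one frame, from (98, 2) landmarks."""
    right = np.linalg.norm(landmarks[62] - landmarks[66]) / (
        np.linalg.norm(landmarks[60] - landmarks[64]) + 1e-6)
    left = np.linalg.norm(landmarks[70] - landmarks[74]) / (
        np.linalg.norm(landmarks[68] - landmarks[72]) + 1e-6)
    return (left + right) / 2.0


def drowsiness_state(landmark_sequence, open_threshold=0.15):
    """Classify deep sleep / drowsy / awake from the frames in one time window."""
    closed = [eye_openness(lm) < open_threshold for lm in landmark_sequence]
    closing_rate = sum(closed) / max(len(closed), 1)
    if closing_rate > FIRST_THRESHOLD:
        return "deep sleep"
    if closing_rate > SECOND_THRESHOLD:
        return "drowsy"
    return "awake"
```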
This embodiment thus provides a method for identifying the emotional state and the drowsy mental state, so that whether the target object is likely to perform abnormal behavior can be effectively and accurately predicted from its emotion and mental state, further improving the accuracy of the video processing method.
Step S105, recognizing the limb posture and the limb motion state of the corresponding target object based on the human body key points;
in a specific embodiment, the current limb posture and the limb motion state of the target object can be accurately identified according to the position coordinates of the key points of the human body and the pre-trained neural network model.
According to a specific implementation manner of the embodiment of the application, the step of identifying the limb posture and the limb movement state of the corresponding target object based on the human body key points includes:
recognizing the limb posture of the corresponding target object according to the position coordinates of the key points of the human body;
and calculating the motion amplitude and the motion speed of each limb part of the corresponding target object according to the change condition of the position coordinates of the key points of the human body in the continuous multi-frame images so as to obtain the limb motion state of the target object.
In a specific embodiment, the moving range of the target object in the video shooting area can be defined in advance, and whether the user exceeds the preset moving range can be effectively identified by detecting the position of the human body key point in the image, so that whether the user has abnormal behaviors or not is judged.
In addition, according to the relation between the position coordinates of the key points of the human body, the body posture of the target object can be identified and judged, wherein the body posture comprises standing, squatting, lying down and the like.
The motion amplitude and the motion speed of each human body key point can be calculated from the motion trail of that key point of the same target object across consecutive multi-frame images, and the motion state of each limb part of the target object can then be obtained from the motion amplitude and motion speed of the human body key points of the corresponding limb part.
If the motion state of any limb part of the target object is abnormal, that is, the motion amplitude exceeds a preset amplitude threshold or the motion speed exceeds a preset speed threshold, it can be output that the target object is performing abnormal behavior, and the message prompting step is then carried out.
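A sketch of the motion-state computation for one limb key point tracked over consecutive frames; how exactly the embodiment defines motion amplitude is not specified, so the trajectory's bounding-box diagonal is used here as an assumed stand-in:

```python
import numpy as np


def limb_motion_state(keypoint_track, fps, amplitude_threshold, speed_threshold):
    """keypoint_track: (T, 2) image coordinates of one key point in T consecutive frames.
    Returns (amplitude, speed, abnormal) for that limb part."""
    track = np.asarray(keypoint_track, dtype=np.float32)
    step_lengths = np.linalg.norm(np.diff(track, axis=0), axis=1)
    amplitude = float(np.linalg.norm(track.max(axis=0) - track.min(axis=0)))
    speed = float(step_lengths.mean() * fps) if len(step_lengths) else 0.0  # avg pixels per second
    abnormal = amplitude > amplitude_threshold or speed > speed_threshold
    return amplitude, speed, abnormal
```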
And S106, judging whether the behavior set of the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state, and sending prompt information to a preset terminal when the behavior set of the target object comprises the preset behavior.
In a specific embodiment, the preset behavior may be set adaptively according to an actual application scenario.
As shown in fig. 5, the recognition results of the target object face region and the human body region can be obtained.
For example, in a hospital, if the emotional state of the user is angry, the drowsiness state is awake, the limb posture is upright, and the limb motion state exceeds the preset limb amplitude threshold and limb speed threshold, it can be recognized that the target object may be causing a medical disturbance; at this time, prompt information is sent in time to a mobile terminal used by security guards so that the security guards can handle the disturbance promptly.
If the emotional state of the user is sad, the drowsiness state is drowsy, the limb posture is squatting, and the limb motion state exceeds the preset limb amplitude threshold, it can be recognized that the target object may be suffering a sudden illness; at this time, prompt information is sent in time to a mobile terminal used by medical staff so that they can check the state of the target object promptly.
In a prison, if the emotional state of the user is angry, the drowsiness state is awake, the limb posture is upright, the limb motion state exceeds the preset limb amplitude threshold and limb speed threshold, and the target object moves beyond the preset activity area, it can be recognized that the target object may be attempting to escape; at this time, prompt information is sent in time to a mobile terminal used by the prison manager so that the prison manager can check the state of the target object promptly.
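The scene-specific examples above can be read as a small rule table; a sketch of such a rule set and the prompt step is shown below. The rule contents merely restate the hospital and prison examples, and send_prompt is a placeholder for pushing prompt information to the preset terminal:

```python
def detect_preset_behaviors(emotion, drowsiness, posture, motion_abnormal,
                            outside_area, scene):
    """Return the preset behaviors matched by the recognized states."""
    behaviors = []
    if scene == "hospital":
        if emotion == "angry" and drowsiness == "awake" and posture == "upright" and motion_abnormal:
            behaviors.append("medical disturbance")
        if emotion == "sad" and drowsiness == "drowsy" and posture == "squatting" and motion_abnormal:
            behaviors.append("possible sudden illness")
    elif scene == "prison":
        if (emotion == "angry" and drowsiness == "awake" and posture == "upright"
                and motion_abnormal and outside_area):
            behaviors.append("possible escape attempt")
    return behaviors


def notify_if_needed(behaviors, send_prompt):
    """Forward each matched behavior to the preset terminal."""
    for behavior in behaviors:
        send_prompt(behavior)
```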
The step of judging whether the behavior set of the target object comprises preset behaviors according to the emotional state, the drowsiness state, the limb posture and the limb movement state further comprises:
judging whether the behavior set of the target object comprises preset behaviors according to the emotional state, the drowsiness state, the head posture, the limb posture and the limb movement state.
Furthermore, the preset behavior can also be defined with reference to the head posture, which can effectively improve the identification accuracy of the preset behavior.
In summary, in the video processing method provided by this embodiment, deep learning models trained on several large-scale public databases are used to detect faces and human bodies in videos; 98 face key point positions are then estimated in the face area, 20 facial motion areas are extracted through face key point identification to realize expression recognition, and at the same time the drowsiness state of the target is judged from the frequency and duration of eye opening and closing; finally, for the human body area, 17 human body key points are estimated by a human body key point detection network trained on large-scale data to obtain the body state of the human body, and behaviors such as leaving the controllable field-of-view area or making excessively large limb movements are detected and reported. Through multiple artificial intelligence models trained on big data, the video processing method provided by the invention achieves highly noise-immune video multi-target detection, expression recognition, and body state and motion state recognition applicable to multiple scenes.
Referring to fig. 6, a schematic diagram of device modules of a video processing device 600 according to an embodiment of the present application is shown, where, as shown in fig. 6, the video processing device 600 according to the embodiment of the present application includes:
an obtaining module 601, configured to obtain a video file;
a preprocessing module 602, configured to preprocess the video file according to a preset image normalization rule to obtain a target image sequence;
a key point detection module 603, configured to perform face detection and human body detection on the target image sequence, respectively, to obtain a face key point and a human body key point of the target object;
a face recognition module 604 for recognizing an emotional state and a drowsy state of the corresponding target object based on the face key points;
a human body recognition module 605, configured to recognize a limb posture and a limb movement state of the corresponding target object based on the human body key points;
a behavior recognition module 606, configured to determine whether the target object includes a preset behavior according to the emotional state, the drowsiness state, the limb posture, and the limb movement state, and send a prompt message to a preset terminal when the target object includes the preset behavior.
In addition, an embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the computer program, when running on the processor, executes the video processing method in the foregoing embodiment.
An embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a processor, the computer program performs the video processing method in the above embodiment.
In addition, for the specific implementation processes of the video processing apparatus, the computer device, and the computer-readable storage medium mentioned in the foregoing embodiments, reference may be made to the specific implementation processes of the foregoing method embodiments, and details are not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part thereof which contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A video processing method, comprising:
acquiring a video file;
preprocessing the video file according to a preset image normalization rule to obtain a target image sequence;
respectively carrying out face detection and human body detection on the target image sequence to obtain face key points and human body key points of a target object;
identifying an emotional state and a drowsiness state of a corresponding target object based on the face key points;
identifying limb postures and limb motion states of corresponding target objects based on the human body key points;
judging whether the behavior set of the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state, and sending prompt information to a preset terminal when the behavior set of the target object comprises the preset behavior.
2. The video processing method according to claim 1, wherein the performing face detection and body detection on the target image sequence to obtain face key points and body key points of the target object respectively comprises:
detecting and intercepting a face area of each frame of image in the target image sequence through a haar feature recognition model; identifying face key points in the face region based on a face key point detection network to obtain face key points of the target object;
detecting and intercepting a human body region of each frame of image in the target image sequence through a Faster RCNN model; and identifying the human body key points of the human body area through a high-resolution network to obtain the human body key points of the target object.
3. The video processing method of claim 2, wherein the method further comprises:
recognizing the face area based on a preset fine-grained head pose estimation network to obtain the head pose of the target object;
the determining whether the behavior set of the target object includes a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state further includes:
judging whether the behavior set of the target object comprises preset behaviors according to the emotional state, the drowsiness state, the head posture, the limb posture and the limb movement state.
4. The video processing method according to claim 1, wherein said identifying an emotional state of a corresponding target object based on the face keypoints comprises:
dividing a first number of face key points into a second number of face key areas, wherein the first number is larger than the second number;
and identifying the key human face area based on a preset annular neural network model, and obtaining the emotional state corresponding to the target object.
5. The video processing method according to claim 1, wherein the identifying a drowsiness state of a corresponding target object based on the face keypoints comprises:
calculating the eye closing rate of the corresponding target object based on the human face key points of the eye area and a preset eye closing rate calculation formula;
if the eye closing rate is larger than a first threshold value, identifying that the target object is in a deep sleep state;
if the eye closing rate is smaller than the first threshold and larger than a second threshold, identifying the target object as having a drowsiness state, wherein the first threshold is larger than the second threshold;
and if the eye closing rate is smaller than the second threshold value, identifying that the target object is in a waking state.
6. The video processing method according to claim 1, wherein the preprocessing the video file according to a preset image normalization rule to obtain a target image sequence comprises:
reading the video file according to the time sequence to obtain an initial image sequence with a plurality of frames;
and carrying out RGB image normalization processing on each frame of image to obtain the target image sequence.
7. The video processing method according to claim 6, wherein the identifying of the limb posture and the limb movement state of the corresponding target object based on the human body key points comprises:
recognizing the limb posture of the corresponding target object according to the position coordinates of the key points of the human body;
and calculating the motion amplitude and the motion speed of each limb part of the corresponding target object according to the change condition of the position coordinates of the key points of the human body in the continuous multi-frame images so as to obtain the limb motion state of the target object.
8. A video processing apparatus, comprising:
the acquisition module is used for acquiring a video file;
the preprocessing module is used for preprocessing the video file according to a preset image normalization rule to obtain a target image sequence;
the key point detection module is used for respectively carrying out face detection and human body detection on the target image sequence so as to obtain face key points and human body key points of a target object;
the face recognition module is used for recognizing the emotion state and the drowsiness state of the corresponding target object based on the face key points;
the human body recognition module is used for recognizing the limb postures and the limb motion states of the corresponding target objects based on the human body key points;
and the behavior recognition module is used for judging whether the target object comprises a preset behavior according to the emotional state, the drowsiness state, the limb posture and the limb movement state and sending prompt information to a preset terminal when the target object comprises the preset behavior.
9. A computer device, characterized in that it comprises a processor and a memory, said memory storing a computer program which, when run on said processor, performs the video processing method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when run on a processor, performs the video processing method of any of claims 1 to 7.
CN202211078027.3A 2022-09-05 2022-09-05 Video processing method and device, computer equipment and readable storage medium Pending CN115482485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211078027.3A CN115482485A (en) 2022-09-05 2022-09-05 Video processing method and device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211078027.3A CN115482485A (en) 2022-09-05 2022-09-05 Video processing method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115482485A true CN115482485A (en) 2022-12-16

Family

ID=84392673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211078027.3A Pending CN115482485A (en) 2022-09-05 2022-09-05 Video processing method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115482485A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115966016A (en) * 2022-12-19 2023-04-14 天翼爱音乐文化科技有限公司 Jumping state identification method and system, electronic equipment and storage medium
CN117481615A (en) * 2023-12-27 2024-02-02 网思科技股份有限公司 Drunk recognition method and device, storage medium and computer equipment


Similar Documents

Publication Publication Date Title
CN110458101B (en) Criminal personnel sign monitoring method and equipment based on combination of video and equipment
Adhikari et al. Activity recognition for indoor fall detection using convolutional neural network
CN106557726B (en) Face identity authentication system with silent type living body detection and method thereof
Zhou et al. Activity analysis, summarization, and visualization for indoor human activity monitoring
Yu et al. A posture recognition-based fall detection system for monitoring an elderly person in a smart home environment
Tsai et al. Implementation of fall detection system based on 3D skeleton for deep learning technique
CN115482485A (en) Video processing method and device, computer equipment and readable storage medium
Ji et al. Fatigue state detection based on multi-index fusion and state recognition network
El Kaliouby et al. Mind reading machines: Automated inference of cognitive mental states from video
CN106682578B (en) Weak light face recognition method based on blink detection
CN108960076B (en) Ear recognition and tracking method based on convolutional neural network
CN114842397B (en) Real-time old man falling detection method based on anomaly detection
Chen et al. PulseEdit: Editing physiological signals in facial videos for privacy protection
Lin et al. Near-realtime face mask wearing recognition based on deep learning
Alaoui et al. Fall detection for elderly people using the variation of key points of human skeleton
CN113869276B (en) Lie recognition method and system based on micro-expression
CN113378649A (en) Identity, position and action recognition method, system, electronic equipment and storage medium
Khraief et al. Convolutional neural network based on dynamic motion and shape variations for elderly fall detection
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
Othman et al. A Low-Cost Embedded Security System for UAV-Based Face Mask Detector Using IoT and Deep Learning to Reduce COVID-19
Vignolo et al. The complexity of biological motion
Fook et al. Automated recognition of complex agitation behavior of dementia patients using video camera
CN113627396B (en) Rope skipping counting method based on health monitoring
CN113723165B (en) Method and system for detecting dangerous expressions of people to be detected based on deep learning
CN110148234A (en) Campus brush face picks exchange method, storage medium and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination