CN112347849A - Video conference processing method, electronic device and storage medium - Google Patents

Video conference processing method, electronic device and storage medium

Info

Publication number
CN112347849A
CN112347849A (application number CN202011051217.7A)
Authority
CN
China
Prior art keywords
face
picture
symbolic
area
range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011051217.7A
Other languages
Chinese (zh)
Other versions
CN112347849B (en)
Inventor
李康敬
陈望都
徐思捷
朱敏
丁凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd, and MIGU Video Technology Co Ltd
Priority to CN202011051217.7A
Publication of CN112347849A
Application granted
Publication of CN112347849B
Current legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/162Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the invention provides a video conference processing method, an electronic device, and a storage medium. The method includes: performing face recognition on a first picture in a video stream captured at the current video conference site, and determining a landmark face of the first picture according to the face recognition result; determining the range of a cropping region according to the region in which the landmark face of the first picture is located; and, according to the range of the cropping region, cropping the first picture and the pictures that follow it in the video stream in time order, then encoding and transmitting the cropped pictures. Because a cropped picture occupies only part of the original picture, encoding resource consumption and transmission bandwidth are reduced. Because a cropped picture contains only the area near the face, private information is not exposed.

Description

Video conference processing method, electronic device and storage medium
Technical Field
The present invention relates to the field of video technologies, and in particular, to a video conference processing method, an electronic device, and a storage medium.
Background
Video conferencing has become common practice in everyday work as a means of communicating over long distances.
In the prior art, a video conference client calls the local camera to capture a video signal, then encodes the captured video signal and uploads it to a cloud server. No additional processing is performed on the locally captured pictures in this process.
This capture process has certain drawbacks. For example: once a user turns on the local camera in a video conference, everything the camera shoots is shared in the conference; unimportant information in the shot is also encoded and transmitted, causing extra encoding and bandwidth consumption; and private information may be exposed.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a video conference processing method, an electronic device, and a storage medium.
An embodiment of a first aspect of the present invention provides a video conference processing method, including:
performing face recognition on a first picture in a video stream captured at the current video conference site, and determining a landmark face of the first picture according to the face recognition result, wherein the recognition degree of the landmark face is higher than a preset threshold;
determining the range of a cropping region according to the region in which the landmark face of the first picture is located;
and, according to the range of the cropping region, cropping the first picture and the pictures that follow it in the video stream in time order, and encoding and transmitting the cropped pictures.
In the above technical solution, when the first picture contains at least two faces, determining the landmark face of the first picture according to the face recognition result includes:
determining, from the face recognition result, the number of facial feature points contained in the first picture and the area of the region in which each face is located, and calculating a weight score for every face contained in the first picture from the number of facial feature points and that area;
and selecting one of the at least two faces as the landmark face of the first picture according to the weight scores.
In the above technical solution, calculating a weight score for every face contained in the first picture from the number of facial feature points and the area of the region in which the face is located includes:
counting the facial feature points that can be detected for a first face to obtain a first parameter value, the first face being any face contained in the first picture;
calculating the ratio of the area of the region in which the first face is located to the total picture area of the first picture to obtain a second parameter value;
calculating the offset between the center of the region in which the first face is located and the center of the first picture to obtain a third parameter value;
and calculating a weight score for the first face from the first parameter value, the second parameter value, the third parameter value, and the preset user tolerances for the face leaving the picture, for the face leaving the camera, and for the face deviating from the camera.
In the above technical solution, calculating a weight score for the first face from the first, second, and third parameter values and the preset user tolerances includes:
calculating the weight score for the first face according to the following formula:
Fa(c,a,d)=c/n×x1+a×x2+d×x3;
where c is the first parameter value, a is the second parameter value, d is the third parameter value, n is the total number of facial feature points, x1 is the user's tolerance for the face leaving the picture, x2 is the user's tolerance for the face leaving the camera, and x3 is the user's tolerance for the face deviating from the camera.
In the above technical solution, when the first picture contains only one face, determining the landmark face of the first picture according to the face recognition result includes:
taking the unique face contained in the first picture as the landmark face of the first picture.
In the above technical solution, determining the range of the cropping region according to the region in which the landmark face is located includes:
determining the size of the cropping region according to the size of the region in which the landmark face is located;
and determining the position of the cropping region according to the position of that region in the first picture and the size of the cropping region, so that the cropping region completely contains the region in which the landmark face is located and that region sits at a preset position within the cropping region.
In the above technical solution, encoding and transmitting the cropped pictures includes:
mapping the length of a cropped picture to a preset first pixel value and its width to a preset second pixel value, obtaining a mapped cropped picture whose length is the first pixel value and whose width is the second pixel value;
and encoding and transmitting the mapped cropped picture.
In the above technical solution, when the first picture contains only one face, determining the landmark face of the first picture according to the face recognition result further includes:
determining, from the face recognition result, the number of feature points of the landmark face of the first picture and the area of the region in which it is located, and calculating a weight score for the landmark face of the first picture from these two quantities;
correspondingly, after the step of encoding and transmitting the cropped pictures, the method further includes:
at preset intervals, performing face recognition on a second picture in the video stream captured at the current video conference site, and calculating weight scores for the faces contained in the second picture according to the face recognition result;
determining a landmark face of the second picture according to those weight scores;
when the landmark face of the second picture differs from that of the first picture, determining the range of a cropping region according to the landmark face of the second picture, cropping the second picture and the pictures that follow it in the video stream according to that range, and encoding and transmitting the cropped pictures;
when the second picture has no landmark face and neither do the consecutive pictures that follow it in the video stream, stopping the capture and upload of both the audio stream and the video stream of the video conference site;
and when the landmark face of the second picture is the same as that of the first picture but its weight score is smaller, with the difference between the two scores reaching a preset threshold, stopping the capture and upload of the audio stream of the video conference site.
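The periodic re-check logic above can be sketched as a small decision function. This is a minimal illustration, not the patent's implementation: the function name, the string results, and the assumption that the caller has already verified the no-landmark-face condition over the consecutive following pictures are all ours.

```python
def recheck_action(prev_face, prev_score, new_face, new_score, drop_threshold):
    """Decide what to do after re-recognizing a later (second) picture.

    prev_face / new_face identify the landmark face of the first and second
    pictures (new_face is None when the second picture, and the consecutive
    pictures after it, have no landmark face); the scores are their weight
    scores. Returns one of four illustrative action labels.
    """
    if new_face is None:
        return "stop_av"      # stop capturing/uploading audio and video
    if new_face != prev_face:
        return "recrop"       # landmark face changed: redetermine the crop
    if prev_score - new_score >= drop_threshold:
        return "stop_audio"   # same face, score dropped enough: audio off
    return "keep"             # nothing changed materially
```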
An embodiment of a second aspect of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the video conference processing method according to the embodiment of the first aspect.
An embodiment of a third aspect of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the video conference processing method according to the embodiment of the first aspect.
In the video conference processing method, electronic device, and storage medium provided by the embodiments of the present invention, face recognition is performed on the pictures in the video stream, a cropping region is determined from the recognition result, and the pictures are cropped accordingly. Because a cropped picture occupies only part of the original picture, encoding resource consumption and transmission bandwidth are reduced; and because it contains only the area near the face, private information is not exposed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a video conference processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an implementation manner of determining a range of a clipping region according to a region in which a landmark face is located in the embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating another implementation manner of determining a range of a clipping region according to a region in which a landmark face is located in the embodiment of the present invention;
fig. 4 is a schematic diagram of a video conference processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a video conference processing method provided by an embodiment of the present invention. As shown in Fig. 1, the method is applied to a video conference client and includes:
Step 101: perform face recognition on a first picture in a video stream captured at the current video conference site, and determine a landmark face of the first picture according to the face recognition result.
As those skilled in the art will readily understand, when the video conference client is started it turns on the local camera and captures picture information of the video conference site, forming a video stream.
A video stream consists of individual pictures. In this step, face recognition may be performed on the pictures that make up the video stream in time order. In the embodiment of the present invention, the picture at the moment face recognition starts in the video stream is recorded as the first picture.
Face recognition of the first picture is common knowledge to a person skilled in the art and is therefore not repeated here.
After face recognition is performed on the first picture, a face recognition result is obtained. It includes information such as the number, positions, and areas of the faces.
After a face is recognized in the video stream for the first time, a landmark face is determined in the first picture in which a face is recognized. A landmark face, also called a face priority feature set identifier (FLF), is a face whose recognition degree in a picture is higher than a preset threshold. A picture generally has only one landmark face, but in special cases, such as several faces gathered in a small area, there may be more than one. The embodiment of the present invention describes how to determine the landmark face taking the case of at most one landmark face per picture as an example.
The specific steps for determining the landmark face are as follows.
Step S1: determine the number of faces in the first picture according to the face recognition result. When the number of faces is greater than 1, perform step S2; when it is 1, perform step S3.
Step S2: calculate a weight score for every face contained in the first picture according to the number of facial feature points of each face and the area of the region in which it is located, both obtained from the face recognition result; then select one face from all the faces contained in the first picture as the landmark face of the first picture according to the weight scores.
In the embodiment of the present invention, the region in which a face is located is a concept distinct from the face region. The face region is the area of the person's face itself; its shape and size depend on the natural face. The region in which the face is located is a region that completely covers the face region. In the embodiment of the present invention it is rectangular; in other embodiments it may take other shapes, such as a circle or a polygon. The two obviously differ in shape and size, and the area of the region in which the face is located is larger than that of the face region.
Facial feature points are the parts of the face region that distinguish one face from another. Theoretically, a face region contains 98 facial feature points, distributed over nine parts of the face: the jaw line, left eyebrow, right eyebrow, left eye, right eye, inner nose line, outer nose line, inner mouth, and outer mouth. Their specific distribution is well known to those skilled in the art and is not further described here.
In actual detection, not all 98 facial feature points are necessarily detected, owing to external factors such as occlusion and illumination. The embodiment of the present invention therefore takes, from the face recognition result, the number of facial feature points actually detected in each face region.
Calculating a weight score for every face contained in the first picture from the number of facial feature points and the region in which each face is located specifically includes:
counting the facial feature points that can be detected for a first face to obtain a first parameter value, the first face being any face contained in the first picture;
calculating the ratio of the area of the region in which the first face is located to the total picture area of the first picture to obtain a second parameter value;
calculating the offset between the center of the region in which the first face is located and the center of the first picture to obtain a third parameter value;
and calculating a weight score for the first face from the first, second, and third parameter values together with the preset user tolerances for the face leaving the picture, the face leaving the camera, and the face deviating from the camera.
From the above steps, the formula for scoring a face is:
FW(facial_weight)=Fa(chara_regions, acreage_weight, distance_weight).
Here chara_regions is the number of detected facial feature points. As mentioned above, there are theoretically 98 facial feature points, but not all of them are necessarily detected during face recognition. In the embodiment of the present invention, chara_regions is incremented by 1 for every feature point detected on the face.
acreage_weight is the ratio of the area of the region in which the face is located to the total picture area of the video image frame. Assuming the region is rectangular with length x and width y (in pixels), acreage_weight = x × y / the total number of pixels in the video image frame. The total number of pixels is ox × oy, where ox is the length of the video image frame and oy its width. For example, a video image frame in 1080P display format has 1920 × 1080 pixels.
distance_weight is the degree to which the center of the region in which the face is located is offset from the center of the video image frame. Assuming the region is rectangular, its top-left corner is at (tx, ty), and the center of the video image frame is at (0, 0), the offset is computed from the region's length and width (x, y), the distance data (tx, ty), and the frame's length and width (ox, oy): distance_weight = ((tx + x/2)/ox) × ((ty + y/2)/oy). The larger the distance_weight, the greater the offset.
Abbreviating chara_regions, acreage_weight, and distance_weight as c, a, and d respectively gives:
Fa(c,a,d)=c/n×x1+a×x2+d×x3.
Here x1, x2, and x3 are user settings representing, respectively, the user's tolerance for the face leaving the picture, for the face leaving the camera, and for the face deviating from the camera. They default to 1 and take values in [0, 2), the left-closed right-open interval from 0 to 2. n is the total number of facial feature points, which may be 98.
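As a concrete illustration, the scoring formula above can be written out directly. This is a sketch under the description's assumptions (rectangular face region, frame center at the origin, n = 98); the function and parameter names are ours, and the face-recognition step that supplies c and the box coordinates is taken as given.

```python
N_FEATURE_POINTS = 98  # theoretical total number of facial feature points (n)

def face_weight(c, tx, ty, x, y, ox, oy, x1=1.0, x2=1.0, x3=1.0):
    """Fa(c, a, d) = c/n*x1 + a*x2 + d*x3 for one detected face.

    c      -- chara_regions: feature points actually detected
    tx, ty -- top-left corner of the region in which the face is located,
              with the frame center as the origin (as in the text)
    x, y   -- length and width of that region, in pixels
    ox, oy -- length and width of the video image frame, in pixels
    x1..x3 -- user tolerances, default 1, range [0, 2)
    """
    a = (x * y) / (ox * oy)                        # acreage_weight
    d = ((tx + x / 2) / ox) * ((ty + y / 2) / oy)  # distance_weight
    return c / N_FEATURE_POINTS * x1 + a * x2 + d * x3

def pick_landmark_face(faces):
    """Given (face_id, weight_score) pairs, pick the highest-scoring face."""
    return max(faces, key=lambda f: f[1])[0]
```

With all tolerances at their default of 1, a fully detected face (c = 98) whose region is centered in the frame has d = 0 and scores 1 plus its area ratio.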
Once the weight scores are obtained, the landmark face of the first picture can be determined from them. For example, if 5 faces are detected in the first picture, the face with the highest weight score is taken as the landmark face of the first picture.
Step S3: take the unique face in the first picture as the landmark face of the first picture.
Since the first picture contains only one face, that unique face is marked as the landmark face.
Step 102: determine the range of the cropping region according to the region in which the landmark face of the first picture is located.
The landmark face in the first picture was determined in the previous step. In this step, the range of the cropping region is determined according to the region in which it is located.
As noted above, both the face region and the region in which the face is located can be obtained from the face recognition result, and the two differ in shape and size. For convenience of processing, the embodiment of the present invention uses the region in which the face is located; other embodiments may use the face region instead, according to actual needs.
The range of the cropping region comprises its size and its position.
The size of the cropping region can be determined from the size of the region in which the landmark face is located. In the embodiment of the present invention, the area of the cropping region is set to 3 times that of the region in which the landmark face is located: the ratio of their heights is 2 and the ratio of their widths is 1.5. In other embodiments, the length and/or width ratios may be adjusted according to actual needs.
After determining the size of the cutting area, the position of the cutting area can be further determined. For example, assuming that the region in which the landmark face is located is a rectangular region, based on the position of the region in which the landmark face is located in the video image frame, the top edge of the region in which the landmark face is located is taken as a starting point, and the region extends downward by 2 times the height of the region in which the landmark face is located, so as to determine the height range of the clipping region. The leftmost side of the region in the range of the symbolic face is taken as a starting point, and the region extends leftwards according to 0.25 times of the width of the region in the range of the symbolic face; and taking the rightmost side of the region in the range of the symbolic face as a starting point, and extending rightwards according to 0.25 times of the width of the region in the range of the symbolic face to determine the width range of the cutting region. The cutting area determined in the above manner is shown in fig. 2.
In some cases, the position of the cropping area cannot be determined in the aforementioned manner due to the position relationship of the area in the video image frame where the landmark human face is located. For example, it is still assumed that the region in the range where the landmark face is located is a rectangular region, and the height below the region in the range where the landmark face is located is not enough to extend 1 time below the region in the range where the landmark face is located according to the height of the region in the range where the landmark face is located, at this time, the height extending downward may be shortened, and the remaining height may extend above the region in the range where the landmark face is located. For example, the top edge of the region in the range of the symbolic face is used as the starting point, the region extends downwards according to 1.5 times of the height of the region in the range of the symbolic face, and the top edge of the region in the range of the symbolic face is used as the starting point, and the region extends upwards according to 0.5 times of the height of the region in the range of the symbolic face. The cutting area determined in the above manner is shown in fig. 3.
For another example, the width of the left side of the region in which the landmark face is located is insufficient, and the left side cannot be extended to the left by 0.25 times the width of the region in which the landmark face is located, starting from the leftmost side of the region in which the landmark face is located. In this case, the width extending to the left can be reduced, and the remaining width can be extended to the right side of the region in which the landmark face is located. For example, the leftmost side of the region in the range of the symbolic face is used as the starting point, the left side extends according to 0.1 time of the width of the region in the range of the symbolic face, and the rightmost side of the region in the range of the symbolic face is used as the starting point, and the right side extends according to 0.35 time of the width of the region in the range of the symbolic face. Similar processing is also performed when the width of the right side of the landmark face is insufficient.
It should be noted that, when a picture contains a plurality of landmark faces, the plurality of landmark faces may be treated as a whole when determining the cropping area. The specific implementation process does not differ substantially from determining the cropping area for a single landmark face, and is therefore not repeated here.
And 103, cutting the first picture and pictures behind the first picture in the video stream according to the range of the cutting area, and coding and transmitting the cut pictures.
After the range of the cutting area is determined, the first picture and the pictures in the video stream whose time sequence is after the first picture are cropped according to the range of the cutting area. How to crop these pictures, and how to encode and transmit the cropped pictures, is common knowledge to those skilled in the art and is therefore not described here.
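Although the patent treats the cropping itself as common knowledge, the per-frame operation reduces to a sub-array selection. A minimal sketch, with the frame represented as a list of pixel rows and the function name an assumption for illustration:

```python
def apply_crop(frame, region):
    """Crop one frame, given as a list of pixel rows, to the
    rectangle (x, y, w, h) determined in step 102. The same region
    is reused for every subsequent frame until the next re-check."""
    x, y, w, h = region
    return [row[x:x + w] for row in frame[y:y + h]]
```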
The video conference processing method provided by the embodiment of the invention performs face recognition on a picture in the video stream, determines the cropping area according to the face recognition result, and then crops the picture. Because the cropped picture occupies only part of the original picture, the consumption of encoding resources and of bandwidth during transmission is reduced. Moreover, because the cropped picture contains only the area near the face, privacy information is not exposed.
Based on any of the above embodiments, in an embodiment of the present invention, the encoding and transmitting the cropped picture includes:
mapping the length of the cut picture to a preset first pixel value, mapping the width of the cut picture to a preset second pixel value, and obtaining the mapped cut picture with the length of the first pixel value and the width of the second pixel value;
and coding and transmitting the mapped cut picture.
Since the range of the cropping area is determined from the range of the landmark-face region, and the size of that region varies with the actual situation, in the previous embodiment of the present invention the range of the cropping area is not fixed, so the size of the cropped picture is not a fixed value either. This is detrimental to the efficiency of the encoding operation. Therefore, in this embodiment of the present invention, a normalization operation is performed on the size of the cropped picture: the length of the cropped picture is mapped to a preset first pixel value, and the width of the cropped picture is mapped to a preset second pixel value. For example, the length of the normalized picture may be set to 160 pixels and the width to 90 pixels.
After the normalization operation is performed on the cropped picture size, the size of the picture to be encoded and transmitted is significantly reduced, for example, the picture size is transformed from 1920 × 1080 to 160 × 90.
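The normalization step is an ordinary image resize; in practice a library resize (for example OpenCV's) would be used. A dependency-free nearest-neighbour sketch, with the frame again represented as a list of pixel rows (an assumption for illustration):

```python
def normalize_crop(pixels, target_w=160, target_h=90):
    """Map a cropped frame (list of rows) to a fixed target size by
    nearest-neighbour sampling, so every crop is encoded at the same
    resolution regardless of the landmark face's actual size."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // target_h][x * src_w // target_w]
         for x in range(target_w)]
        for y in range(target_h)
    ]
```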
The video conference processing method provided by the embodiment of the invention obviously reduces the size of the coded and transmitted pictures through normalization operation, and is beneficial to reducing the consumption of coding resources and the consumption of bandwidth during transmission.
Based on any of the foregoing embodiments, in an embodiment of the present invention, when the first picture only includes one face, the determining, according to the result of the face recognition, a landmark face of the first picture further includes:
determining the number of characteristic points of the symbolic face of the first picture and the area of the range of the face according to the face recognition result, and calculating a weight score for the symbolic face of the first picture according to the number of characteristic points of the face and the area of the range of the face;
correspondingly, after step 103, the method further comprises:
every other preset time period, carrying out face recognition on a second picture in a video stream collected at the current place of the video conference, and calculating a weight score for a face contained in the second picture according to the face recognition result;
determining a symbolic face of a second picture according to the weight score of the face contained in the second picture;
when the symbolic face of the second picture is different from the symbolic face of the first picture, determining the range of a cutting area according to the symbolic face of the second picture, cutting the second picture and pictures behind the second picture in the video stream according to the range of the cutting area, and coding and transmitting the cut pictures;
when the symbolic face of the second picture does not exist and the continuous pictures in the video stream, the time sequence of which is behind the second picture, do not have the symbolic face, stopping the acquisition and uploading of the audio stream and the video stream of the video conference site;
and when the symbolic face of the second picture is the same as the symbolic face of the first picture, the weight score of the symbolic face of the second picture is smaller than that of the symbolic face of the first picture, and the difference value between the two weight scores reaches a preset threshold value, stopping the acquisition and uploading of the audio stream of the video conference site.
In the embodiment of the present invention, the preset time period may be set according to the requirement of the user, for example, the length of the time period is set to 2 minutes or other time length values.
When the first picture contains only one face, that unique face is taken as the landmark face of the first picture. The number of face feature points of the landmark face and the area of the region containing the face are then obtained from the face recognition result, and a weight score is calculated for the landmark face of the first picture from these two quantities. The calculated weight score may be used in the subsequent case discrimination.
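Combining this with the formula of claim 4, the weight-score calculation can be sketched as follows. The default n = 68 (the common dlib landmark count) and the tolerance values x1 to x3 are illustrative assumptions; the patent leaves these to user configuration.

```python
def weight_score(c, a, d, n=68, x1=0.5, x2=0.3, x3=0.2):
    """Fa(c, a, d) = c/n * x1 + a * x2 + d * x3  (claim 4).

    c: number of detected face feature points
    a: area of the face region as a fraction of the whole frame
    d: offset term between the face-region centre and the frame centre
    n, x1, x2, x3: total landmark count and user tolerances (assumed values)
    """
    return c / n * x1 + a * x2 + d * x3
```

A fully detected face (c = n) occupying a larger fraction of the frame scores higher; how the offset term d is normalized is not specified in the patent.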
In the embodiment of the present invention, the picture corresponding to the moment at which face recognition is performed again on the video stream is recorded as the second picture. The specific implementation steps of performing face recognition on the second picture, calculating a weight score for each face in the second picture according to the recognition result, and determining the landmark face of the second picture according to the weight scores do not differ substantially from the corresponding operations on the first picture described in the previous embodiment, and are therefore not repeated.
After the symbolic face of the second picture is obtained, it is compared with the symbolic face of the first picture, and a plurality of different situations may occur, which are respectively described as follows:
in case 1, the landmark face of the second picture is different from the landmark face of the first picture.
This situation may occur for various reasons: for example, a new participant joins the video conference site, the positions of the original participants are adjusted, or the participant corresponding to the landmark face of the first picture leaves the video conference site.
For such cases, the range of the cropping area may be determined according to the symbolic face of the second picture, the second picture and the picture in the video stream that is subsequent to the second picture in time sequence are cropped according to the range of the cropping area, and the cropped picture is encoded and transmitted.
Case 2, the landmark face of the second picture does not exist.
Such a situation may occur because the participants are away from the camera, such as the participants are away from the video conference site.
For such situations, face recognition and landmark-face judgment can be performed on a plurality of consecutive pictures in the video stream, for example the pictures corresponding to 30 seconds of video. If no landmark face is found in any of them, the acquisition and uploading of the audio stream and the video stream of the video conference site are stopped.
If the user wishes to continue the videoconference, re-access is required.
It should be noted that, while face recognition and landmark-face judgment are performed on the plurality of consecutive pictures whose time sequence is after the second picture, pictures of the video conference site are no longer collected and uploaded, and the server retains the last available face picture.
And in case 3, the symbolic face of the second picture is the same as the symbolic face of the first picture, and the weight score of the symbolic face of the second picture is smaller than the weight score of the symbolic face of the first picture.
Such situations may arise because the participant has moved away from the camera (the face area becomes smaller), or has stood up or turned away (the number of detected face feature points is reduced), and so on. In this case a landmark face can still be found in the second picture, but its weight score is lower than the weight score of the landmark face of the first picture. This means that the integrity of the face features, the size of the region containing the face, or the distance of that region from the center of the frame has changed.
If the weight score has decreased beyond a preset threshold, for example to 20% of the weight score of the landmark face of the first picture, it may mean that the user is deliberately moving out of the camera's shooting range (the feature integrity decreases), moving away from the camera (the region containing the face shrinks), or moving off-center (the distance between the face region and the center of the frame changes). In this case, to avoid exposing the user's private content, the acquisition and uploading of the audio stream of the video conference site are stopped. When the weight score of the landmark face recovers in a later picture, acquisition and uploading of the audio stream of the video conference site are resumed.
Case 4, the remaining case where the landmark face of the second picture is the same as the landmark face of the first picture.
The aforementioned case 3 is a special case in which the landmark face of the second picture is the same as that of the first picture. In the remaining cases where the two landmark faces are the same, the pictures can continue to be cropped according to the previous cropping range, and the cropped pictures encoded and transmitted.
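The four cases of the periodic re-check can be summarized in a small dispatcher. The face-identity comparison, the 30-second grace period, and the 20% threshold of case 3 come from the text above; the function shape, default ratio, and return labels are illustrative assumptions:

```python
def recheck_action(prev, cur, drop_ratio=0.2):
    """prev / cur: (face_id, weight_score) for the landmark faces of the
    first and second pictures, or None if no landmark face exists."""
    if cur is None:                      # case 2: keep checking ~30 s of frames
        return "stop audio+video if still absent"
    if prev is None or cur[0] != prev[0]:
        return "recompute crop region"   # case 1: different landmark face
    if cur[1] < prev[1] and cur[1] <= drop_ratio * prev[1]:
        return "stop audio upload"       # case 3: score dropped to <= 20%
    return "keep current crop"           # case 4: same face, score held up
```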
The video conference processing method provided by the embodiment of the invention, by re-detecting the landmark face of the picture after each preset time period, helps to adapt in time to changes in the video conference scene and to protect the privacy of the participants.
Based on any of the above embodiments, fig. 4 is a schematic diagram of a video conference processing apparatus according to an embodiment of the present invention, and as shown in fig. 4, the video conference processing apparatus according to the embodiment of the present invention includes:
the symbolic face determining module 401 is configured to perform face recognition on a first picture in a video stream acquired at a current location of a video conference, and determine a symbolic face of the first picture according to a result of the face recognition; wherein the identification degree of the symbolic face is higher than a preset threshold value;
a cropping area determining module 402, configured to determine a range of a cropping area according to an area where a landmark face of the first picture is located;
and a cropping and coding transmission module 403, configured to crop the first picture and the pictures in the video stream whose time sequence is after the first picture according to the range of the cropping area, and to encode and transmit the cropped pictures.
The video conference processing device provided by the embodiment of the invention performs face recognition on a picture in the video stream, determines the cropping area according to the face recognition result, and then crops the picture. Because the cropped picture occupies only part of the original picture, the consumption of encoding resources and of bandwidth during transmission is reduced. Moreover, because the cropped picture contains only the area near the face, privacy information is not exposed.
Fig. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 5, the electronic device may include: a processor (processor)510, a communication Interface (Communications Interface)520, a memory (memory)530 and a communication bus 540, wherein the processor 510, the communication Interface 520 and the memory 530 communicate with each other via the communication bus 540. Processor 510 may call logic instructions in memory 530 to perform the following method:
carrying out face recognition on a first picture in a video stream collected in the current place of a video conference, and determining a symbolic face of the first picture according to the face recognition result; wherein the identification degree of the symbolic face is higher than a preset threshold value;
determining the range of a cutting area according to the area of the symbolic face of the first picture;
and according to the range of the cutting area, cutting the first picture and pictures behind the first picture in the video stream in time sequence, and coding and transmitting the cut pictures.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 510, the communication interface 520, the memory 530, and the communication bus 540 shown in fig. 5, where the processor 510, the communication interface 520, and the memory 530 complete mutual communication through the communication bus 540, and the processor 510 may call the logic instructions in the memory 530 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the methods provided by the above method embodiments, for example a method comprising:
carrying out face recognition on a first picture in a video stream collected in the current place of a video conference, and determining a symbolic face of the first picture according to the face recognition result; wherein the identification degree of the symbolic face is higher than a preset threshold value;
determining the range of a cutting area according to the area of the symbolic face of the first picture;
and according to the range of the cutting area, cutting the first picture and pictures behind the first picture in the video stream in time sequence, and coding and transmitting the cut pictures.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method provided by the foregoing embodiments, for example a method comprising:
carrying out face recognition on a first picture in a video stream collected in the current place of a video conference, and determining a symbolic face of the first picture according to the face recognition result; wherein the identification degree of the symbolic face is higher than a preset threshold value;
determining the range of a cutting area according to the area of the symbolic face of the first picture;
and according to the range of the cutting area, cutting the first picture and pictures behind the first picture in the video stream in time sequence, and coding and transmitting the cut pictures.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A video conference processing method, comprising:
carrying out face recognition on a first picture in a video stream collected in the current place of a video conference, and determining a symbolic face of the first picture according to the face recognition result; wherein the identification degree of the symbolic face is higher than a preset threshold value;
determining the range of a cutting area according to the area of the symbolic face of the first picture;
and according to the range of the cutting area, cutting the first picture and pictures behind the first picture in the video stream in time sequence, and coding and transmitting the cut pictures.
2. The video conference processing method according to claim 1, wherein when the first picture includes at least two faces, the determining the landmark face of the first picture according to the result of the face recognition comprises:
determining the number of the human face characteristic points contained in the first picture and the area of the range of the human face according to the human face recognition result, and calculating weight scores for all the human faces contained in the first picture according to the number of the human face characteristic points and the area of the range of the human face;
and selecting a face from the at least two faces as a symbolic face of the first picture according to the weight score.
3. The video conference processing method according to claim 2, wherein the calculating a weight score for all faces contained in the first picture according to the number of the face feature points and the area in which the face is located comprises:
calculating the number of human face characteristic points which can be detected by a first human face to obtain a first parameter value; the first face is any one face contained in the first picture;
calculating the proportion of the size of the area in the range of the first face to the total picture size of the first picture to obtain a second parameter value;
calculating the offset degree between the central position of the area in the range of the first face and the central position of the first picture to obtain a third parameter value;
and calculating a weight score for the first face according to the first parameter value, the second parameter value, the third parameter value, the preset tolerance degree of the user to the face leaving the picture, the tolerance degree of the user to the face leaving the camera and the tolerance degree of the user to the face deviating from the camera.
4. The video conference processing method according to claim 3, wherein the calculating the weight score for the first face according to the first parameter value, the second parameter value, the third parameter value, and the preset tolerance of the user to the face leaving the picture, tolerance of the user to the face leaving the camera, and tolerance of the user to the face deviating from the camera includes:
calculating a weight score for the first face according to a calculation formula of the weight score, wherein the calculation formula of the weight score is as follows:
Fa(c,a,d)=c/n×x1+a×x2+d×x3;
where c represents the first parameter value, a represents the second parameter value, d represents the third parameter value, n represents the total number of human face feature points, x1 represents the tolerance of the user to the human face leaving the picture, x2 represents the tolerance of the user to the human face leaving the camera, and x3 represents the tolerance of the user to the human face deviating from the camera.
5. The video conference processing method according to claim 1, wherein when the first picture only includes one face, the determining a landmark face of the first picture according to the result of the face recognition includes:
and taking the unique face contained in the first picture as a symbolic face of the first picture.
6. The video conference processing method according to claim 1, wherein the determining the range of the cropping area according to the area where the landmark face is located comprises:
determining the size of the cutting area according to the size of the area where the symbolic face is located;
and determining the position of the cutting area according to the position of the area where the symbolic face is located in the first picture and the size of the cutting area, so that the cutting area completely comprises the area where the symbolic face is located, and the area where the symbolic face is located at the preset position of the cutting area.
7. The video conference processing method of claim 1, wherein the encoding and transmitting the cropped picture comprises:
mapping the length of the cut picture to a preset first pixel value, mapping the width of the cut picture to a preset second pixel value, and obtaining the mapped cut picture with the length of the first pixel value and the width of the second pixel value;
and coding and transmitting the mapped cut picture.
8. The video conference processing method according to any one of claims 1 to 7, wherein when the first picture only includes one face, the determining a landmark face of the first picture according to the result of the face recognition further includes:
determining the number of characteristic points of the symbolic face of the first picture and the area of the range of the face according to the face recognition result, and calculating a weight score for the symbolic face of the first picture according to the number of characteristic points of the face and the area of the range of the face;
correspondingly, after the step of encoding and transmitting the cropped picture, the method further comprises:
every other preset time period, carrying out face recognition on a second picture in a video stream collected at the current place of the video conference, and calculating a weight score for a face contained in the second picture according to the face recognition result;
determining a symbolic face of a second picture according to the weight score of the face contained in the second picture;
when the symbolic face of the second picture is different from the symbolic face of the first picture, determining the range of a cutting area according to the symbolic face of the second picture, cutting the second picture and pictures behind the second picture in the video stream according to the range of the cutting area, and coding and transmitting the cut pictures;
when the symbolic face of the second picture does not exist and the continuous pictures in the video stream, the time sequence of which is behind the second picture, do not have the symbolic face, stopping the acquisition and uploading of the audio stream and the video stream of the video conference site;
and when the symbolic face of the second picture is the same as the symbolic face of the first picture, the weight score of the symbolic face of the second picture is smaller than that of the symbolic face of the first picture, and the difference value between the two weight scores reaches a preset threshold value, stopping the acquisition and uploading of the audio stream of the video conference site.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the video conference processing method according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the video conference processing method according to any one of claims 1 to 8.
CN202011051217.7A 2020-09-29 2020-09-29 Video conference processing method, electronic equipment and storage medium Active CN112347849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011051217.7A CN112347849B (en) 2020-09-29 2020-09-29 Video conference processing method, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112347849A true CN112347849A (en) 2021-02-09
CN112347849B CN112347849B (en) 2024-03-26

Family

ID=74361243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011051217.7A Active CN112347849B (en) 2020-09-29 2020-09-29 Video conference processing method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112347849B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499612A (en) * 2021-06-18 2022-12-20 海信集团控股股份有限公司 Video communication method and device
WO2023225910A1 (en) * 2022-05-25 2023-11-30 北京小米移动软件有限公司 Video display method and apparatus, terminal device, and computer storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014047090A2 (en) * 2012-09-21 2014-03-27 Cisco Technology, Inc. Transition control in a videoconference
US20140201126A1 (en) * 2012-09-15 2014-07-17 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
CN105512615A (en) * 2015-11-26 2016-04-20 小米科技有限责任公司 Picture processing method and apparatus
CN105938551A (en) * 2016-06-28 2016-09-14 深圳市唯特视科技有限公司 Video data-based face specific region extraction method
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
CN108038422A (en) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, the method for recognition of face and computer-readable recording medium
WO2018113523A1 (en) * 2016-12-24 2018-06-28 深圳云天励飞技术有限公司 Image processing method and device, and storage medium
CN108269250A (en) * 2017-12-27 2018-07-10 武汉烽火众智数字技术有限责任公司 Method and apparatus based on convolutional neural networks assessment quality of human face image
CN108596140A (en) * 2018-05-08 2018-09-28 青岛海信移动通信技术股份有限公司 A kind of mobile terminal face identification method and system
CN108960047A (en) * 2018-05-22 2018-12-07 中国计量大学 Face De-weight method in video monitoring based on the secondary tree of depth
CN109558764A (en) * 2017-09-25 2019-04-02 杭州海康威视数字技术股份有限公司 Face identification method and device, computer equipment
CN109858426A (en) * 2019-01-27 2019-06-07 武汉星巡智能科技有限公司 Face feature extraction method, device and computer readable storage medium
CN110955912A (en) * 2019-10-29 2020-04-03 平安科技(深圳)有限公司 Privacy protection method, device and equipment based on image recognition and storage medium thereof
CN111651632A (en) * 2020-04-23 2020-09-11 深圳英飞拓智能技术有限公司 Method and device for outputting voice and video of speaker in video conference

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140201126A1 (en) * 2012-09-15 2014-07-17 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
WO2014047090A2 (en) * 2012-09-21 2014-03-27 Cisco Technology, Inc. Transition control in a videoconference
CN105512615A (en) * 2015-11-26 2016-04-20 小米科技有限责任公司 Picture processing method and apparatus
CN105938551A (en) * 2016-06-28 2016-09-14 深圳市唯特视科技有限公司 Video data-based face specific region extraction method
WO2018113523A1 (en) * 2016-12-24 2018-06-28 深圳云天励飞技术有限公司 Image processing method and device, and storage medium
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
CN109558764A (en) * 2017-09-25 2019-04-02 杭州海康威视数字技术股份有限公司 Face identification method and device, computer equipment
CN108038422A (en) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, the method for recognition of face and computer-readable recording medium
WO2019100608A1 (en) * 2017-11-21 2019-05-31 平安科技(深圳)有限公司 Video capturing device, face recognition method, system, and computer-readable storage medium
CN108269250A (en) * 2017-12-27 2018-07-10 武汉烽火众智数字技术有限责任公司 Method and apparatus based on convolutional neural networks assessment quality of human face image
CN108596140A (en) * 2018-05-08 2018-09-28 青岛海信移动通信技术股份有限公司 A kind of mobile terminal face identification method and system
CN108960047A (en) * 2018-05-22 2018-12-07 中国计量大学 Face De-weight method in video monitoring based on the secondary tree of depth
CN109858426A (en) * 2019-01-27 2019-06-07 武汉星巡智能科技有限公司 Face feature extraction method, device and computer readable storage medium
CN110955912A (en) * 2019-10-29 2020-04-03 平安科技(深圳)有限公司 Privacy protection method, device, and equipment based on image recognition, and storage medium
CN111651632A (en) * 2020-04-23 2020-09-11 深圳英飞拓智能技术有限公司 Method and device for outputting voice and video of speaker in video conference

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499612A (en) * 2021-06-18 2022-12-20 海信集团控股股份有限公司 Video communication method and device
WO2023225910A1 (en) * 2022-05-25 2023-11-30 北京小米移动软件有限公司 Video display method and apparatus, terminal device, and computer storage medium

Also Published As

Publication number Publication date
CN112347849B (en) 2024-03-26

Similar Documents

Publication Title
JP6283108B2 (en) Image processing method and apparatus
KR102277048B1 (en) Preview photo blurring method and device and storage medium
US9196071B2 (en) Image splicing method and apparatus
EP2556464B1 (en) Skin tone and feature detection for video conferencing compression
US8773498B2 (en) Background compression and resolution enhancement technique for video telephony and video conferencing
WO2017016030A1 (en) Image processing method and terminal
US9305331B2 (en) Image processor and image combination method thereof
CN112347849A (en) Video conference processing method, electronic device and storage medium
WO2021057689A1 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium, and electronic device
JP2005303991A (en) Imaging device, imaging method, and imaging program
US9992450B1 (en) Systems and methods for background concealment in video conferencing session
CN107862658B (en) Image processing method, image processing device, computer-readable storage medium and electronic equipment
US8269819B2 (en) Image generating apparatus for generating three-dimensional image having high visibility
US11917158B2 (en) Static video recognition
WO2014169653A1 (en) Method and device for optimizing image synthesis
CN107622497B (en) Image cropping method and device, computer readable storage medium and computer equipment
CN111880711B (en) Display control method, display control device, electronic equipment and storage medium
CN112446254A (en) Face tracking method and related device
US9113153B2 (en) Determining a stereo image from video
JP2015191358A (en) Central person determination system, information terminal to be used by central person determination system, central person determination method, central person determination program, and recording medium
CN115314658A (en) Video communication method and system based on three-dimensional display
CN113947708A (en) Lighting device lamp efficiency control method, system, device, electronic device and medium
CN114385847A (en) Picture data processing method and device, computer equipment and storage medium
US20220245864A1 (en) Generating method of conference image and image conference system
CN109819318B (en) Image processing method, live broadcast method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant