CN117294945A - Intelligent conference method capable of automatically aligning face of speaker through guide rail camera

Info

Publication number
CN117294945A
Authority
CN
China
Prior art keywords
face
camera
conference
speaker
guide rail
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311215403.3A
Other languages
Chinese (zh)
Inventor
曾泳豪
朱正辉
明德
余吉昌
池旺钊
区文焯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Baolun Electronics Co ltd
Original Assignee
Guangdong Baolun Electronics Co ltd
Application filed by Guangdong Baolun Electronics Co ltd
Priority to CN202311215403.3A
Publication of CN117294945A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/152 Multipoint control units therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/157 Conference systems defining a virtual conference space and using avatars or agents

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the field of video conferencing and discloses an intelligent conference method for automatically aligning a guide rail camera with the face of a speaker. The method comprises: establishing a virtual model of the conference table and the guide rail camera, and marking the position coordinates of the conference seats on the virtual model; identifying the conference seat of the speaker, controlling the moving end to move to the guide rail position closest to the position coordinates of that seat, and controlling the steering mechanism so that the camera always faces those position coordinates; recognizing the face in the camera image, and the plane in which the face lies, by using a face detection algorithm; calculating the center coordinates of the face and controlling the steering mechanism to drive the camera to face the center coordinates; and controlling the guide rail motion assembly to drive the camera to a position where it faces the plane of the face perpendicularly. The invention reduces the cost of camera equipment, allows the camera to move to face any participant, and automatically tracks the participant's face to keep it in the center of the image, thereby improving the shooting quality of the video conference and the participants' experience.

Description

Intelligent conference method capable of automatically aligning face of speaker through guide rail camera
Technical Field
The invention relates to the field of video conferences, in particular to an intelligent conference method for automatically aligning the face of a speaker through a guide rail camera.
Background
The development of network technology has made real-time video communication and video conferencing feasible. Video conference equipment has gradually matured and is advancing toward higher resolution and lower transmission delay. To improve video quality, the professional camera equipment now in use offers ever higher pixel counts and frame rates, and its cost rises accordingly.
In existing video conference systems the camera position is usually fixed, so the shooting angle is fixed and the camera cannot adjust its orientation automatically. If the person being filmed moves even slightly, he or she drifts away from the center of the image. To stay centered, a speaker has to sit upright for long periods or stand directly opposite the camera, which makes it hard to relax and adds to the burden of the meeting. Moreover, unless a separate camera is provided for each conference seat, some speakers cannot face a camera when speaking from their seats. The commonly used panoramic camera captures most participants only from the side, and speaking while turned sideways toward it also burdens the participants.
To address these problems, existing schemes film the speaker with multiple cameras. This adds a few viewing angles, but multiple camera devices are very costly, and switching between images and aiming the cameras remain inflexible. In addition, when several people at the conference table speak in turn, an insufficient number of cameras makes it difficult to face every speaker, while providing a separate camera device for each speaker's seat is too expensive.
Disclosure of Invention
In order to solve the problems that, in existing video conference systems, some speakers cannot face a camera when speaking from their seats and that providing multiple cameras is too costly, the invention provides an intelligent conference method for automatically aligning a guide rail camera with the face of a speaker.
In this method the guide rail camera comprises a steering mechanism, a camera and a guide rail motion assembly; the guide rail motion assembly comprises a guide rail and a moving end, the moving end is slidably connected to the guide rail, and the camera is rotatably mounted on the moving end through the steering mechanism;
The intelligent conference method for automatically aligning the face of a speaker through the guide rail camera comprises the following steps:
establishing a virtual model of the conference table and the guide rail camera, and marking the position coordinates of the conference seats on the virtual model;
identifying the conference seat of the speaker, controlling the moving end to move to the guide rail position closest to the position coordinates of that conference seat, and at the same time controlling the steering mechanism so that the camera always faces those position coordinates;
recognizing the face in the camera image by using a face detection algorithm;
calculating the center coordinates of the face, and controlling the steering mechanism to drive the camera to face the center coordinates according to the position of the center coordinates relative to the camera;
recognizing the plane in which the face in the camera image lies by using the face detection algorithm;
and calculating the normal vector of the plane of the face at the center coordinates of the face, calculating the guide rail coordinate at which the vertical plane containing the normal vector intersects the guide rail, and controlling the guide rail motion assembly to drive the camera to that guide rail coordinate.
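The first two steps can be sketched as follows. This is a minimal illustration, assuming a planar virtual model in which the guide rail is a straight segment and each conference seat is a 2D point in metres. The function names `closest_rail_point` and `pan_angle_deg`, the `RAIL` and `SEATS` layout, and all coordinates are hypothetical and do not come from the disclosure.

```python
import math

def closest_rail_point(rail_start, rail_end, seat):
    """Return (t, point): the point on the rail segment nearest to the seat.

    t is the normalised rail coordinate in [0, 1] (0 = rail_start, 1 = rail_end).
    """
    (x1, y1), (x2, y2), (sx, sy) = rail_start, rail_end, seat
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return 0.0, (x1, y1)
    # Project the seat onto the rail line, then clamp to the segment ends.
    t = ((sx - x1) * dx + (sy - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return t, (x1 + t * dx, y1 + t * dy)

def pan_angle_deg(camera_pos, target):
    """Heading from the camera to the target, in degrees from the +x axis."""
    return math.degrees(math.atan2(target[1] - camera_pos[1],
                                   target[0] - camera_pos[0]))

# Hypothetical layout: a 3 m rail along the table centre line, seats around it.
RAIL = ((0.0, 0.0), (3.0, 0.0))
SEATS = {"A1": (0.5, 0.8), "A2": (1.5, 0.8), "B1": (0.5, -0.8), "END": (3.6, 0.0)}

t, rail_pos = closest_rail_point(RAIL[0], RAIL[1], SEATS["A2"])
angle = pan_angle_deg(rail_pos, SEATS["A2"])
```

Clamping `t` keeps the moving end on the physical rail when a seat, such as the hypothetical `END` seat, lies beyond the end of the rail.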
Preferably, recognizing the face in the camera image by using the face detection algorithm specifically comprises:
connecting to the camera and capturing a video stream by using the Python programming language and a library, and acquiring a plurality of frames from the video stream;
for each acquired frame, detecting whether a face is present in the image by means of a Haar classifier;
if so, determining the position and size of the face in the image by applying a cascade classifier to the image;
otherwise, continuing with the next frame.
Preferably, determining the position and size of the face in the image by applying a cascade classifier comprises the following steps:
converting the image to grayscale: converting the image into a grayscale image using the cv2.cvtColor() function of OpenCV;
face detection using a Haar classifier: detecting face positions in the grayscale image with the Haar classifier through the detectMultiScale() function, which returns a list of rectangles containing the detected face positions;
drawing the face frame: for each detected face position, drawing a rectangle on the original image using the cv2.rectangle() function of OpenCV to indicate the position and size of the face.
Preferably, identifying the conference seat of the speaker specifically comprises:
providing a speaking key on the conference table opposite each conference seat;
when one of the speaking keys is triggered, identifying the conference seat opposite that speaking key as the conference seat of the speaker.
Preferably, the guide rail camera further comprises an omnidirectional pickup microphone, and the omnidirectional pickup microphone is fixedly connected with the moving end.
Preferably, identifying the conference seat of the speaker specifically comprises:
locating the direction of the speaker's sound source relative to the moving end through the omnidirectional pickup microphone by means of a sound source localization algorithm;
and calculating the conference seat of the speaker from the direction of the sound source and the current position of the moving end in the virtual model.
Preferably, identifying the conference seat of the speaker specifically comprises:
fixedly installing at least one panoramic camera in the conference room, and capturing the faces of all participants with the panoramic camera;
marking the position coordinates of each panoramic camera in the virtual model;
recognizing the faces of all participants in the image captured by the panoramic camera by means of a face detection algorithm;
identifying the current speaker among the participants by means of a mouth-shape recognition algorithm;
and calculating the conference seat of the speaker from the position of the speaker in the captured image and the position coordinates of the panoramic camera in the virtual model.
Preferably, the method further comprises the steps of:
when the moving end is detected to be in a moving state, the image of the video conference is switched to the image shot by the panoramic camera.
Preferably, recognizing the face in the camera image by using the face detection algorithm further comprises:
when the image captured by the camera is found to contain a plurality of faces, determining which face belongs to the speaker from the position of the camera in the virtual model and the conference seat of the speaker.
Preferably, the method further comprises the steps of:
obtaining face recognition data of the speaker by means of a face recognition algorithm, querying the matching identity information with the face recognition data, and displaying the identity information in the video image.
The beneficial effects of the invention are as follows:
(1) The camera is mounted on a slide rail and slides along the surface of the conference table. Combined with the rotation of the camera, a single camera can face any participant at the conference table, so every participant can speak directly toward the camera. This saves the cost of the camera equipment required for multi-camera shooting, and also saves the labor of manually moving camera equipment and spares participants the awkwardness of facing a camera operator.
(2) By means of the face recognition algorithm, the camera is controlled to follow the speaker however the speaker moves his or her body and face while speaking. The face position is tracked in real time and kept at the center of the image at all times, which greatly improves the participants' experience of the video conference. The speaker can relax and need not constantly check whether he or she has drifted out of the camera's view, so the user experience is more comfortable and the conference video is more intelligent.
Preferably, the direction of the speaker at the conference table is identified automatically by the omnidirectional pickup microphone and the sound source localization algorithm, so that the camera automatically turns to face the participant who is speaking.
Preferably, the video is switched to the image of the panoramic camera while the camera is moving and switched back to the image of the camera once it has moved to the position facing the speaker. This avoids showing a shaking image and non-speaking participants while the camera moves, which would make the video conference look disjointed and could cause dizziness.
Drawings
The invention will be further described with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method according to a first embodiment of the invention;
FIG. 2 is a schematic view of a conference table according to a second embodiment of the present invention.
In the figures: 1. conference table; 2. omnidirectional pickup microphone; 3. steering mechanism; 4. camera; 5. guide rail motion assembly.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In a face recognition algorithm, once the face has been detected and its key feature points located, the main face region can be cropped out, preprocessed, and fed to the recognition algorithm at the back end. Face recognition is mature prior art: many existing modules on the market can be called directly, and it can be implemented in various ways. The face detection algorithm of this scheme additionally needs to detect the plane and the center point of the face. These algorithms are therefore not described in further detail in the following embodiments.
Referring to FIG. 1, a first embodiment of the present invention discloses an intelligent conference method for automatically aligning the face of a speaker through a guide rail camera. The guide rail camera comprises a steering mechanism 3, a camera 4 and a guide rail motion assembly 5. The guide rail motion assembly 5 comprises a guide rail and a moving end; the guide rail is fixed on the conference table 1, the moving end is slidably connected to the guide rail, and the camera 4 is rotatably mounted on the moving end through the steering mechanism 3.
Meanwhile, the intelligent conference all-in-one machine of the scheme further comprises display equipment and sound equipment, wherein the display equipment is arranged in a conference room and can be a display screen, a desktop display or a projector.
The intelligent conference method for automatically aligning the face of the speaker through the guide rail camera comprises the following steps:
S1, establishing a virtual model (a planar model or a three-dimensional model) of the conference table 1 and the guide rail camera, and marking the position coordinates of the conference seats on the virtual model;
S2, identifying the conference seat of the speaker, controlling the moving end to move to the guide rail position closest to the position coordinates of that conference seat, and at the same time controlling the steering mechanism 3 so that the camera 4 always faces those position coordinates;
S3, recognizing the face in the image of the camera 4 by using a face detection algorithm;
S4, calculating the center coordinates of the face, calculating the position of the center coordinates relative to the camera 4, and controlling the steering mechanism 3 to drive the camera 4 to face the center coordinates according to that position;
S5, recognizing the plane in which the face in the image of the camera 4 lies by using the face detection algorithm;
S6, calculating the normal vector of the plane of the face at the center coordinates of the face, calculating the guide rail coordinate at which the vertical plane containing the normal vector intersects the guide rail, and controlling the guide rail motion assembly 5 to drive the camera 4 to that guide rail coordinate.
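Step S6 can be illustrated in plan view, where the vertical plane containing the face normal reduces to a ray from the face center along the horizontal projection of the normal. The sketch below is one possible reading under that assumption. The name `rail_target_from_face`, the coordinates, and the choice to clamp the result to the rail ends are illustrative and not taken from the disclosure.

```python
def rail_target_from_face(face_center, face_normal, rail_start, rail_end):
    """Intersect the face's forward direction with the rail in plan view.

    Returns the normalised rail coordinate t in [0, 1], or None if the
    face points away from the rail or parallel to it.
    """
    (fx, fy), (nx, ny) = face_center, face_normal
    (x1, y1), (x2, y2) = rail_start, rail_end
    rx, ry = x2 - x1, y2 - y1
    # Solve face_center + s * normal == rail_start + t * rail_dir for s and t.
    denom = nx * ry - ny * rx
    if abs(denom) < 1e-9:
        return None  # the normal is parallel to the rail
    wx, wy = x1 - fx, y1 - fy
    s = (wx * ry - wy * rx) / denom
    t = (wx * ny - wy * nx) / denom
    if s <= 0:
        return None  # the rail lies behind the speaker
    # Assumption: if the intersection is beyond the rail, stop at the nearest end.
    return max(0.0, min(1.0, t))

# Hypothetical case: speaker at (1.5, 0.8) with the head turned toward the
# start of a 3 m rail that runs along the x axis.
target_t = rail_target_from_face((1.5, 0.8), (-0.6, -0.8), (0.0, 0.0), (3.0, 0.0))
```

With these sample values the ray meets the rail at x = 0.9 m, that is, 30% of the way along the rail, so the camera would move there to view the face head-on.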
Preferably, in step S2 of this embodiment, the conference seat of the speaker is identified as follows:
S211, providing a speaking key on the conference table 1 opposite each conference seat;
S212, when one of the speaking keys is triggered, identifying the conference seat opposite that speaking key as the conference seat of the speaker.
Preferably, step S3 of this embodiment specifically includes the following sub-steps:
S31, connecting to the camera 4 and capturing a video stream by using the Python programming language and a library (such as OpenCV), and acquiring a plurality of frames (every frame or frames sampled at intervals) from the video stream;
S32, for each acquired frame, detecting whether a face is present in the image by means of a Haar classifier;
S33, if so, determining the position and size of the face in the image by applying a cascade classifier to the image;
S34, if not, continuing with the next frame.
Preferably, in step S33, the position and size of the face in the image are determined by applying a cascade classifier to the image, with the following steps:
S331, converting the image to grayscale: converting the image into a grayscale image using the cv2.cvtColor() function of OpenCV;
S332, performing face detection with a Haar classifier: detecting face positions in the grayscale image with a pre-trained Haar classifier through the detectMultiScale() function, which returns a list of rectangles containing the detected face positions;
S333, drawing the face frame: for each detected face position (represented by (x, y, w, h)), drawing a rectangle on the original image using the cv2.rectangle() function of OpenCV to indicate the position and size of the face.
An example of code for the above sub-steps is as follows:
import cv2
import camera_control_library  # placeholder name for the camera control library

# Load the pre-trained Haar classifier shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Initialize the video capture and the control interface of camera 4
camera = cv2.VideoCapture(0)
camera_control = camera_control_library

while True:
    ret, frame = camera.read()
    if not ret:
        break  # no frame available, stop the loop
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Face detection using the Haar classifier
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Continuously compute the offset and control the camera
    for (x, y, w, h) in faces:
        # Draw the face frame on the original image (step S333)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Calculate the offset between the face center and the screen center
        face_center_x = x + w // 2
        screen_center_x = frame.shape[1] // 2
        offset = face_center_x - screen_center_x
        # Steer camera 4 to keep the face centered
        camera_control.adjust_position(offset)
    cv2.imshow('Face Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()
In this embodiment, by means of the face recognition algorithm, the camera 4 is controlled to follow the speaker however the speaker moves his or her body and face while speaking, and the face position is tracked in real time. This greatly improves the participants' experience of the video conference: the speaker can relax and need not constantly check whether he or she has drifted out of the view of the camera 4. The user experience is more comfortable, the conference video is more intelligent, and the image of the speaker is captured more clearly and in line with user expectations.
Referring to fig. 2, as a second embodiment of the present invention, this embodiment differs from the first embodiment in that it includes: the steering mechanism 3, the camera 4, the guide rail movement assembly 5 and the control terminal;
the guide rail movement assembly 5 comprises a guide rail, a driving motor and a movement end, wherein the guide rail is fixedly connected with the upper surface of the conference table 1, the movement end is slidably connected with the guide rail, the driving motor is used for driving the movement end to move along the guide rail, and the driving motor is in transmission connection with the movement end;
the steering mechanism 3 comprises a steering base, a horizontal steering frame, a vertical steering frame, a horizontal driving motor and a vertical driving motor, wherein the horizontal driving motor is used for driving the horizontal steering frame to rotate along a vertical axis relative to the steering base, the vertical driving motor is used for driving the horizontal steering frame to rotate along the horizontal axis relative to the vertical steering frame, and the camera 4 is fixedly connected with the vertical steering frame;
the driving motor, the horizontal driving motor, the vertical driving motor and the camera 4 are respectively and electrically connected with the control terminal.
The control terminal of the present embodiment is a server for operation of the videoconferencing system, and controls movement and operation of the steering mechanism 3, the camera 4, and the rail movement assembly 5.
Preferably, the control terminal communicates with the camera 4 via a wireless signal, for example a Wi-Fi or 5G signal. This reduces wiring and avoids the signal transmission and output problems caused by poor contact or wire breakage when long, frequently moving cables are bent and worn.
Preferably, the guide rail is any one of a linear guide rail, a rolling guide rail or a sliding guide rail. Correspondingly, the moving end may be any component matched to the guide rail, such as a slider or a small rail car, and the drive mode is not limited to traction transmission or wheel transmission.
The guide rail camera of this embodiment comprises an omnidirectional pickup microphone 2, a steering mechanism 3, a camera 4 and a guide rail motion assembly 5. The guide rail motion assembly 5 comprises a guide rail and a moving end; the guide rail is fixed on the conference table 1, the moving end is slidably connected to the guide rail, the omnidirectional pickup microphone 2 is fixedly connected to the moving end, and the camera 4 is rotatably mounted on the omnidirectional pickup microphone 2 through the steering mechanism 3.
The omnidirectional pickup microphone 2 of this embodiment has a columnar structure. The steering mechanism 3 provides 360° horizontal rotation and 30° of pitch rotation in the vertical direction, driven by two independent motors (servo motors, stepping motors or gear motors). The rotation angle is feedback-controlled according to the angle between the image center of the camera 4 and the recognized face position.
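The feedback control described here can be sketched by converting the pixel offset of the face center into pan and tilt errors. The sketch assumes a pinhole camera model. The field-of-view values, the proportional gain, the reading of the 30° pitch range as ±30°, and the function names `pixel_offset_to_angles` and `next_tilt` are all illustrative assumptions rather than parameters given in the disclosure.

```python
import math

def pixel_offset_to_angles(face_box, frame_size, hfov_deg=60.0, vfov_deg=34.0):
    """Convert the face-centre offset from the image centre into
    (pan_error, tilt_error) in degrees under a pinhole camera model."""
    x, y, w, h = face_box
    width, height = frame_size
    dx = (x + w / 2) - width / 2
    dy = (y + h / 2) - height / 2
    # Focal length in pixels follows from the assumed field of view.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    pan_error = math.degrees(math.atan2(dx, fx))
    tilt_error = -math.degrees(math.atan2(dy, fy))  # image y grows downward
    return pan_error, tilt_error

def next_tilt(current_tilt_deg, tilt_error_deg, gain=0.5, limit_deg=30.0):
    """One proportional control step, clamped to the assumed pitch range."""
    target = current_tilt_deg + gain * tilt_error_deg
    return max(-limit_deg, min(limit_deg, target))

# A face whose centre sits on the right edge of a 1280x720 frame is half the
# horizontal field of view away from the optical axis.
pan_error, tilt_error = pixel_offset_to_angles((1230, 310, 100, 100), (1280, 720))
```

A gain below 1 moves the camera only part of the way on each frame, which is a common way to avoid overshoot when detections are noisy.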
The guide rail motion assembly 5 of this embodiment further comprises a traction rope and a fixed pulley. The driving motor is fixedly connected to one end of the guide rail, and a driving pulley is fixed on the shaft of the driving motor. The fixed pulley is rotatably connected to the other end of the guide rail, the traction rope is looped around the driving pulley and the fixed pulley, and the moving end is fixedly connected to the traction rope.
In step S2 of this embodiment, the conference seat of the speaker is identified as follows:
S221, locating the direction of the speaker's sound source relative to the moving end through the omnidirectional pickup microphone 2 by means of a sound source localization algorithm;
S222, calculating the conference seat of the speaker from the direction of the sound source and the current position of the moving end in the virtual model.
In this embodiment, the direction and position of the speaker are first estimated from sound, so that the camera can automatically identify the speaker and move in front of him or her, making the conference system more intelligent.
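Step S222 can be sketched as a bearing match in the planar virtual model. The sketch assumes that the sound source localization algorithm has already produced a bearing expressed in the model's coordinate frame. The name `seat_from_sound_bearing`, the 20° tolerance, and the seat layout are hypothetical.

```python
import math

def seat_from_sound_bearing(mic_pos, bearing_deg, seats, max_error_deg=20.0):
    """Pick the seat whose direction from the microphone best matches the
    measured bearing. Returns None if no seat is within the tolerance."""
    best_seat, best_err = None, max_error_deg
    for name, (sx, sy) in seats.items():
        seat_bearing = math.degrees(math.atan2(sy - mic_pos[1], sx - mic_pos[0]))
        # Absolute angular difference after wrapping into [-180, 180).
        err = abs((seat_bearing - bearing_deg + 180.0) % 360.0 - 180.0)
        if err < best_err:
            best_seat, best_err = name, err
    return best_seat

# Hypothetical layout; the microphone rides on the moving end at (1.5, 0.0).
SEAT_COORDS = {"A1": (0.5, 0.8), "A2": (1.5, 0.8), "B1": (0.5, -0.8)}
speaker_seat = seat_from_sound_bearing((1.5, 0.0), 92.0, SEAT_COORDS)
```

Returning `None` when no seat lies within the tolerance lets the caller ignore sounds that do not come from any marked seat.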
In step S3 of this embodiment, recognizing the face in the image of the camera 4 by using the face detection algorithm further comprises:
S311, when the image captured by the camera 4 is found to contain a plurality of faces, determining which face belongs to the speaker from the position of the camera 4 in the virtual model and the conference seat of the speaker.
This prevents the camera 4 from mistakenly aiming at other participants in the back row when there are many participants. This speaker correction should either run in real time or lock onto the facial features of the current speaker after one confirmation.
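One way to sketch step S311 is to predict where the speaker's seat should appear in the image and then choose among the detected faces near that position. The sketch assumes a pinhole camera, a planar virtual model, and that a back-row participant at a similar bearing appears smaller than the speaker. The names `expected_pixel_x` and `pick_speaker_face`, the field of view, and the pixel tolerance are hypothetical.

```python
import math

def expected_pixel_x(camera_pos, camera_pan_deg, seat, frame_width, hfov_deg=60.0):
    """Horizontal pixel at which the seat should appear, or None if the seat
    is not in front of the camera."""
    seat_bearing = math.degrees(math.atan2(seat[1] - camera_pos[1],
                                           seat[0] - camera_pos[0]))
    rel = (seat_bearing - camera_pan_deg + 180.0) % 360.0 - 180.0
    if abs(rel) >= 90.0:
        return None
    focal_px = (frame_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Counter-clockwise bearings fall to the left of the image centre.
    return frame_width / 2 - focal_px * math.tan(math.radians(rel))

def pick_speaker_face(faces, expected_x, tolerance_px=150):
    """Among faces near the expected position, prefer the largest box."""
    near = [f for f in faces if abs(f[0] + f[2] / 2 - expected_x) <= tolerance_px]
    if not near:
        return None
    return max(near, key=lambda f: f[2] * f[3])

# Hypothetical frame with a front-row speaker, a back-row face and a neighbour.
detected = [(600, 300, 120, 120), (650, 250, 50, 50), (100, 300, 130, 130)]
seat_x = expected_pixel_x((1.5, 0.0), 90.0, (1.5, 0.8), 1280)
speaker_box = pick_speaker_face(detected, seat_x)
```

Here the neighbour at the left edge is rejected by the position tolerance, and the small back-row face is rejected by the size preference.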
The intelligent conference method for automatically aligning the face of the speaker through the guide rail camera in this embodiment further comprises the following step:
S7, obtaining face recognition data of the speaker by means of a face recognition algorithm, querying the matching identity information with the face recognition data, and displaying the identity information in the video image.
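The identity query in step S7 can be sketched as a nearest-match search. The sketch assumes that the face recognition algorithm outputs a numeric feature vector for each face and that enrolled participants are stored in a simple name-to-vector directory. The disclosure does not specify the form of the face recognition data, so the vectors, the threshold, and the names `cosine_similarity` and `lookup_identity` are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookup_identity(face_vector, directory, threshold=0.8):
    """Return the enrolled name with the highest similarity to face_vector,
    or None if no entry reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in directory.items():
        score = cosine_similarity(face_vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Toy three-dimensional vectors; real feature vectors are much longer.
DIRECTORY = {"Participant A": [1.0, 0.0, 0.0], "Participant B": [0.0, 1.0, 0.0]}
label = lookup_identity([0.9, 0.1, 0.0], DIRECTORY)
```

The returned label could then be drawn next to the face frame in the video image, and an unmatched face simply gets no label.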
In this embodiment, the camera 4 and the omnidirectional pickup microphone 2 are mounted on the slide rail and slide along the surface of the conference table 1. Combined with the rotation of the camera 4, a single camera 4 can face any participant at the conference table 1, and the omnidirectional pickup microphone 2 is brought closer to the speaker. Every participant at the conference table 1 can therefore speak directly toward the camera 4. This saves the cost of the camera equipment required for multi-camera shooting, as well as the labor of manually moving camera equipment, and spares participants the awkwardness of facing a camera operator. Within the same video conference budget, the total price of the several camera devices of a multi-camera system can instead be spent on a single, higher-specification camera device, improving the image quality of the video conference.
The following is a third embodiment of the present solution, which is different from the first embodiment in that at least one panoramic camera is further provided in a conference room or a conference site.
The guide rail motion assembly 5 of this embodiment further comprises a synchronous belt, a synchronous pulley and an idler pulley. The driving motor is fixedly connected to one end of the guide rail, the synchronous pulley is fixed on the shaft of the driving motor, the idler pulley is rotatably connected to the other end of the guide rail, the synchronous belt is fitted around the synchronous pulley and the idler pulley, and the moving end is fixedly connected to the synchronous belt.
The upper surface of the conference table 1 of this embodiment is fixedly provided with a plurality of display screens, and the plurality of display screens are all used for displaying the picture of the intelligent conference.
In step S2 of this embodiment, the conference seat of the speaker is identified as follows:
S231, fixedly installing at least one panoramic camera in the conference room, and capturing the faces of all participants with the panoramic camera;
S232, marking the position coordinates of each panoramic camera in the virtual model;
S233, recognizing the faces of all participants in the image captured by the panoramic camera by means of a face detection algorithm;
S234, identifying the current speaker among the participants by means of a mouth-shape recognition algorithm;
S235, calculating the conference seat of the speaker from the position of the speaker in the captured image and the position coordinates of the panoramic camera in the virtual model.
In this embodiment, when the moving end is detected to be moving, the video conference image is switched to the image captured by the panoramic camera. The video shows the image of the panoramic camera while the camera 4 is moving and switches back to the image of the camera 4 once it has moved to the position facing the speaker. This avoids showing a shaking image and non-speaking participants while the camera 4 moves, which would make the video conference look disjointed and could cause dizziness.
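The feed switching described above can be sketched as a small state holder. The short settling delay after the moving end stops is an added assumption, intended to hide any residual shake, and is not part of the disclosure. The class name `FeedSwitcher` and the feed labels are hypothetical.

```python
class FeedSwitcher:
    """Choose which camera feed to show based on the motion state of the moving end."""

    def __init__(self, settle_s=0.5):
        self.settle_s = settle_s      # assumed hold time after motion stops
        self._last_motion_s = None    # timestamp of the last frame seen in motion

    def select(self, is_moving, now_s):
        """Return "panoramic" while moving or settling, otherwise "rail"."""
        if is_moving:
            self._last_motion_s = now_s
            return "panoramic"
        if (self._last_motion_s is not None
                and now_s - self._last_motion_s < self.settle_s):
            return "panoramic"
        return "rail"

switcher = FeedSwitcher(settle_s=0.5)
feeds = [
    switcher.select(False, 0.0),   # idle: show the rail camera
    switcher.select(True, 1.0),    # moving: show the panoramic camera
    switcher.select(False, 1.2),   # just stopped: still settling
    switcher.select(False, 1.6),   # settled: back to the rail camera
]
```

Passing the timestamp in explicitly keeps the logic independent of the system clock, which makes it straightforward to test.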
It should be noted that the embodiments of the apparatus and device described above are only schematic, where the units described as separate units may or may not be physically separated, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Claims (10)

1. An intelligent conference method for automatically aligning the face of a speaker through a guide rail camera, wherein the guide rail camera comprises: the camera is rotationally arranged on the moving end through the steering mechanism;
an intelligent conference method for automatically aligning the face of a speaker through a guide rail camera comprises the following steps:
establishing virtual models of a conference table and a guide rail camera, and marking position coordinates of a conference seat on the virtual models;
identifying the conference seat of the speaker, controlling the moving end to move to the guide rail position closest to the position coordinates of that conference seat, and simultaneously controlling the steering mechanism so that the camera always faces the position coordinates;
recognizing a face in the image captured by the camera by using a face detection algorithm;
calculating the center coordinates of the face, calculating the positional relationship of the center coordinates relative to the camera, and controlling the steering mechanism to turn the camera toward the center coordinates according to the positional relationship;
recognizing the plane of the face in the image captured by the camera by using the face detection algorithm;
and calculating the normal vector of the face plane at the center coordinates of the face, calculating the guide rail coordinate at which the vertical plane containing the normal vector intersects the guide rail, and controlling the guide rail motion assembly to drive the camera to move to the guide rail coordinate.
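As an illustration of the last step of claim 1, the following Python sketch computes the guide rail coordinate in a top-down view. It rests on assumptions that are not stated in the claims: the guide rail is a straight segment, the virtual model is reduced to 2D floor coordinates, the face normal is given as its horizontal projection, and out-of-range targets are clamped to the rail ends. The function name `rail_target` is hypothetical.

```python
import math
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


def rail_target(face_xy: Vec2, normal_xy: Vec2,
                rail_start: Vec2, rail_end: Vec2) -> Optional[float]:
    """Distance along the rail (from rail_start) where the vertical plane
    through the face normal crosses the rail, or None if there is none."""
    nx, ny = normal_xy
    dx, dy = rail_end[0] - rail_start[0], rail_end[1] - rail_start[1]
    rx, ry = rail_start[0] - face_xy[0], rail_start[1] - face_xy[1]

    # Solve face + t * normal = rail_start + s * rail_dir (Cramer's rule).
    det = dx * ny - nx * dy
    if abs(det) < 1e-9:
        return None  # normal is parallel to the rail
    t = (dx * ry - dy * rx) / det
    s = (nx * ry - ny * rx) / det
    if t < 0:
        return None  # the face is turned away from the rail

    s = min(max(s, 0.0), 1.0)  # assumption: clamp to the physical rail ends
    return s * math.hypot(dx, dy)
```

For example, with a rail running from (0, 0) to (4, 0) and a face at (1, 2) looking straight at the rail, the target lies 1 unit from the start of the rail.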
2. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, wherein recognizing the face in the camera by using the face detection algorithm specifically comprises:
connecting to the camera and capturing a video stream by using the Python programming language and a library, and acquiring a plurality of frame images from the video stream;
for each acquired frame of image, detecting whether a face exists in the image or not through a Haar classifier;
if so, determining the position and the size of the face in the image by using a cascade classifier on the image;
otherwise, continuing to detect the next frame of image.
3. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 2, wherein determining the position and the size of the face in the image by using a cascade classifier specifically comprises:
converting the image into a grayscale image: converting the image into a grayscale image using the cv2.cvtColor() function of OpenCV;
detecting faces using a Haar classifier: detecting face positions in the grayscale image with the Haar classifier through the detectMultiScale() function, which returns a list of rectangles containing the position information of the detected faces;
drawing a face box: for each detected face position, drawing a rectangular box on the original image using the cv2.rectangle() function of OpenCV, the box serving as the position and size information of the face.
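The three sub-steps of claim 3 can be sketched in Python with OpenCV as follows. This is a minimal sketch, not the applicant's implementation: the choice of the bundled `haarcascade_frontalface_default.xml` model, the `scaleFactor` and `minNeighbors` values, and the helper names `detect_faces`, `draw_boxes` and `face_center` are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def detect_faces(frame) -> List[Box]:
    """Return face boxes found in a BGR frame with a Haar cascade."""
    import cv2  # imported lazily so face_center works without OpenCV

    # Assumption: the frontal-face model shipped with opencv-python.
    # A real application would load the cascade once, not per frame.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in faces]


def draw_boxes(frame, boxes: List[Box]):
    """Draw a green rectangle on the original image for every face box."""
    import cv2

    for x, y, w, h in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame


def face_center(box: Box) -> Tuple[float, float]:
    """Center coordinates of a face box, in pixels."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)
```

The center returned by `face_center` is the kind of value that the steering step of claim 1 would use to turn the camera toward the face.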
4. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, wherein identifying the conference seat of the speaker specifically comprises:
providing a talk button on the conference table opposite each conference seat;
when one of the talk buttons is triggered, identifying the conference seat opposite that talk button as the conference seat where the speaker is located.
5. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, wherein the guide rail camera further comprises an omnidirectional pickup microphone, the omnidirectional pickup microphone being fixedly connected to the moving end.
6. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 5, wherein identifying the conference seat of the speaker specifically comprises:
locating the sound source direction of the speaker relative to the moving end through the omnidirectional pickup microphone by using a sound source localization algorithm;
and calculating the conference seat where the speaker is located from the sound source direction and the current position of the moving end in the virtual model.
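The second step of claim 6 can be illustrated with a small Python sketch. It assumes, beyond what the claim states, that the virtual model is a 2D floor plan, that the sound source localization algorithm yields a bearing in radians in the model's coordinate frame, and that the speaker's seat is the one whose bearing from the moving end is angularly closest. The function name `seat_from_bearing` and the seat labels are hypothetical.

```python
import math
from typing import Dict, Tuple

Vec2 = Tuple[float, float]


def seat_from_bearing(moving_end_xy: Vec2, bearing_rad: float,
                      seats: Dict[str, Vec2]) -> str:
    """Return the seat whose direction best matches the sound bearing."""

    def angular_gap(seat_xy: Vec2) -> float:
        seat_bearing = math.atan2(seat_xy[1] - moving_end_xy[1],
                                  seat_xy[0] - moving_end_xy[0])
        diff = seat_bearing - bearing_rad
        # Wrap the difference into [-pi, pi] before taking its magnitude.
        return abs(math.atan2(math.sin(diff), math.cos(diff)))

    return min(seats, key=lambda name: angular_gap(seats[name]))


# Hypothetical seat coordinates marked on the virtual model.
seats = {"A": (0.0, 1.0), "B": (2.0, 1.0), "C": (4.0, 1.0)}
speaker_seat = seat_from_bearing((2.0, 0.0), math.pi / 2, seats)
```

With the moving end at (2, 0) and the sound arriving from straight ahead, the nearest match in this example is seat B.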
7. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, wherein identifying the conference seat of the speaker specifically comprises:
fixedly arranging at least one panoramic camera in the conference room, and capturing the faces of all participants with the panoramic camera;
marking the position coordinates of each panoramic camera in the virtual model;
recognizing the faces of all participants in the image captured by the panoramic camera through a face detection algorithm;
identifying the current speaker among the participants through an intelligent mouth-shape recognition algorithm;
and calculating the conference seat where the speaker is located from the position of the speaker in the captured image and the position coordinates of the panoramic camera in the virtual model.
8. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, further comprising the steps of:
fixedly arranging at least one panoramic camera in the conference room;
when the moving end is detected to be in a moving state, switching the image of the video conference to the image captured by the panoramic camera.
9. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, wherein recognizing the face in the camera by using the face detection algorithm further comprises the sub-step of:
when the image captured by the camera is recognized to contain a plurality of faces, determining the face corresponding to the speaker by calculation from the position of the camera in the virtual model and the conference seat where the speaker is located.
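One possible way to carry out the calculation in claim 9 is sketched below in Python. The pinhole projection, the known camera heading and horizontal field of view, and the helper names `expected_column` and `pick_speaker_face` are assumptions made for illustration; the claim does not specify how the judgment is computed.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def expected_column(camera_xy: Vec2, heading_rad: float, seat_xy: Vec2,
                    image_width: int, hfov_rad: float) -> float:
    """Pixel column where the speaker's seat should appear (pinhole model)."""
    bearing = math.atan2(seat_xy[1] - camera_xy[1], seat_xy[0] - camera_xy[0])
    offset = bearing - heading_rad
    offset = math.atan2(math.sin(offset), math.cos(offset))  # wrap to [-pi, pi]
    focal_px = (image_width / 2.0) / math.tan(hfov_rad / 2.0)
    # A counter-clockwise offset is to the camera's left, so the column shrinks.
    return image_width / 2.0 - focal_px * math.tan(offset)


def pick_speaker_face(boxes: List[Box], column: float) -> Box:
    """Choose the face box whose horizontal center is nearest the column."""
    return min(boxes, key=lambda b: abs(b[0] + b[2] / 2.0 - column))


faces = [(50, 100, 80, 80), (280, 100, 80, 80), (500, 100, 80, 80)]
col = expected_column((2.0, 0.0), math.pi / 2, (2.0, 2.0), 640, math.pi / 2)
speaker_face = pick_speaker_face(faces, col)
```

Here the seat lies straight ahead of the camera, so the face nearest the image center is selected. The sketch is only meaningful for seats within the camera's field of view.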
10. The intelligent conference method for automatically aligning the face of a speaker through a guide rail camera according to claim 1, further comprising the steps of:
obtaining face recognition data of the speaker through a face recognition algorithm, querying the matching identity information using the face recognition data, and displaying the identity information in the video image.
CN202311215403.3A 2023-09-19 2023-09-19 Intelligent conference method capable of automatically aligning face of speaker through guide rail camera Pending CN117294945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311215403.3A CN117294945A (en) 2023-09-19 2023-09-19 Intelligent conference method capable of automatically aligning face of speaker through guide rail camera

Publications (1)

Publication Number Publication Date
CN117294945A true CN117294945A (en) 2023-12-26

Family

ID=89243718

Country Status (1)

Country Link
CN (1) CN117294945A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102143313A (en) * 2010-02-02 2011-08-03 鸿富锦精密工业(深圳)有限公司 Camera control system and method and adjusting device with control system
JP2014165565A (en) * 2013-02-22 2014-09-08 Hitachi Ltd Television conference device, system and method
CN210469530U (en) * 2019-09-12 2020-05-05 南京深视光点科技有限公司 Audio and image tracking system for speaking person
CN111191609A (en) * 2019-12-31 2020-05-22 上海能塔智能科技有限公司 Face emotion recognition method and device, electronic equipment and storage medium
WO2020220546A1 (en) * 2019-04-30 2020-11-05 平安科技(深圳)有限公司 Facial recognition-based meeting management method, system, and readable storage medium
CN113140223A (en) * 2021-03-02 2021-07-20 广州朗国电子科技有限公司 Conference voice data processing method, device and storage medium
US20210274129A1 (en) * 2018-06-18 2021-09-02 Eyecon As Video Conferencing System
CN113473066A (en) * 2021-05-10 2021-10-01 上海明我信息技术有限公司 Video conference picture adjusting method
CN114449202A (en) * 2021-12-31 2022-05-06 江铃汽车股份有限公司 Video conference camera control method, device and system
CN115988164A (en) * 2022-12-03 2023-04-18 北京视通科技有限公司 Conference room multimedia control method, system and computer equipment
CN116614598A (en) * 2023-04-20 2023-08-18 北京视通科技有限公司 Video conference picture adjusting method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination