CN118042065A - Campus audio and video phone system based on face recognition - Google Patents

Campus audio and video phone system based on face recognition

Info

Publication number
CN118042065A
Authority
CN
China
Prior art keywords
face recognition
personnel
call
training
central server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410444570.3A
Other languages
Chinese (zh)
Inventor
周俊夫
谭陈勇
姜新波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qingju Intelligent Technology Co ltd
Original Assignee
Shenzhen Qingju Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shenzhen Qingju Intelligent Technology Co ltd
Priority to CN202410444570.3A
Publication of CN118042065A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of face recognition and audio and video calling, and in particular to a campus audio and video phone system based on face recognition.

Description

Campus audio and video phone system based on face recognition
Technical Field
The invention relates to the field of face recognition and audio and video calling, and in particular to a campus audio and video phone system based on face recognition.
Background
Face recognition technology is a computer technology for identity recognition based on people's facial features. It first determines whether a face is present by analyzing and comparing the input face image or video stream, and then gives the position, size and pose of each main facial feature. Based on this information, the technology further extracts the identity features of each face and compares them with a known face database to identify each face. Although face recognition technology brings convenience, it still has shortcomings in practical application, including limited security, unstable recognition rates, and limited ability to cope with complex environments.
Audio and video technology covers the acquisition, encoding, transmission, processing and playback of audio and video. In the field of communication, audio and video calling is a commonly used mode, including Internet telephony, video conferencing and video chat applications. Audio and video technology has made remarkable progress through continuous development, but in practical application some shortcomings remain, including transmission quality that is difficult to guarantee, a high technical threshold, and device adaptation that needs improvement.
In view of this, it is necessary to design a campus audio/video phone system based on face recognition to solve the above technical problems.
Disclosure of Invention
The invention aims to solve the problems described in the background and provides a campus audio and video phone system based on face recognition.
The aim of the invention can be achieved by the following technical scheme:
The campus audio and video phone system based on face recognition comprises a learning module, a database, a central server, a backup server and n call terminals, where n = 1, 2, …, m indexes the call terminals and m is the total number of call terminals. The learning module, the database, the central server and the backup server form a central subsystem, and the central subsystem exchanges information bidirectionally with the call terminals to realize the communication function.
Further, the learning module in the central subsystem is responsible for analyzing and learning the credential photographs and face close-up screenshots, and sends the learning results to the database in the form of convolution kernels and weights;
First step: training resource grouping. The learning module first acquires the roster and credential photographs of all school personnel from the school personnel management platform, acquires 1000 face close-up screenshots taken from videos of each person from the backup server, and groups them: the name, credential photograph, 1000 face close-up screenshots and contact information of each person form one group, and group i is marked with the person's number, where i = 1, 2, …, x and x is the total number of school personnel;
the learning module randomly extracts 600 of each person's 1000 face close-up screenshots and numbers them consecutively; this set of pictures is called the training set. From the remaining 400 face close-up screenshots, 300 are randomly extracted and numbered consecutively; this set of pictures is called the validation set. The last 100 face close-up screenshots form the test set.
The learning module compresses all face close-up screenshots into pictures 295 px wide and 413 px high through an image processing algorithm and performs cropping, scaling, rotation and graying to obtain a frontal black-and-white picture of each person; px denotes pixels. The preprocessed credential photograph is stored as a two-dimensional bitmap. The gray value G of each pixel is analyzed by the image processing algorithm, and each pixel corresponds to a vector (x, y, G), where x is the abscissa of the pixel, y is the ordinate of the pixel, 0 ≤ x ≤ 295, 0 ≤ y ≤ 413, and G is the gray value of the pixel, ranging from 0 to 100, with the gray value of a black pixel being 100 and that of a white pixel being 0.
Second step: face recognition training. The learning module extracts picture one from the training set of person i and randomly presets two training verification elements and their weights; the initial training verification elements are a distance and a vector angle.
Preset training verification element one y1: two pixel points A(xa, ya, ga) and B(xb, yb, ga) with the same gray value are randomly extracted from picture one, and the value of y1 is obtained from the formula y1 = sqrt((xa - xb)^2 + (ya - yb)^2); the weight factor q1 corresponding to preset training verification element one is 100.
Preset training verification element two y2: the value of y2 is obtained from the formula y2 = arctan((yb - ya)/(xb - xa)); the weight factor q2 corresponding to preset training verification element two is 100.
Judgment factor one is calculated from the formula P1 = q1*y1 + q2*y2.
A second picture is extracted from the training set of person i, and two pixel points C(xc, yc, Ga) and D(xd, yd, Ga) with the same gray value Ga are randomly extracted from it; the value of verification element one is calculated from the formula y1' = sqrt((xc - xd)^2 + (yc - yd)^2), and the value of verification element two is calculated from the formula y2' = arctan((yd - yc)/(xd - xc)).
Judgment factor two is calculated from the formula P2 = q1*y1' + q2*y2'.
When |P1 - P2| ≤ Y0, go to the third step; otherwise return to the second step, where Y0 is the preset learning judgment factor threshold.
Third step: verification. A picture is randomly extracted from the validation set, and two pixel points E(xe, ye, ge) and F(xf, yf, gf) are located in it, where xe = xa, ye = ya, xf = xb and yf = yb. Verification element one and verification element two are obtained from the formulas y1'' = sqrt((xe - xf)^2 + (ye - yf)^2) and y2'' = arctan((yf - ye)/(xf - xe)), and judgment factor three is obtained from the formula P3 = q1*y1'' + q2*y2''. When |P3 - P1| > Y0, go to the fourth step; otherwise go to the fifth step.
Fourth step: random evolution. The verification elements and weights are randomly adjusted as follows: a random integer between 1 and 4M is generated and recorded as the evolution parameter a, where M is the number of verification elements. If a is a multiple of 2, weight factor one is increased by 1; if a is a multiple of 3, weight factor one is decreased by 1; if a is a multiple of 5, weight factor two is increased by 1; if a is a multiple of 7, weight factor two is decreased by 1; and so on. After the adjustment, the vectors of pixel points A, B, C and D are saved, and the procedure returns to the second step.
Fifth step: naming the convolution kernel. The convolution kernel is the training result; it refers to the method of extracting the verification elements from an image as in the second step, specifically expressed as follows: convolution kernel one is: locate two pixel points G(xg, yg, gg) and H(xh, yh, gh) in a new image, where xg = xa, yg = ya, xh = xb and yh = yb; obtain verification element one and verification element two from the formulas y1 = sqrt((xg - xh)^2 + (yg - yh)^2) and y2 = arctan((yh - yg)/(xh - xg)), and obtain the judgment factor P = q1*y1 + q2*y2.
Using a convolution kernel means using this method to calculate the verification elements. By analogy, repeatedly looping through the first to fourth steps generates a large number of verification-element calculation methods, and the corresponding convolution kernels are named convolution kernel one, convolution kernel two, …, convolution kernel u, where u is the number of convolution kernels.
Sixth step: testing. A picture from the validation set is processed with all convolution kernels to obtain the judgment factors P1, P2, …, Pu, from which similarity parameter one X1 is obtained.
A picture is randomly extracted from the test set and processed with all convolution kernels to obtain the test judgment factors P1', P2', …, Pu', from which the test similarity parameter X2 is obtained.
The test accuracy Z is obtained by data fusion of X1 and X2, where p is a preset weight factor.
When Z is greater than or equal to 90%, the test is considered successful: all convolution kernels and weights are output, the training set, validation set and test set of the person are deleted, i is set to i + 1, and the procedure returns to the first step to begin face recognition training for the next person. When Z is less than 90%, the test is considered failed, person i is kept unchanged, and the procedure returns to the first step to restart that person's face recognition training.
Further, the database in the central subsystem is responsible for storing the face recognition learning results, namely the convolution kernels and weights, and for storing the school relation network of each person i;
The database assigns different authorization levels according to the category of the person: if the person is the principal or vice-principal of the school, authorization level S is given; if the person is the head of a first-level office, a dean or a deputy dean of the school, authorization level A is given; if the person is a department head, deputy department head, counselor or professor, authorization level B is given; if the person is a class advisor, lecturer, school worker or ordinary teaching staff member, authorization level C is given; if the person is an ordinary student or other service personnel, authorization level D is given. The database builds a relation network according to these category relations, formulates a relation tree, and stores the relation tree so that it can be called at any time during audio and video calls.
The database stores the convolution kernels and weights corresponding to person i so that they can be called at any time for face recognition analysis;
Further, the central server in the central subsystem is responsible for receiving the communication initiation signal from a call terminal, calling the corresponding face learning results and school relation network from the database, and then sending a communication reception signal to the call terminal.
After the central server receives a communication initiation signal sent by call terminal one, a front close-up of the person at call terminal one is first captured by the camera and analyzed. If the person cannot pass the face recognition operation, an identification request is sent to call terminal one; after receiving it, call terminal one requires the user to input a school personnel number, ID card number and name, and face recognition is then performed between the person's front close-up and the credential photograph in the database using the convolution kernels and weights. When the test accuracy is greater than or equal to 60% and the school personnel number matches, person i is marked as user one and an acquisition instruction is sent to the backup server, which captures face close-ups of person i at fixed intervals during subsequent calls; when 1000 pictures have been captured, all of them are sent to the learning module for training.
If the analysis determines through face recognition that the person corresponding to call terminal one is person one, the authorization level and relation tree of person one are called from the database, the relation tree is fed back to the screen of call terminal one, and the available audio and video call objects are displayed. After person one confirms sending an audio and video call request to person two in the relation tree, the central server sends a communication reception signal to call terminal two corresponding to person two, and the number of communications currently borne by the central server is increased by 1. The communication bandwidth of the central server can bear at most N0 simultaneous communications, and the number of communications it currently bears is N1; when N1 ≥ N0, the backup server begins to receive new communication initiation signals and helps the central server share the communication load;
the central server retrieves from the database the historical call duration corresponding to the call terminal and the authority level of the person to whom the call terminal belongs. Each call duration corresponds to a preset resource retrieval parameter one k1 and each authority level corresponds to a resource retrieval parameter two k2, and the resource retrieval parameter K is calculated from k1 and k2. When K is greater than a preset resource retrieval parameter threshold K0, the call is allocated 10% of the server bandwidth resources of the central server and of a backup server to ensure good call quality; when K is less than the preset resource retrieval parameter threshold K0, the call is allocated only 5% of the central server's bandwidth resources.
After receiving the confirmation signal, the central server uses the allocated bandwidth resources to send the audio and video information of call terminal one to call terminal two;
Further, the backup server in the central subsystem is responsible for collecting video clips as face recognition learning data for the learning module, and takes over the responsibility of receiving communication initiation signals and sending communication reception signals when the central server runs under high load.
When the backup server receives an acquisition instruction, it locates the call terminal indicated by the instruction, captures a face close-up screenshot from that terminal at a preset interval during the call, marks each screenshot as a face close-up of user one, and sends it to the learning module. When the backup server receives a communication initiation signal sent by call terminal one, the front close-up of the person at call terminal one is captured by the camera and analyzed; if the person corresponding to call terminal one is judged to be person one, the authorization level and relation tree of person one are called from the database, the relation tree is fed back to the screen of call terminal one, and the available audio and video call objects are displayed. After person one confirms sending an audio and video call request to person two in the relation tree, the backup server sends a communication reception signal to call terminal two corresponding to person two.
Further, the call terminal is responsible for call initiation, participation and termination, and comprises a camera, a video output device, an audio output device and a touch display screen. After receiving a communication reception signal, the call terminal displays the signal source, person one, on the touch display screen and asks whether the contact person accepts the call. After contact person two clicks the confirm button, a confirmation signal is sent to the central server and the backup server.
A person with authorization level S can obtain authorization on any call terminal through face recognition; the authorized call terminal can then send a broadcast signal to the central server together with a segment of custom text, and after receiving the broadcast signal the central server sends the custom text to all call terminals. After a person with authorization level A, B, C or D subsequently uses that call terminal and passes face recognition, the call terminal loses the S authorization level.
The beneficial effects are that:
(1) The invention performs iterative training through the evolution algorithm of the learning module and stores the face recognition training results as computer data in the form of convolution kernels and weights, which can greatly reduce the error rate of face recognition; the face pictures used for training are deleted after training to the greatest extent possible, protecting personal information security.
(2) The database assigns different authorization levels to different people according to the face recognition results, retrieves the historical call duration from the database to estimate future call demand, and allocates different bandwidth resources to different callers. This reduces unnecessary bandwidth load to a certain extent, improves call quality, guarantees the accuracy and efficiency of face recognition as far as possible, ensures the reliability of the campus call system, and provides a more comprehensive and intelligent monitoring service.
(3) The backup server acquires face close-up screenshots of user one during video calls, which enlarges the source of training material for face recognition training; when the central server is overloaded, the backup server receives communication initiation signals in its place, helping the central server with load balancing and, to a certain extent, avoiding dropped calls and signal delay caused by highly concurrent communications.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application or of the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a functional block diagram of the present invention.
Fig. 2 is a flowchart of a face recognition algorithm of the learning module.
Detailed Description
In order that the above objects, features and advantages of the invention may be more readily understood, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, the invention may be embodied in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.
As shown in Fig. 1, the campus audio and video phone system based on face recognition comprises a learning module, a database, a central server, a backup server and n call terminals, where n = 1, 2, …, m indexes the call terminals and m is the total number of call terminals. The learning module, the database, the central server and the backup server form a central subsystem, and the central subsystem exchanges information bidirectionally with the call terminals to realize the communication function.
As shown in Fig. 2, the learning module in the central subsystem is responsible for analyzing and learning the credential photographs and face close-up screenshots, and sends the learning results to the database in the form of convolution kernels and weights. The specific steps are as follows:
First step: training resource grouping. The learning module first acquires the roster and credential photographs of all school personnel from the school personnel management platform, acquires 1000 face close-up screenshots taken from videos of each person from the backup server, and groups them: the name, credential photograph, 1000 face close-up screenshots and contact information of each person form one group, and group i is marked with the person's number, where i = 1, 2, …, x and x is the total number of school personnel;
the learning module randomly extracts 600 of each person's 1000 face close-up screenshots and numbers them consecutively; this set of pictures is called the training set. From the remaining 400 face close-up screenshots, 300 are randomly extracted and numbered consecutively; this set of pictures is called the validation set. The last 100 face close-up screenshots form the test set.
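As an illustrative sketch (not part of the original filing), the 600/300/100 split described above could be implemented as follows; the function name and the use of a seeded random generator are assumptions made for illustration.

import random

def split_face_screenshots(screenshots, seed=None):
    """Split 1000 face close-up screenshots into training (600),
    validation (300) and test (100) sets by random sampling,
    mirroring the grouping described for each person i."""
    assert len(screenshots) == 1000
    rng = random.Random(seed)
    shuffled = screenshots[:]           # copy so the original order is preserved
    rng.shuffle(shuffled)
    training_set = shuffled[:600]       # 600 randomly extracted pictures
    validation_set = shuffled[600:900]  # 300 of the remaining 400 pictures
    test_set = shuffled[900:]           # the last 100 pictures
    return training_set, validation_set, test_set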
The learning module compresses all face close-up screenshots into pictures 295 px wide and 413 px high through an image processing algorithm and performs cropping, scaling, rotation and graying to obtain a frontal black-and-white picture of each person; px denotes pixels. The preprocessed credential photograph is stored as a two-dimensional bitmap. The gray value G of each pixel is analyzed by the image processing algorithm, and each pixel corresponds to a vector (x, y, G), where x is the abscissa of the pixel, y is the ordinate of the pixel, 0 ≤ x ≤ 295, 0 ≤ y ≤ 413, and G is the gray value of the pixel, ranging from 0 to 100, with the gray value of a black pixel being 100 and that of a white pixel being 0;
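A minimal sketch of this preprocessing, assuming the Pillow library, is given below; the rescaling of Pillow's 0-255 grayscale so that black maps to 100 and white to 0 follows the convention stated above, while the cropping and rotation steps are omitted for brevity.

from PIL import Image

def preprocess_screenshot(path):
    """Resize a face close-up screenshot to 295 x 413 px, convert it to
    grayscale and return a list of (x, y, G) pixel vectors with G in 0..100
    (black = 100, white = 0). Illustrative sketch only; coordinates here run
    from 0 to width-1 and 0 to height-1."""
    img = Image.open(path).convert("L")          # grayscale, values 0..255
    img = img.resize((295, 413))                 # width 295 px, height 413 px
    width, height = img.size
    vectors = []
    for y in range(height):
        for x in range(width):
            g255 = img.getpixel((x, y))          # 0 = black, 255 = white
            g = round((255 - g255) * 100 / 255)  # rescale: black -> 100, white -> 0
            vectors.append((x, y, g))
    return vectors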
Second step: face recognition training. The learning module extracts picture one from the training set of person i and randomly presets two training verification elements and their weights; the initial training verification elements are a distance and a vector angle.
Preset training verification element one y1: two pixel points A(xa, ya, ga) and B(xb, yb, ga) with the same gray value are randomly extracted from picture one, and the value of y1 is obtained from the formula y1 = sqrt((xa - xb)^2 + (ya - yb)^2); the weight factor q1 corresponding to preset training verification element one is 100.
Preset training verification element two y2: the value of y2 is obtained from the formula y2 = arctan((yb - ya)/(xb - xa)); the weight factor q2 corresponding to preset training verification element two is 100.
Judgment factor one is calculated from the formula P1 = q1*y1 + q2*y2.
A second picture is extracted from the training set of person i, and two pixel points C(xc, yc, Ga) and D(xd, yd, Ga) with the same gray value Ga are randomly extracted from it; the value of verification element one is calculated from the formula y1' = sqrt((xc - xd)^2 + (yc - yd)^2), and the value of verification element two is calculated from the formula y2' = arctan((yd - yc)/(xd - xc)).
Judgment factor two is calculated from the formula P2 = q1*y1' + q2*y2'.
When |P1 - P2| ≤ Y0, go to the third step; otherwise return to the second step, where Y0 is the preset learning judgment factor threshold.
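The computation of a judgment factor from one picture can be sketched as follows; treating verification element one as the Euclidean distance and verification element two as the arctangent of the connecting vector, combined as a weighted sum, is an assumed reading of the step above rather than a literal transcription of the filing.

import math
import random

def judgment_factor(pixels, q1=100.0, q2=100.0, rng=random):
    """Randomly pick two pixels with the same gray value from a list of
    (x, y, G) vectors and combine verification element one (distance) and
    verification element two (vector angle) into a judgment factor.
    Sketch under the stated assumptions."""
    by_gray = {}
    for x, y, g in pixels:
        by_gray.setdefault(g, []).append((x, y))
    candidates = [g for g, pts in by_gray.items() if len(pts) >= 2]
    if not candidates:
        raise ValueError("no gray value occurs at least twice")
    gray = rng.choice(candidates)
    (xa, ya), (xb, yb) = rng.sample(by_gray[gray], 2)
    y1 = math.hypot(xa - xb, ya - yb)        # verification element one: distance
    y2 = math.atan2(yb - ya, xb - xa)        # verification element two: vector angle
    return q1 * y1 + q2 * y2                 # judgment factor P = q1*y1 + q2*y2

A second training picture would be processed the same way to obtain P2, and training proceeds to the third step only when abs(P1 - P2) stays within the preset threshold Y0.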
Third step: verification. A picture is randomly extracted from the validation set, and two pixel points E(xe, ye, ge) and F(xf, yf, gf) are located in it, where xe = xa, ye = ya, xf = xb and yf = yb. Verification element one and verification element two are obtained from the formulas y1'' = sqrt((xe - xf)^2 + (ye - yf)^2) and y2'' = arctan((yf - ye)/(xf - xe)), and judgment factor three is obtained from the formula P3 = q1*y1'' + q2*y2''. When |P3 - P1| > Y0, go to the fourth step; otherwise go to the fifth step.
Fourth step: random evolution. The verification elements and weights are randomly adjusted as follows: a random integer between 1 and 4M is generated and recorded as the evolution parameter a, where M is the number of verification elements. If a is a multiple of 2, weight factor one is increased by 1; if a is a multiple of 3, weight factor one is decreased by 1; if a is a multiple of 5, weight factor two is increased by 1; if a is a multiple of 7, weight factor two is decreased by 1; and so on. After the adjustment, the vectors of pixel points A, B, C and D are saved, and the procedure returns to the second step.
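One possible reading of the random-evolution rule is sketched below; mapping the multiples of 2 and 3 to the first weight and the multiples of 5 and 7 to the second weight is an assumption made for illustration.

import random

def evolve_weights(weights, rng=random):
    """Randomly adjust the verification-element weights: draw an evolution
    parameter a in [1, 4*M] (M = number of verification elements); multiples
    of 2/3 raise/lower the first weight and multiples of 5/7 raise/lower the
    second weight. The assignment of multiples to weights is assumed."""
    m = len(weights)
    a = rng.randint(1, 4 * m)        # evolution parameter a
    if a % 2 == 0:
        weights[0] += 1
    if a % 3 == 0:
        weights[0] -= 1
    if a % 5 == 0 and m > 1:
        weights[1] += 1
    if a % 7 == 0 and m > 1:
        weights[1] -= 1
    return a, weights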
Fifth step: naming the convolution kernel. The convolution kernel is the training result; it refers to the method of extracting the verification elements from an image as in the second step, specifically expressed as follows: convolution kernel one is: locate two pixel points G(xg, yg, gg) and H(xh, yh, gh) in a new image, where xg = xa, yg = ya, xh = xb and yh = yb; obtain verification element one and verification element two from the formulas y1 = sqrt((xg - xh)^2 + (yg - yh)^2) and y2 = arctan((yh - yg)/(xh - xg)), and obtain the judgment factor P = q1*y1 + q2*y2.
Using a convolution kernel means using this method to calculate the verification elements. By analogy, repeatedly looping through the first to fourth steps generates a large number of verification-element calculation methods, and the corresponding convolution kernels are named convolution kernel one, convolution kernel two, …, convolution kernel u, where u is the number of convolution kernels.
Sixth step: testing. A picture from the validation set is processed with all convolution kernels to obtain the judgment factors P1, P2, …, Pu, from which similarity parameter one X1 is obtained.
A picture is randomly extracted from the test set and processed with all convolution kernels to obtain the test judgment factors P1', P2', …, Pu', from which the test similarity parameter X2 is obtained.
The test accuracy Z is obtained by data fusion of X1 and X2, where p is a preset weight factor.
When Z is greater than or equal to 90%, the test is considered successful: all convolution kernels and weights are output, the training set, validation set and test set of the person are deleted, i is set to i + 1, and the procedure returns to the first step to begin face recognition training for the next person. When Z is less than 90%, the test is considered failed, person i is kept unchanged, and the procedure returns to the first step to restart that person's face recognition training.
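The per-person loop around the 90% test-accuracy threshold could be organised as below; train_one_round, test_accuracy and the person attributes are hypothetical stand-ins for the first to sixth steps described above, not functions defined in the filing.

def train_all_personnel(personnel, train_one_round, test_accuracy, threshold=0.90):
    """Run face recognition training person by person. When the test
    accuracy reaches the threshold, keep the convolution kernels and
    weights and move on to person i + 1; otherwise restart training for
    the same person, as described in the sixth step."""
    results = {}
    i = 0
    while i < len(personnel):
        kernels, weights = train_one_round(personnel[i])                # steps one to five
        if test_accuracy(personnel[i], kernels, weights) >= threshold:  # step six
            results[personnel[i].person_id] = (kernels, weights)
            personnel[i].delete_training_data()   # training/validation/test sets deleted
            i += 1                                # move to the next person
        # else: keep the same person and restart from the first step
    return results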
The database in the central subsystem is responsible for storing the face recognition learning results, namely the convolution kernels and weights, and for storing the school relation network of each person i.
The database assigns different authorization levels according to the category of the person: if the person is the principal or vice-principal of the school, authorization level S is given; if the person is the head of a first-level office, a dean or a deputy dean of the school, authorization level A is given; if the person is a department head, deputy department head, counselor or professor, authorization level B is given; if the person is a class advisor, lecturer, school worker or ordinary teaching staff member, authorization level C is given; if the person is an ordinary student or other service personnel, authorization level D is given. The database builds a relation network according to these category relations, formulates a relation tree, and stores the relation tree so that it can be called at any time during audio and video calls.
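A simple sketch of the role-to-level mapping follows; the English role strings used as keys are illustrative renderings and are assumptions, not titles fixed by the filing.

def authorization_level(role):
    """Map a person's role to an authorization level S/A/B/C/D as described
    above; unknown roles fall back to the lowest level."""
    levels = {
        "principal": "S", "vice principal": "S",
        "office director": "A", "dean": "A", "deputy dean": "A",
        "department head": "B", "deputy department head": "B",
        "counselor": "B", "professor": "B",
        "class advisor": "C", "lecturer": "C",
        "school worker": "C", "teaching staff": "C",
        "student": "D", "service staff": "D",
    }
    return levels.get(role, "D")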
The database stores the convolution kernels and weights corresponding to person i so that they can be called at any time for face recognition analysis;
The central server in the central subsystem is responsible for receiving the communication initiation signal from a call terminal, calling the corresponding face learning results and school relation network from the database, and then sending a communication reception signal to the call terminal.
After the central server receives a communication initiation signal sent by call terminal one, a front close-up of the person at call terminal one is first captured by the camera and analyzed. If the person cannot pass the face recognition operation, an identification request is sent to call terminal one; after receiving it, call terminal one requires the user to input a school personnel number, ID card number and name, and face recognition is then performed between the person's front close-up and the credential photograph in the database using the convolution kernels and weights. When the test accuracy is greater than or equal to 60% and the school personnel number matches, person i is marked as user one and an acquisition instruction is sent to the backup server, which captures face close-ups of person i at fixed intervals during subsequent calls; when 1000 pictures have been captured, all of them are sent to the learning module for training.
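The fallback identification flow just described (face recognition first, manual credentials on failure, then screenshot collection) can be summarised as below; every helper method is a hypothetical stand-in for an operation named in the text, under the stated 60% accuracy assumption.

def identify_caller(terminal, database, backup_server, accuracy_threshold=0.60):
    """Identify the person at a call terminal: try face recognition against the
    stored convolution kernels first; if that fails, ask for the school
    personnel number, ID card number and name, re-check against the credential
    photo, and on success mark the person as user one and start screenshot
    collection on the backup server. Helper names are hypothetical."""
    photo = terminal.capture_front_face()
    person = database.match_face(photo)                  # convolution-kernel comparison
    if person is None:
        person_no, id_card, name = terminal.request_identity()
        accuracy = database.match_against_credential(photo, person_no)
        if accuracy >= accuracy_threshold and database.identity_matches(person_no, id_card, name):
            person = database.get_person(person_no)
            person.mark_as_user_one()
            backup_server.start_collection(terminal, person)  # gather up to 1000 screenshots
    return person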
If the analysis determines through face recognition that the person corresponding to call terminal one is person one, the authorization level and relation tree of person one are called from the database, the relation tree is fed back to the screen of call terminal one, and the available audio and video call objects are displayed. After person one confirms sending an audio and video call request to person two in the relation tree, the central server sends a communication reception signal to call terminal two corresponding to person two, and the number of communications currently borne by the central server is increased by 1. The communication bandwidth of the central server can bear at most N0 simultaneous communications, and the number of communications it currently bears is N1; when N1 ≥ N0, the backup server begins to receive new communication initiation signals and helps the central server share the communication load;
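The hand-over of new communication initiation signals to the backup server when the central server is saturated can be sketched as a simple dispatch rule; the server classes, their accept method and the capacity figure are assumptions made for illustration.

class CallDispatcher:
    """Route communication initiation signals to the central server until it
    reaches the number of calls its bandwidth can bear, then let the backup
    server take over new calls. Illustrative sketch only."""

    def __init__(self, central, backup, central_capacity):
        self.central = central
        self.backup = backup
        self.central_capacity = central_capacity   # N0: calls the bandwidth can bear
        self.active_on_central = 0                 # N1: calls currently borne

    def route(self, initiation_signal):
        if self.active_on_central < self.central_capacity:
            self.active_on_central += 1            # central server bears one more call
            return self.central.accept(initiation_signal)
        return self.backup.accept(initiation_signal)   # backup server shares the load

    def release(self):
        # called when a call handled by the central server ends
        self.active_on_central = max(0, self.active_on_central - 1)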
the central server retrieves from the database the historical call duration corresponding to the call terminal and the authority level of the person to whom the call terminal belongs. Each call duration corresponds to a preset resource retrieval parameter one k1 and each authority level corresponds to a resource retrieval parameter two k2, and the resource retrieval parameter K is calculated from k1 and k2. When K is greater than a preset resource retrieval parameter threshold K0, the call is allocated 10% of the server bandwidth resources of the central server and of a backup server to ensure good call quality; when K is less than the preset resource retrieval parameter threshold K0, the call is allocated only 5% of the central server's bandwidth resources.
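The bandwidth rule driven by the resource retrieval parameter could look like the following; combining the two sub-parameters by simple addition is an assumption, since the filing does not spell out the formula.

def allocate_bandwidth(call_duration_param, authority_param, threshold):
    """Combine resource retrieval parameter one (from historical call
    duration) and parameter two (from the authority level) and choose the
    bandwidth share for the call: 10% of central plus backup server
    bandwidth above the threshold, otherwise 5% of the central server's
    bandwidth only."""
    k = call_duration_param + authority_param     # resource retrieval parameter K (assumed additive)
    if k > threshold:
        return {"central_share": 0.10, "backup_share": 0.10}
    return {"central_share": 0.05, "backup_share": 0.0}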
After receiving the confirmation signal, the central server uses the allocated bandwidth resources to send the audio and video information of call terminal one to call terminal two;
The backup server in the central subsystem is responsible for collecting video clips as face recognition learning data for the learning module, and takes over the responsibility of receiving communication initiation signals and sending communication reception signals when the central server runs under high load.
When the backup server receives an acquisition instruction, it locates the call terminal indicated by the instruction, captures a face close-up screenshot from that terminal at a preset interval during the call, marks each screenshot as a face close-up of user one, and sends it to the learning module. When the backup server receives a communication initiation signal sent by call terminal one, the front close-up of the person at call terminal one is captured by the camera and analyzed; if the person corresponding to call terminal one is judged to be person one, the authorization level and relation tree of person one are called from the database, the relation tree is fed back to the screen of call terminal one, and the available audio and video call objects are displayed. After person one confirms sending an audio and video call request to person two in the relation tree, the backup server sends a communication reception signal to call terminal two corresponding to person two.
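A sketch of the periodic screenshot collection on the backup server, assuming a simple polling loop; the terminal and learning-module methods, and the stopping target of 1000 pictures per person, are taken from the description above but the call signatures are hypothetical.

import time

def collect_face_screenshots(terminal, learning_module, interval_s, target=1000):
    """Capture a face close-up screenshot from the indicated call terminal at a
    fixed interval while the call is active, tag it with the user's number and
    forward it to the learning module; stop once the target number of pictures
    has been gathered."""
    collected = 0
    while collected < target and terminal.call_active():
        screenshot = terminal.grab_face_closeup()        # face close-up screenshot
        learning_module.add_training_picture(terminal.user_id, screenshot)
        collected += 1
        time.sleep(interval_s)                           # preset capture interval
    return collected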
The call terminal is responsible for call initiation, participation and termination, and consists of a camera, a video output device, an audio output device and a touch display screen. After receiving a communication reception signal, the call terminal displays the signal source, person one, on the touch display screen and asks whether the contact person accepts the call. After contact person two clicks the confirm button, a confirmation signal is sent to the central server and the backup server.
A person with authorization level S can obtain authorization on any call terminal through face recognition; the authorized call terminal can then send a broadcast signal to the central server together with a segment of custom text, and after receiving the broadcast signal the central server sends the custom text to all call terminals. After a person with authorization level A, B, C or D subsequently uses that call terminal and passes face recognition, the call terminal loses the S authorization level.
The above examples illustrate only a few embodiments of the invention, which are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, and these all fall within the scope of protection of the invention. Accordingly, the scope of protection of the invention is defined by the appended claims.

Claims (7)

1. A campus audio and video phone system based on face recognition, comprising a learning module, a database, a central server, a backup server and call terminals, wherein the learning module, the database, the central server and the backup server form a central subsystem, and the central subsystem exchanges information bidirectionally with n call terminals to realize a communication function; characterized in that:
the learning module is responsible for analyzing and learning credential photographs and face close-up screenshots, performing iterative training with a depth algorithm, and transmitting the learning results to the database in the form of convolution kernels and verification weights;
the database is responsible for storing the face recognition learning results, the school relation network of each person and the authorization level of each person, calling these resources immediately to perform face recognition when a call is established, assigning the corresponding authorization level and providing a contact network before communication;
the central server is responsible for receiving the communication initiation signal from a call terminal, calling the corresponding learning results and school relation network from the database, and then sending a communication reception signal to the call terminal; it allocates resources according to the authorization level and the current load of the central server, and decides whether the backup server is activated;
the backup server is responsible for collecting video clips as face recognition learning data for the learning module, and for receiving communication initiation signals and sending communication reception signals when the central server runs under high load.
2. The campus audio and video phone system based on face recognition of claim 1, wherein: the call terminal is responsible for call initiation, participation and termination, and consists of a camera, a video output device, an audio output device and a touch display screen.
3. The campus audio and video phone system based on face recognition of claim 1, wherein the specific steps by which the learning module analyzes and learns the credential photographs and face close-up screenshots are as follows:
First step: acquire the roster and credential photographs of all school personnel from the school personnel management platform, acquire a number of face close-up screenshots taken from videos of each person from the backup server, and group them; the name, credential photograph, face close-up screenshots and contact information of each person form one group;
divide the face close-up screenshots of each person to obtain a training set, a validation set and a test set;
compress all face close-up screenshots through an image processing algorithm to obtain face close-up pictures, and perform cropping, scaling, rotation and graying to obtain a frontal black-and-white picture of the person; the preprocessed credential photograph is stored as a two-dimensional bitmap; the gray value of each pixel point is analyzed by the image processing algorithm;
Second step: face recognition training; extract a number of pictures from the training set of the person and randomly preset two training verification elements and their weights, the initial training verification elements comprising a distance and a vector angle;
randomly extract and analyze two groups of pixel points with the same gray value, each group containing two pixel points, to obtain the values of preset training verification element one and preset training verification element two, and calculate from them judgment factor one P1 and judgment factor two P2; when |P1 - P2| ≤ Y0, go to the third step, otherwise return to the second step, where Y0 is a preset learning judgment factor threshold;
Third step: verification; randomly extract a picture from the validation set, locate two pixel points in the picture, analyze the two pixel points to obtain verification element one and verification element two, and calculate from them judgment factor three P3; when |P3 - P1| > Y0, carry out the fourth step, otherwise carry out the fifth step;
Fourth step: random evolution; randomly adjust the verification elements and weights, generate a random integer according to the total number of verification elements and record it as the evolution parameter, and adjust the weights according to the value of the evolution parameter; save the vector of each pixel point and return to the second step;
Fifth step: name the convolution kernel, which is embodied as follows: convolution kernel one is: locate two pixel points in a new image, analyze them to obtain verification element one and verification element two, and calculate the judgment factor;
Sixth step: testing; process the images of the validation set and the test set with all convolution kernels to obtain the corresponding judgment factors; analyze the corresponding judgment factors to obtain similarity parameter one and similarity parameter two, and perform data fusion based on similarity parameter one and similarity parameter two to obtain a test accuracy;
when the test accuracy is greater than or equal to a preset threshold, output all convolution kernels and weights, delete the training set of the person, and then start the face recognition training of the next person; when the test accuracy is less than the preset threshold, keep the person unchanged and return to the first step to restart that person's face recognition training.
4. The campus audio and video phone system based on face recognition of claim 2, wherein: the database in the central subsystem is responsible for storing the face recognition learning results and the school relation network of each person, stores all convolution kernels and weights, assigns different authorization levels according to the category of the person, builds a relation network according to the relations between personnel, formulates a relation tree and stores the relation tree in the database.
5. The campus audio and video phone system based on face recognition of claim 4, wherein the process by which the central server receives the communication initiation signal of a call terminal and realizes the communication function is as follows:
when the central server receives a communication initiation signal sent by call terminal one, a face close-up picture is acquired for face recognition analysis; if it cannot pass, an identification request is sent to call terminal one requiring the user to input the correct school personnel number, ID card number and name, and face recognition is then performed against the credential photograph in the database; when the test accuracy is greater than a threshold, the person is marked as user one, a number of face close-up shots of the person are captured at fixed intervals during subsequent calls, and the face close-up shots are sent to the learning module for training;
before each communication is initiated, the authorization level and relation tree of the person are called from the database, the relation tree is fed back to the screen of the call terminal, and the selectable audio and video call objects are displayed; the central server calls the historical call duration and authority level of the person from the database, each call duration and each authority level corresponding to a preset resource retrieval parameter one and resource retrieval parameter two respectively, and the resource retrieval parameter is obtained by calculation; when the resource retrieval parameter is greater than a preset threshold, bandwidth resources of the central server and the backup server are configured for the call; when the resource retrieval parameter is less than the preset threshold, bandwidth resources of the central server are configured for the call.
6. The campus audio and video phone system based on face recognition of claim 5, wherein: the backup server is further responsible for collecting video clips as face recognition learning data for the learning module and for receiving communication initiation signals and sending communication reception signals when the central server runs under high load.
7. The face recognition-based campus telephone audio-video system of claim 6, wherein: after receiving the communication receiving signal, the communication terminal displays the signal source through the touchable display screen and inquires whether the contact person accepts or not; after clicking the confirm button on the communication terminal, sending confirm signals to the central server and the backup server;
the principal and vice-principal obtain the highest authorization level on any call terminal through face recognition; the terminal can then send a broadcast signal to the central server together with a segment of custom text, and after receiving the broadcast signal the central server sends the custom text to all call terminals; after another person uses that call terminal and passes face recognition, the call terminal loses the highest authorization level.
CN202410444570.3A 2024-04-15 2024-04-15 Campus audio and video phone system based on face recognition Pending CN118042065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410444570.3A CN118042065A (en) 2024-04-15 2024-04-15 Campus audio and video phone system based on face recognition


Publications (1)

Publication Number Publication Date
CN118042065A true CN118042065A (en) 2024-05-14

Family

ID=90993690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410444570.3A Pending CN118042065A (en) 2024-04-15 2024-04-15 Campus audio and video phone system based on face recognition

Country Status (1)

Country Link
CN (1) CN118042065A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113489936A (en) * 2021-06-30 2021-10-08 湖南校智付网络科技有限公司 Public telephone call method based on face recognition
CN215646878U (en) * 2021-09-03 2022-01-25 重庆跃途科技有限公司 Campus telephone system based on face recognition
CN216122671U (en) * 2021-11-10 2022-03-22 河北大盛教育科技有限公司 Face recognition call terminal
CN114553840A (en) * 2022-03-04 2022-05-27 重庆光迅爱控科技有限公司 Student communication system and method for smart campus
WO2022127112A1 (en) * 2020-12-14 2022-06-23 奥比中光科技集团股份有限公司 Cross-modal face recognition method, apparatus and device, and storage medium
WO2023005161A1 (en) * 2021-07-27 2023-02-02 平安科技(深圳)有限公司 Face image similarity calculation method, apparatus and device, and storage medium
CN219893343U (en) * 2023-05-08 2023-10-24 深圳市青桔智慧科技有限公司 Campus telephone based on face recognition



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination