CN114333072A - Data processing method and system based on conference image communication

Data processing method and system based on conference image communication

Info

Publication number
CN114333072A
Authority
CN
China
Prior art keywords
conference
content
language
discussion
image
Legal status
Granted
Application number
CN202210228974.XA
Other languages
Chinese (zh)
Other versions
CN114333072B (en)
Inventor
安佳兵
Current Assignee
Shenzhen Yunji Intelligent Information Co., Ltd.
Original Assignee
Shenzhen Yunji Intelligent Information Co., Ltd.
Application filed by Shenzhen Yunji Intelligent Information Co., Ltd.
Priority to CN202210228974.XA
Publication of CN114333072A
Application granted
Publication of CN114333072B
Legal status: Active

Landscapes

  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a data processing method and system based on conference image communication, applied to the field of image communication. The method comprises: acquiring the number of participant images during the conference; separately collecting image information of each participant, the image information specifically being the participant's facial information; capturing the participant's mouth movements according to the image information; parsing the mouth movements to obtain the participant's conference discussion content; translating the discussion content to obtain the corresponding translated content; and capturing the discussion content and the translated content with holographic projection technology, then projecting each beside the image of the corresponding participant at the conference site. By capturing participants' mouth movements during conference image communication, the invention effectively reduces the problem of conference content becoming disordered when excessive ambient noise interferes with voice commands.

Description

Data processing method and system based on conference image communication
Technical Field
The invention relates to the field of image communication, in particular to a data processing method and a data processing system based on conference image communication.
Background
With the development of science and technology, users in different regions can exchange information through video conferences. As a new communication tool, the video conference breaks through geographic limitations and provides more convenient, flexible and comprehensive transmission of audio and video signals and related services, so it has been widely adopted.
For example, application CN201910260503.5 discloses a conference terminal and a conference system in the technical field of video conferencing, aimed mainly at displaying the conference scene panoramically without losing the image of any participant during the conference. Its main technical scheme is a conference terminal comprising: a fixed mount; a plurality of image acquisition devices arranged circumferentially on the outer surface of the mount, each acquiring and outputting a partial conference image, the partial images jointly forming a whole conference image; and an image processing device connected to the image acquisition devices, which receives the partial conference images, converts them into the whole conference image, and outputs it.
However, in a video conference with excessive ambient noise, the voice commands picked up during the conference are subject to interference, making it difficult for each participant to correctly follow the conference topic and the discussion content of the other participants.
Disclosure of Invention
The invention aims to solve the problem that excessive ambient noise interferes with voice commands and disorders the conference content, and provides a data processing method and system based on conference image communication.
To solve this technical problem, the invention adopts the following technical means:
the invention provides a data processing method based on conference image communication, comprising the following steps:
acquiring the number of participant images during the conference;
separately collecting image information of each participant; the image information is specifically the participant's facial information;
capturing the participant's mouth movements according to the image information;
parsing the mouth movements to obtain the participant's conference discussion content;
translating the conference discussion content to obtain the corresponding translated content;
and capturing the conference discussion content and the translated content using holographic projection technology, and projecting them respectively beside the image of the corresponding participant at the conference site.
Further, the step of separately collecting the image information of the participants comprises:
obtaining the participant's features from the participant's image information; the features include facial features;
comparing the participant's features with preset participant features to match the participant's identity information;
and identifying the participant according to the identity information.
Further, the step of capturing the participant's mouth movements according to the image information comprises:
collecting mouth images of the participant with a preset high-definition camera;
analyzing the mouth images to obtain a three-dimensional motion trajectory of the participant's mouth; the trajectory is specifically the mouth's movement mapped onto the X, Y and Z axes with the center of the mouth as the base point;
performing inertial motion capture on the three-dimensional motion trajectory to obtain the participant's habitual speech patterns; the inertial motion is specifically the habitual mouth movements made while speaking;
and comparing the captured inertial motion with preset inertial motions to obtain the participant's speech characteristics.
Further, the step of parsing the mouth movements to obtain the participant's conference discussion content comprises:
acquiring the language the participant uses during self-introduction;
matching that language against the language types in a preset database to confirm the language type used by the participant;
displaying the language type beside the participant's image;
and compiling the discussion content according to the language type to generate the general-language translation corresponding to the discussion content.
Further, the step of matching the language against the language types in the preset database to confirm the language type used by the participant comprises:
acquiring the vowel features corresponding to the language;
inputting one or more vowel features into a preset prediction model to predict the language corresponding to those vowel features;
judging whether the predicted language type is a preset language type;
and if so, acquiring that language and the translated content corresponding to it.
Further, after the step of separately collecting the image information of the participants, the method comprises:
acquiring the participant's facial expression;
matching the facial expression against the expression types in a preset database to confirm the participant's emotional state;
and displaying the emotional state beside the corresponding participant's image.
Further, after the step of parsing the mouth movements to obtain the participant's conference discussion content, the method comprises:
obtaining the sentence gap time while the participant is speaking;
judging whether the gap time exceeds a preset time period;
if so, retranslating the sentence; the retranslation specifically means translating the sentence again word by word;
judging whether the retranslated sentence content is the same as the original conference discussion content;
and if not, projecting both the retranslated sentence content and the conference discussion content beside the corresponding participant's image.
Further, before the step of capturing the conference discussion content and the translated content with holographic projection technology and projecting them respectively beside the image of the corresponding participant at the conference site, the method comprises:
acquiring sounds from other participants while the current participant's discussion content is being presented;
judging whether the sound was made by another participant in the current conference;
if so, acquiring the sound and converting it into the opinion voiced by that participant;
and projecting the voiced opinion, attributed to the participant who voiced it, beside the content currently under discussion.
Further, the step of translating the conference discussion content to obtain the corresponding translated content comprises:
re-identifying the conference discussion content and obtaining the reassembled sentence after translation;
judging whether the reassembled sentence is normal translated sentence content; normal translated sentence content is specifically understandable content that fits the subject of the conference;
if so, the sentence is an error-free statement; if not, the sentence is an erroneous statement that must be analyzed and translated again manually.
The invention also provides a data processing system based on conference image communication, comprising:
a first acquisition module, configured to acquire the number of participant images during the conference;
a first collection module, configured to separately collect image information of each participant; the image information is specifically the participant's facial information;
a first capture module, configured to capture the participant's mouth movements according to the image information;
a second acquisition module, configured to parse the mouth movements to obtain the participant's conference discussion content;
a third acquisition module, configured to translate the conference discussion content to obtain the corresponding translated content;
and a first listing module, configured to display the conference discussion content and the translated content respectively beside the corresponding participant's image.
The data processing method and system based on conference image communication provided by the invention have the following beneficial effects:
the invention captures participants' mouth movements during conference image communication, determines each participant's identity and language information, translates the participant's conference discussion content according to the language information matched by the system, and displays it below the participant's image, thereby effectively reducing the problem of conference content becoming disordered when excessive ambient noise interferes with voice commands.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of a data processing method based on conference image communication according to the present invention;
fig. 2 is a block diagram of an embodiment of a data processing system based on conference image communication according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it; the objects, features and advantages of the invention are further described below with reference to the accompanying drawings.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a data processing method based on conference image communication according to an embodiment of the present invention includes:
S1: acquiring the number of participant images during the conference;
S2: separately collecting image information of each participant; the image information is specifically the participant's facial information;
S3: capturing the participant's mouth movements according to the image information;
S4: parsing the mouth movements to obtain the participant's conference discussion content;
S5: translating the conference discussion content to obtain the corresponding translated content;
S6: capturing the conference discussion content and the translated content using holographic projection technology, and projecting them respectively beside the image of the corresponding participant at the conference site.
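As an aid to reading, the following minimal sketch chains steps S1 to S6 in code form. It is an illustration only: every name in it (Participant, capture_face, project_beside and so on) is a hypothetical placeholder assumed for this sketch, not an interface disclosed by the invention, and each real step would be backed by cameras, a lip-reading model, a translation engine and a holographic projector.

from dataclasses import dataclass

@dataclass
class Participant:
    name: str

# Toy stand-ins for steps S2-S6; all logic here is placeholder.
def capture_face(frame: dict, p: Participant) -> dict:
    return frame.get(p.name, {})                        # S2: facial image information

def capture_mouth_motion(face: dict) -> list:
    return face.get("mouth_track", [])                  # S3: mouth movement trajectory

def parse_mouth_motion(track: list) -> str:
    return " ".join(track)                              # S4: discussion content

def translate(text: str) -> str:
    return f"[translated] {text}"                       # S5: translated content

def project_beside(p: Participant, *texts: str) -> None:
    for t in texts:                                     # S6: project beside the image
        print(f"{p.name} | {t}")

def process_frame(frame: dict, participants: list) -> None:
    for p in participants:                              # S1: one pass per counted participant image
        track = capture_mouth_motion(capture_face(frame, p))
        discussion = parse_mouth_motion(track)
        project_beside(p, discussion, translate(discussion))

process_frame({"Alice": {"mouth_track": ["hello", "everyone"]}}, [Participant("Alice")])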
In this embodiment, step S2 (separately collecting the image information of the participants, the image information specifically being facial information) comprises obtaining each participant's features, including facial features, from the image information, comparing them with preset participant features to match the participant's identity information, and identifying the participant accordingly. Step S3 (capturing mouth movements according to the image information) comprises collecting mouth images with a preset high-definition camera, analyzing them to obtain a three-dimensional motion trajectory of the mouth, mapped onto the X, Y and Z axes with the center of the mouth as the base point, performing inertial motion capture on the trajectory to obtain the participant's habitual speech patterns (the habitual mouth movements made while speaking), and comparing the captured inertial motion with preset inertial motions to obtain the participant's speech characteristics. Step S4 (parsing the mouth movements to obtain the discussion content) comprises acquiring the language the participant uses during self-introduction, matching it against the language types in a preset database, displaying the confirmed language type beside the participant's image, and compiling the discussion content according to the language type to generate the corresponding general-language translation. Step S5 (translating the discussion content) comprises re-identifying the discussion content, obtaining the reassembled sentence after translation, and judging whether it is normal translated sentence content, i.e. understandable content that fits the conference subject; if so, the sentence is error-free, otherwise it is erroneous and must be analyzed and translated again manually. Step S6 (projecting the discussion content and the translated content beside each participant's image) comprises acquiring sounds from other participants during the current participant's discussion, judging whether a sound was made by another participant in the current conference, and if so converting it into the opinion voiced by that participant and projecting the opinion, attributed to its speaker, beside the content currently under discussion.
In this embodiment, step S2 of separately collecting the image information of the participants comprises:
S21: obtaining the participant's features from the participant's image information; the features include facial features;
S22: comparing the participant's features with preset participant features to match the participant's identity information;
S23: and identifying the participant according to the identity information.
In this embodiment, the participant's facial features are obtained by scanning the participant's existing image information. For example, if the acquired facial feature is acne scarring on the left cheek, that feature is compared against the system's preset participant feature table to determine which participant it belongs to, and the matched facial feature yields the participant's identity information. If the facial feature is not present in the participant feature table, the participant's facial features and facial expression are collected and entered into the table for the record, and staff later supplement the participant's identity information manually.
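A minimal sketch of this table lookup follows, assuming facial features are compared as fixed-length feature vectors with a distance threshold; the names, vectors and threshold are invented for illustration and are not part of the patent.

import math

feature_table = {"Zhang": [0.12, 0.80, 0.33], "Li": [0.90, 0.10, 0.55]}

def match_identity(features, threshold=0.25):
    # Find the closest preset participant by Euclidean distance.
    best_name, best_dist = None, float("inf")
    for name, ref in feature_table.items():
        dist = math.dist(features, ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist <= threshold:
        return best_name                    # matched a preset participant
    # Unknown face: record it so staff can supplement the identity later.
    feature_table[f"unknown-{len(feature_table)}"] = list(features)
    return None

print(match_identity([0.11, 0.82, 0.30]))   # -> Zhang
print(match_identity([0.50, 0.50, 0.50]))   # -> None (recorded for manual review)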
In this embodiment, step S3 of capturing the participant's mouth movements according to the image information comprises:
S31: acquiring the three-dimensional motion trajectory of the participant's mouth; the trajectory is specifically the mouth's movement mapped onto the X, Y and Z axes with the center of the mouth as the base point;
S32: performing inertial motion capture on the three-dimensional motion trajectory to obtain the participant's habitual speech patterns; the inertial motion is specifically the habitual mouth movements made while speaking;
S33: and comparing the captured inertial motion with preset inertial motions to obtain the participant's speech characteristics.
In this embodiment, by acquiring the mouth's movement, the motion trajectories along the X, Y and Z axes are obtained with the center of the participant's mouth as the base point, so as to judge the participant's pronunciation and thereby capture the participant's explanation and speech. For example, if at the first utterance the trajectories along all of the X, Y and Z axes expand outward simultaneously, the first pronunciation can be judged to be "a", and comparing this first-pronunciation habit against the recorded inertial action table suggests the participant is a Russian speaker. If the X-axis and Y-axis trajectories at the center of the mouth expand outward simultaneously while the Z-axis trajectory does not change, the first pronunciation can be judged to be "an", and comparison against the table suggests an American speaker. If none of the X, Y and Z trajectories change, the first pronunciation can be judged to be "en", and comparison against the table suggests a Chinese speaker.
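The axis-expansion rules of this example can be rendered as a small decision function plus the recorded habit table; this is a literal toy encoding of the three cases above, not a workable lip-reading model.

def first_vowel(x_expands: bool, y_expands: bool, z_expands: bool) -> str:
    if x_expands and y_expands and z_expands:
        return "a"        # all three axes expand outward at the first utterance
    if x_expands and y_expands:
        return "an"       # X and Y expand while Z stays unchanged
    return "en"           # no axis changes

habit_table = {"a": "Russian speaker", "an": "American speaker", "en": "Chinese speaker"}

for axes in [(True, True, True), (True, True, False), (False, False, False)]:
    vowel = first_vowel(*axes)
    print(vowel, "->", habit_table[vowel])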
In this embodiment, step S4 of parsing the mouth movements to obtain the participant's conference discussion content comprises:
S41: acquiring the language the participant uses during self-introduction;
S42: matching that language against the language types in a preset database to confirm the language type used by the participant;
S43: displaying the language type beside the participant's image;
S44: and compiling the discussion content according to the language type to generate the general-language translation corresponding to the discussion content.
In this embodiment, the language each participant uses during the round of self-introductions at the start of the conference is matched against the language types recorded in the system database, so that the different language used by each participant can be determined. For example, if a participant introduces themselves in a language the system judges to have few vowels and many consonants, with most consonants forming voiced/voiceless and hard/soft pairs, the corresponding language characteristics can be found in the database and the language judged to be Russian; the language type "Russian" is then projected beside that participant, and the system translates the Russian into general-language content, such as an international common language, English or Chinese. If a participant introduces themselves in a language whose characteristics cannot be found in the database, background staff are notified to record the speech content so the language type can be judged manually later; meanwhile "unknown" is projected as the language type beside that participant, and the translated content is correspondingly marked as untranslatable.
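A sketch of this trait matching is shown below, with the unknown-language fallback; the trait strings are invented labels standing in for whatever phonetic descriptors the database would actually store.

language_traits = {
    "Russian": {"few vowels", "many consonants",
                "voiced/voiceless pairs", "hard/soft pairs"},
}

def identify_language(observed: set) -> str:
    for language, traits in language_traits.items():
        if traits <= observed:       # every recorded trait of the language was observed
            return language
    return "unknown"                 # notify background staff to record it manually

print(identify_language({"few vowels", "many consonants",
                         "voiced/voiceless pairs", "hard/soft pairs"}))  # -> Russian
print(identify_language({"tonal"}))                                      # -> unknown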
In this embodiment, step S42 of matching the language against the language types in the preset database to confirm the language type used by the participant comprises:
S421: acquiring the vowel features corresponding to the language;
S422: inputting one or more vowel features into a preset prediction model to predict the language corresponding to those vowel features;
S423: judging whether the predicted language type is a preset language type;
S424: and if so, acquiring that language and the translated content corresponding to it.
In this embodiment, the vowel features produced while the participant speaks are acquired and input into the prediction model, which predicts the language corresponding to those features, and the conference discussion content is translated according to that language. For example, if the acquired pronunciation feature is a consonant feature, it is input into the prediction model, which predicts a language of the Romance family, comprising French, Italian, Spanish, Portuguese, Catalan, Galician, Romanian and Romansh; translating the discussion content then establishes that the participant's language is Spanish. If the acquired feature is a nasal feature, the model likewise predicts the Romance family and the participant's language is established as Portuguese; since Portuguese data is recorded in the database, the system translates the participant's Portuguese and projects the translated content beside that participant. If the acquired feature is a tongue-position feature and the participant's language is established as Romansh, which the system has not recorded in its database, the system cannot translate that participant's Romansh: it projects "cannot translate" beside the participant, and staff later record the language and add it to the system so that the participant's Romansh can be translated next time.
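The example can be summarized as a stand-in for the preset prediction model: a trivial mapping from a pronunciation feature to a Romance language, followed by the check against the languages recorded in the database. The feature names and the recorded set are assumptions made for this sketch.

recorded_languages = {"Spanish", "Portuguese"}        # Romansh deliberately not recorded

def predict_language(feature: str) -> str:
    romance = {"consonant": "Spanish", "nasal": "Portuguese", "tongue": "Romansh"}
    return romance.get(feature, "unknown")

def translate_or_flag(feature: str) -> str:
    language = predict_language(feature)
    if language in recorded_languages:                # preset language type: translate
        return f"translated from {language}"
    return "cannot translate"                         # projected beside the participant

for feature in ("consonant", "nasal", "tongue"):
    print(feature, "->", translate_or_flag(feature))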
In this embodiment, after step S2 of separately collecting the image information of the participants, the method comprises:
S201: acquiring the participant's facial expression;
S202: matching the facial expression against the expression types in a preset database to confirm the participant's emotional state;
S203: and displaying the emotional state beside the corresponding participant's image.
In this embodiment, the facial expression of each participant while speaking during the conference is acquired to judge that participant's current emotional state. For example, if during a participant's discussion the eyes widen, the pupils contract, the brows press down and the nostrils flare involuntarily, the system can judge from these features that the participant is currently angry, and the corresponding emotion is shown beside the participant's image. If the eyes slowly open wide and the brows gradually rise while the mouth does not open wide, the system can judge from these features that the participant is currently surprised, and the corresponding emotion is likewise shown beside the image.
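The two examples reduce to a lookup of observed facial cues in a preset expression table, sketched below; the cue strings paraphrase the examples above and are illustrative only.

emotion_table = [
    ({"eyes wide", "pupils contracted", "brows pressed down", "nostrils flared"}, "angry"),
    ({"eyes slowly opening", "brows rising"}, "surprised"),
]

def classify_emotion(cues: set) -> str:
    for required, emotion in emotion_table:
        if required <= cues:         # every listed cue for the emotion was observed
            return emotion
    return "neutral"                 # no preset expression type matched

print(classify_emotion({"eyes wide", "pupils contracted",
                        "brows pressed down", "nostrils flared"}))   # -> angry
print(classify_emotion({"eyes slowly opening", "brows rising"}))     # -> surprised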
In this embodiment, after step S4 of parsing the mouth movements to obtain the participant's conference discussion content, the method comprises:
S401: obtaining the sentence gap time while the participant is speaking;
S402: judging whether the gap time exceeds a preset time period;
S403: if so, retranslating the sentence; the retranslation specifically means translating the sentence again word by word;
S404: judging whether the retranslated sentence content is the same as the original conference discussion content;
S405: and if not, projecting both the retranslated sentence content and the conference discussion content beside the corresponding participant's image.
In this embodiment, the sentence gap time of a participant's discussion content is acquired and compared with the system's preset time period to judge whether a sentence runs too long, which can make continuously produced translations slightly inconsistent. For example, a participant may speak for as long as 5 minutes, within which one sentence runs continuously for 30 seconds while the system's preset period is only 15 seconds; that discussion sentence therefore exceeds the preset period. Because the sentence runs so long, the system judges the translated content from the continuity of the preceding and following sentences when translating it, which may cause parts of the sentence to be skipped. The system therefore retranslates the sentence, translating it again through the connectivity of each word to complete the retranslation, and compares the result with the translation produced before retranslation. If the two are the same, only the one translation is projected beside the corresponding participant; if they differ, both translations are projected beside the participant at the same time.
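The over-long-sentence handling can be sketched as below, with the 15-second window from the example; the word-by-word retranslator is a placeholder standing in for the real translation engine.

PRESET_WINDOW_S = 15

def word_by_word(sentence: str) -> str:
    # Placeholder retranslation: process the sentence again, one word at a time.
    return " ".join(word.upper() for word in sentence.split())

def review_sentence(sentence: str, duration_s: float, first_translation: str) -> list:
    if duration_s <= PRESET_WINDOW_S:
        return [first_translation]          # within the preset period: keep as-is
    second = word_by_word(sentence)
    if second == first_translation:
        return [first_translation]          # retranslation agrees: project one version
    return [first_translation, second]      # versions differ: project both

print(review_sentence("hola a todos", 30, "hello everyone"))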
In this embodiment, before the step of capturing the conference discussion content and the translated content with holographic projection technology and projecting them respectively beside the image of the corresponding participant at the conference site, the method comprises:
S601: acquiring sounds from other participants while the current participant's discussion content is being presented;
S602: judging whether the sound was made by another participant in the current conference;
S603: if so, acquiring the sound and converting it into the opinion voiced by that participant;
S604: and projecting the voiced opinion, attributed to the participant who voiced it, beside the content currently under discussion.
In this embodiment, the speech data of the participant currently speaking is acquired and examined for speech from other people, i.e. the voices of other conference participants, to determine whether another identified participant has spoken. For example, while a participant is in a lengthy discussion, another participant who suddenly voices dissatisfaction inserts a brief dissenting opinion into the discussion without interrupting it; the system can determine whether other participants hold different opinions by checking whether the current speaker's audio contains the voice frequencies of other current participants. Likewise, while a participant is in a lengthy discussion, another participant who suddenly voices agreement inserts a brief opinion into the discussion, again without affecting it.
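One crude way to render the voice-frequency check of this example is to compare the dominant frequency of each audio frame against per-speaker frequency bands, as sketched below; the fixed bands are an assumption of this sketch, and a real system would use proper speaker diarization.

speaker_bands = {"current": (85, 155), "other": (165, 255)}   # Hz, illustrative values

def find_interjections(frames):
    # frames: list of (timestamp_s, dominant_frequency_hz) pairs.
    low, high = speaker_bands["other"]
    return [t for t, hz in frames if low <= hz <= high]       # another voice present

frames = [(0.0, 120), (1.0, 118), (2.0, 210), (3.0, 122)]
print(find_interjections(frames))   # -> [2.0]: convert and project that opinion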
In this embodiment, the step of translating the conference discussion content to obtain the corresponding translated content comprises:
S51: re-identifying the conference discussion content and obtaining the reassembled sentence after translation;
S52: judging whether the reassembled sentence is normal translated sentence content; normal translated sentence content is specifically understandable content that fits the subject of the conference;
S53: if so, the sentence is an error-free statement; if not, the sentence is an erroneous statement that must be analyzed and translated again manually.
In this embodiment, each participant's discussion content is re-identified to obtain the translated content sentence again, and the sentence is judged manually to determine whether a translation error exists and hence whether manual translation is needed to correct the translated content. For example, while manually observing a participant's conference discussion, staff may find an obvious error in that participant's translated content; if translated content that does not fit the conference subject were left in place, it would reduce the discussion efficiency of the whole conference.
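The verification step can be sketched as a check of the reassembled sentence against the conference subject; keyword overlap is an invented stand-in for however the real system would judge topical fit.

topic_keywords = {"budget", "schedule", "release"}

def verify_translation(reassembled: str) -> str:
    words = set(reassembled.lower().split())
    if words & topic_keywords:      # understandable content that fits the subject
        return "error-free statement"
    return "erroneous statement: route to manual analysis and retranslation"

print(verify_translation("the release schedule slips one week"))
print(verify_translation("purple elephant harmonica"))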
Referring to fig. 2, a data processing system based on conference image communication according to an embodiment of the present invention includes:
a first acquisition module 10, configured to acquire the number of participant images during the conference;
a first collection module 20, configured to separately collect image information of each participant; the image information is specifically the participant's facial information;
a first capture module 30, configured to capture the participant's mouth movements according to the image information;
a second acquisition module 40, configured to parse the mouth movements to obtain the participant's conference discussion content;
a third acquisition module 50, configured to translate the conference discussion content to obtain the corresponding translated content;
and a first listing module 60, configured to display the conference discussion content and the translated content respectively beside the corresponding participant's image.
In this embodiment, the first acquisition module 10 acquires the number of participant images during the conference and compares the number of participants with the preset number of attendees, so as to exclude people whose images need not be acquired. The first collection module 20 separately collects each participant's image information, specifically facial information; it obtains the participant's features, including facial features, compares them with preset participant features to match the participant's identity information, and identifies the participant accordingly. The first capture module 30 captures the participant's mouth movements according to the image information: it collects mouth images with a preset high-definition camera, analyzes them to obtain a three-dimensional motion trajectory of the mouth, mapped onto the X, Y and Z axes with the center of the mouth as the base point, performs inertial motion capture on the trajectory to obtain the participant's habitual speech patterns (the habitual mouth movements made while speaking), and compares the captured inertial motion with preset inertial motions to obtain the participant's speech characteristics. The second acquisition module 40 parses the mouth movements to obtain the participant's discussion content: it acquires the language used during self-introduction, matches it against the language types in the preset database to confirm the participant's language type, displays the language type beside the participant's image, and compiles the discussion content according to the language type to generate the corresponding general-language translation. The third acquisition module 50 translates the discussion content: it re-identifies the discussion content, obtains the reassembled sentence after translation, and judges whether it is normal translated sentence content, i.e. understandable content that fits the conference subject; if so, the sentence is error-free, otherwise it is erroneous and must be analyzed and translated again manually. The first listing module 60 captures the discussion content and the translated content with holographic projection technology and projects them respectively beside the image of the corresponding participant at the conference site: it acquires sounds from other participants during the current participant's discussion, judges whether a sound was made by another participant in the current conference, and if so converts it into the opinion voiced by that participant and projects the opinion, attributed to its speaker, beside the content currently under discussion.
In this embodiment, the first collection module 20 further includes:
a first acquisition unit, configured to obtain the participant's features from the participant's image information; the features include facial features;
a first matching unit, configured to compare the participant's features with preset participant features to match the participant's identity information;
and a first identification unit, configured to identify the participant according to the identity information.
In this embodiment, these units identify the participant in the same manner as described above for steps S21 to S23.
In this embodiment, the first capture module 30 further includes:
a second acquisition unit, configured to acquire the three-dimensional motion trajectory of the participant's mouth; the trajectory is specifically the mouth's movement mapped onto the X, Y and Z axes with the center of the mouth as the base point;
a third acquisition unit, configured to perform inertial motion capture on the three-dimensional motion trajectory to obtain the participant's habitual speech patterns; the inertial motion is specifically the habitual mouth movements made while speaking;
and a fourth acquisition unit, configured to compare the captured inertial motion with preset inertial motions to obtain the participant's speech characteristics.
In this embodiment, these units judge pronunciation from the mouth trajectory in the same manner as described above for steps S31 to S33.
In this embodiment, the second acquisition module 40 further includes:
a fifth acquisition unit, configured to acquire the language the participant uses during self-introduction;
a first confirmation unit, configured to match that language against the language types in a preset database to confirm the language type used by the participant;
a first listing unit, configured to display the language type beside the participant's image;
and a first generation unit, configured to compile the discussion content according to the language type to generate the general-language translation corresponding to the discussion content.
In this embodiment, these units determine and project the language type in the same manner as described above for steps S41 to S44.
In this embodiment, the first confirmation unit further includes:
a first acquisition subunit, configured to acquire the vowel features corresponding to the language;
a first prediction subunit, configured to input one or more vowel features into a preset prediction model to predict the language corresponding to those vowel features;
a first judgment subunit, configured to judge whether the predicted language type is a preset language type;
and a first execution subunit, configured to acquire, if so, that language and the translated content corresponding to it.
In this embodiment, these subunits predict the language in the same manner as described above for steps S421 to S424.
In this embodiment, the system further includes:
a fourth acquisition module, configured to acquire the participant's facial expression;
a first confirmation module, configured to match the facial expression against the expression types in a preset database to confirm the participant's emotional state;
and a first listing module, configured to display the emotional state beside the corresponding participant's image.
In this embodiment, these modules determine and display emotional states in the same manner as described above for steps S201 to S203.
In this embodiment, the system further includes:
a fifth acquisition module, configured to obtain the sentence gap time while the participant is speaking;
a first judgment module, configured to judge whether the gap time exceeds a preset time period;
a first execution module, configured to retranslate the sentence if so; the retranslation specifically means translating the sentence again word by word;
a second judgment module, configured to judge whether the retranslated sentence content is the same as the original conference discussion content;
and a second execution module, configured to project, if they differ, both the retranslated sentence content and the conference discussion content beside the corresponding participant's image.
In this embodiment, these modules handle over-long sentences in the same manner as described above for steps S401 to S405.
In this embodiment, the system further includes:
a sixth acquisition module, configured to acquire sounds from other participants while the current participant's discussion content is being presented;
a third judgment module, configured to judge whether the sound was made by another participant in the current conference;
a third execution module, configured, if so, to acquire the sound and convert it into the opinion voiced by that participant;
and a first projection module, configured to project the voiced opinion, attributed to the participant who voiced it, beside the content currently under discussion.
In this embodiment, these modules handle interjected opinions in the same manner as described above for steps S601 to S604.
In this embodiment, the first listing module further includes:
a sixth acquisition unit, configured to re-identify the conference discussion content and obtain the reassembled sentence after translation;
a first judgment unit, configured to judge whether the reassembled sentence is normal translated sentence content; normal translated sentence content is specifically understandable content that fits the subject of the conference;
and a first execution unit, configured to judge the sentence error-free if so; if not, the sentence is erroneous and must be analyzed and translated again manually.
In this embodiment, the verification of translated content proceeds in the same manner as described above for steps S51 to S53.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A data processing method based on conference image communication, characterized by comprising the following steps:
acquiring the number of participant images during the conference;
respectively acquiring the image information of the conference persons, the image information specifically being the face information of each conference person;
capturing the mouth-shape motion of each conference person according to the image information;
parsing the mouth-shape motion to obtain the conference discussion content of the conference person;
translating the conference discussion content to acquire the corresponding translated content;
and capturing the conference discussion content and the translated content by means of holographic projection technology, and respectively projecting them to one side of the conference person image corresponding to each conference person at the conference site.
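For orientation only, and not as part of the claimed subject matter, the steps of claim 1 can be strung together as in the following Python sketch; read_lips, translate, and project are hypothetical stubs invented here, since the claim does not disclose concrete recognition or projection components.

```python
# Skeletal, hypothetical rendering of the claim-1 pipeline.
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    face_image: object      # per-person face information (claim-1 step 2)
    discussion: str = ""    # recognized conference discussion content
    translation: str = ""   # its translated counterpart

def run_pipeline(frames, participants, read_lips, translate, project):
    for p in participants:
        # capture the mouth-shape motion and parse it to discussion content
        p.discussion = read_lips(frames, p.face_image)
        # translate the discussion content
        p.translation = translate(p.discussion)
        # project both texts beside this participant's image
        project(p, p.discussion, p.translation)
```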
2. The data processing method based on conference image communication according to claim 1, wherein the step of respectively acquiring the image information of the conference persons comprises:
acquiring the features of each conference person according to the image information of the conference persons, the features including facial features;
comparing the features of the conference persons with preset conference person features to match the identity information of each conference person;
and identifying the conference persons according to the identity information.
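Purely as an illustration of the matching step in claim 2, assuming some face-embedding model has already turned each face image into a feature vector; the distance threshold is an invented value, not a disclosed parameter.

```python
# Compare an attendee's face feature vector with preset reference features
# and return the best-matching identity.
import numpy as np

def identify(face_feature, preset_features, max_distance=0.6):
    """preset_features maps identity -> reference vector; returns the
    closest identity, or None when no reference is close enough."""
    face_feature = np.asarray(face_feature, dtype=float)
    best, best_d = None, float("inf")
    for identity, ref in preset_features.items():
        d = np.linalg.norm(face_feature - np.asarray(ref, dtype=float))
        if d < best_d:
            best, best_d = identity, d
    return best if best_d <= max_distance else None
```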
3. The data processing method based on conference image communication according to claim 1, wherein the step of capturing the mouth-shape motion of each conference person according to the image information comprises:
collecting mouth images of the conference persons with a preset high-definition camera;
analyzing the mouth images to obtain a three-dimensional motion track of each conference person's mouth, the three-dimensional motion track specifically being the mouth motion track mapped onto the X, Y and Z axes with the center of the mouth as the base point;
performing inertial motion capture according to the three-dimensional motion track to obtain the habitual words of the conference person, the inertial motion capture specifically capturing the habitual mouth movements made while speaking;
and comparing the inertial motion with a preset inertial motion to obtain the speaking characteristics of the conference person.
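An illustrative rendering of the base-point idea in claim 3, assuming mouth landmarks per frame are available from some face-mesh model (not disclosed here) and that every frame shares one landmark count.

```python
# Express each frame's mouth landmarks relative to the mouth center, turning
# the motion into a trajectory on the X, Y and Z axes.
import numpy as np

def mouth_trajectory(landmarks_per_frame):
    """landmarks_per_frame: iterable of (N, 3) landmark arrays, one per frame.
    Returns a (frames, N, 3) array of mouth-center-relative coordinates."""
    frames = []
    for pts in landmarks_per_frame:
        pts = np.asarray(pts, dtype=float)
        center = pts.mean(axis=0)     # mouth center as the base point
        frames.append(pts - center)   # coordinates relative to that point
    return np.stack(frames)
```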
4. The data processing method based on conference image communication according to claim 1, wherein the step of parsing the mouth-shape motion to obtain the conference discussion content of the conference person comprises:
acquiring the language used when the conference person introduces himself or herself;
matching the language against the language types in a preset database to confirm the language type adopted by the corresponding conference person;
listing the language type on one side of the conference person image;
and translating the discussion content according to the language type to generate the general translated content corresponding to the discussion content.
5. The data processing method based on conference image communication according to claim 4, wherein the step of matching the language against the language types in a preset database to confirm the language type adopted by the corresponding conference person comprises:
acquiring the vowel features corresponding to the language;
inputting one or more vowel features into a preset prediction model to predict the language corresponding to the vowel features;
judging whether the language type is a preset language type;
and if so, acquiring the language and the translated content corresponding to the language.
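As a toy stand-in for the prediction model of claim 5, a nearest-centroid match of vowel formant vectors against per-language prototypes; the prototype values and the two-formant representation are invented purely for illustration.

```python
# Predict a language by finding the prototype nearest the mean vowel vector.
import numpy as np

LANGUAGE_PROTOTYPES = {
    "zh": np.array([550.0, 1700.0]),   # hypothetical mean (F1, F2) in Hz
    "en": np.array([600.0, 1850.0]),
}

def predict_language(vowel_features, prototypes=LANGUAGE_PROTOTYPES):
    """vowel_features: one or more formant vectors; returns the nearest
    prototype language for their mean vector."""
    mean = np.atleast_2d(np.asarray(vowel_features, dtype=float)).mean(axis=0)
    return min(prototypes, key=lambda lang: np.linalg.norm(mean - prototypes[lang]))
```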
6. The data processing method based on conference image communication according to claim 1, wherein the step of respectively acquiring the image information of the conference persons comprises:
acquiring the facial expressions of the conference persons;
matching the facial expressions against the expression types in a preset database to confirm the emotional state of the corresponding conference person;
and listing the emotional state on one side of the corresponding conference person image.
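A toy illustration of the claim-6 lookup; both the table and the expression labels are invented for this sketch.

```python
# Map a recognized facial-expression label to an emotional state from a
# preset table, to be listed beside the participant's image.
EXPRESSION_TO_EMOTION = {
    "smile": "pleased",
    "frown": "displeased",
    "neutral": "calm",
}

def emotion_for(expression, default="unknown"):
    return EXPRESSION_TO_EMOTION.get(expression, default)
```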
7. The data processing method based on conference image communication according to claim 1, wherein the step of parsing the mouth-shape motion to obtain the conference discussion content of the conference person comprises:
acquiring the gap time between sentences while a conference person is discussing;
judging whether the gap time is greater than a preset time period;
if so, retranslating the sentence, the retranslation specifically being a renewed translation of the sentence according to each of its words;
judging whether the content of the retranslated sentence is the same as the conference discussion content;
and if not, projecting the retranslated sentence content together with the conference discussion content to one side of the corresponding conference person image.
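A hedged sketch of the claim-7 gap check; translate_word and the 2-second default are assumptions, not disclosed components.

```python
# When the pause after a sentence exceeds the preset period, retranslate the
# sentence word by word and flag any disagreement with the original content.
def gap_retranslate(sentence, gap_seconds, discussion_text,
                    translate_word, max_gap=2.0):
    if gap_seconds <= max_gap:
        return None                 # short gap: nothing to re-check
    retranslated = " ".join(translate_word(w) for w in sentence.split())
    # a mismatch means both versions should be projected side by side
    return retranslated if retranslated != discussion_text else None
```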
8. The data processing method based on conference image communication according to claim 1, wherein before the step of capturing the conference discussion content and the translated content by means of holographic projection technology and projecting them to one side of the corresponding conference person image at the conference site, the method comprises:
acquiring sounds from other conference persons while the current conference person is presenting discussion content;
judging whether the sound is made by another person attending the current conference;
if so, acquiring the sound and converting it into the opinion expressed by the other conference person;
and projecting the expressed opinion, together with the conference person who expressed it, to one side of the current discussion content.
9. The data processing method based on conference image communication according to claim 1, wherein the step of translating the conference discussion content to acquire the corresponding translated content comprises:
recognizing the conference discussion content and acquiring the recombined sentence produced by translation;
judging whether the recombined sentence is a normal translated sentence, a normal translated sentence specifically being understandable content that conforms to the subject of the conference;
if so, the sentence is an error-free sentence; if not, the sentence is erroneous and must be manually analyzed and retranslated.
10. A data processing system based on conference image communication, characterized by comprising:
the first acquisition module, used for acquiring the number of participant images during the conference;
the first collection module, used for respectively acquiring the image information of the conference persons, the image information specifically being the face information of each conference person;
the first capture module, used for capturing the mouth-shape motion of each conference person according to the image information;
the second acquisition module, used for parsing the mouth-shape motion to obtain the conference discussion content of the conference person;
the third acquisition module, used for translating the conference discussion content to acquire the corresponding translated content;
and the first listing module, used for respectively listing the conference discussion content and the translated content on one side of the corresponding conference person image.
CN202210228974.XA 2022-03-10 2022-03-10 Data processing method and system based on conference image communication Active CN114333072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210228974.XA CN114333072B (en) 2022-03-10 2022-03-10 Data processing method and system based on conference image communication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210228974.XA CN114333072B (en) 2022-03-10 2022-03-10 Data processing method and system based on conference image communication

Publications (2)

Publication Number Publication Date
CN114333072A true CN114333072A (en) 2022-04-12
CN114333072B (en) 2022-06-17

Family

ID=81033963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210228974.XA Active CN114333072B (en) 2022-03-10 2022-03-10 Data processing method and system based on conference image communication

Country Status (1)

Country Link
CN (1) CN114333072B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944397A (en) * 2017-11-27 2018-04-20 腾讯音乐娱乐科技(深圳)有限公司 Video recording method, device and computer-readable recording medium
US20190251970A1 (en) * 2018-02-15 2019-08-15 DMAI, Inc. System and method for disambiguating a source of sound based on detected lip movement
CN108960103A (en) * 2018-06-25 2018-12-07 西安交通大学 The identity identifying method and system that a kind of face and lip reading blend
CN110072075A (en) * 2019-04-30 2019-07-30 平安科技(深圳)有限公司 Conference management method, system and readable storage medium based on face recognition
CN111339514A (en) * 2020-02-18 2020-06-26 江苏怀业信息技术股份有限公司 Device with conference presence recognition function and method thereof
WO2022013045A1 (en) * 2020-07-17 2022-01-20 Clinomic GmbH Method for automatic lip reading by means of a functional component and for providing said functional component
CN113053361A (en) * 2021-03-18 2021-06-29 北京金山云网络技术有限公司 Speech recognition method, model training method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG Fan: "Research and Implementation of a Lip-Reading Recognition Application Based on Deep Learning", China Master's Theses Full-Text Database (Master), Information Science and Technology, no. 9, 15 September 2018, pages 138-322 *
MA Jinlin et al.: "A Survey of Visual Feature Extraction Methods for Lip-Reading Recognition", Journal of Frontiers of Computer Science and Technology (计算机科学与探索), vol. 15, no. 12, 30 December 2021, pages 2256-2275 *

Also Published As

Publication number Publication date
CN114333072B (en) 2022-06-17

Similar Documents

Publication Publication Date Title
Petridis et al. The MAHNOB laughter database
US8131551B1 (en) System and method of providing conversational visual prosody for talking heads
US7353177B2 (en) System and method of providing conversational visual prosody for talking heads
Fernandez-Lopez et al. Towards estimating the upper bound of visual-speech recognition: The visual lip-reading feasibility database
Petridis et al. Audiovisual discrimination between speech and laughter: Why and when visual information might help
US11942093B2 (en) System and method for simultaneous multilingual dubbing of video-audio programs
Pápay et al. Hucomtech multimodal corpus annotation
JP7107229B2 (en) Information processing device, information processing method, and program
Wang et al. Computer-assisted audiovisual language learning
Petridis et al. Audiovisual laughter detection based on temporal features
Kaiser Multimodal new vocabulary recognition through speech and handwriting in a whiteboard scheduling application
CN114333072B (en) Data processing method and system based on conference image communication
Ballard et al. A multimodal learning interface for word acquisition
Heracleous et al. Visual-speech to text conversion applicable to telephone communication for deaf individuals
JP7107228B2 (en) Information processing device, information processing method, and program
Zorić et al. Real-time language independent lip synchronization method using a genetic algorithm
Abdo et al. Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech.
Burnham et al. A blueprint for a comprehensive Australian English auditory-visual speech corpus
Bastanfard et al. A comprehensive audio-visual corpus for teaching sound persian phoneme articulation
Gavras et al. Towards a Personalized Multimodal System for Natural Voice Reproduction
Sheela et al. Indian Sign Language Translator
Cooke et al. Gaze-contingent automatic speech recognition
Ronzhin et al. PARAD-R: Speech analysis software for meeting support
Zhi et al. A Bayesian Model of Multimodal Phonemic Category Learning
Rao Audio-visual interaction in multimedia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant