CN106782545A - System and method for converting audio and video data into a written record - Google Patents
- Publication number: CN106782545A
- Application number: CN201611170040.6A
- Authority: CN (China)
- Prior art keywords: data, speaker, participant, current, speech
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
Abstract
The present invention relates to a system and method for converting audio and video data into a written record. The system includes a data collection part, a data recognition part, and a data organization part. The data collection part includes an audio acquisition module and a video acquisition module. The data recognition part includes a speech and voiceprint recognition module and a face and expression recognition module. The data organization part generates the written record from the text information, the recognized start time, the identity label of the current speaker, and the mood of the current speaker. The invention preserves the whole audio and video process in more detail and more completely, closer to the real situation. It converts audio and video data into text format for storage, which greatly reduces the cost of storage and transmission and makes later review easier, so that conference content can be browsed and located more quickly.
Description
Technical field
The present invention relates to data processing technology, and in particular to a system and method for converting audio and video data into a written record.
Background art
When an audio/video conference is held, the conference content is usually recorded by capturing video data with a camera and audio data with a microphone, or by capturing audio data with a microphone alone. The audio and video data, or the audio data, are saved as a multimedia file and stored on a storage device; the conference content can then be watched or listened to by playing the file. Alternatively, a dedicated note-taker records the conference content using an input device such as a computer, or by handwriting.
When a conference is recorded with devices such as cameras and microphones, the audio and video files must be stored on a storage device. They take up a large amount of storage space, so the cost is relatively high. Reviewing the conference later by playing the multimedia file does not allow quick browsing or jumping to a specific topic, so it takes more time and content may be missed, which makes it inefficient. Recording the conference as notes does help with quick browsing and locating specific topics, but it places high demands on the note-taker: if the writing speed is far below the speed of the discussion, omissions and errors occur easily, the record is neither complete nor detailed, and the scene at the time cannot be reconstructed when the record is reviewed later.
The prior art discloses a method for bidirectional, reversible speech conversion and subtitling of audio-visual material between Chinese and foreign languages. It recognizes the speech in the audio-visual material as text, translates the text into a specified foreign language, and superimposes the translation on the picture as subtitles, stored together with the original speech or output synchronously, so that the material becomes audio-visual material with subtitles in the specified foreign language. However, that method only recognizes speech as text and displays the translated text synchronously on the picture as subtitles; it does not organize the subtitles by speaker into content that is closer to the real situation.
Summary of the invention
To solve the technical problem that the recorded conference content described above is neither complete nor detailed, and that the scene at the time cannot be reconstructed when the record is reviewed later, the invention provides a system and method for converting audio and video data into a written record. The technical solution is as follows.
A system for converting audio and video data into a written record includes a data collection part, a data recognition part, and a data organization part.
The data collection part includes an audio acquisition module and a video acquisition module.
The audio acquisition module captures the audio data of the current speaker and records the start time of the speech.
The video acquisition module captures images of the current speaker and records the start time of the speech.
The data recognition part includes a speech and voiceprint recognition module and a face and expression recognition module.
The speech and voiceprint recognition module processes the audio data captured by the audio acquisition module to convert it into text information in text format, and also processes that audio data to identify the current speaker and assign an identity label to each speaker.
The face and expression recognition module applies expression recognition technology to the images captured by the video acquisition module to recognize the expression of the current speaker and obtain that speaker's mood, and also processes those images to identify the current speaker and assign an identity label to each speaker.
The data organization part generates the written record from the text information, the recognized start time, the identity label of the current speaker, and the mood of the current speaker.
Further, the written record is a written record in dialogue form.
Further, the system also includes a data supplement and correction part, which is used to supplement and correct the written record.
Further, the audio acquisition module is a microphone, and/or the video acquisition module is a camera.
Further, the system also includes a storage module, which stores a data table recording each speaker's identity label, voiceprint feature data, and facial feature data.
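The data table held by the storage module can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the names `SpeakerEntry` and `SpeakerTable` are hypothetical, and the feature data is assumed to be a plain list of numbers, which the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpeakerEntry:
    """One row of the data table: a label plus optional feature data."""
    identity_label: str
    voiceprint: Optional[List[float]] = None  # assumed representation
    face: Optional[List[float]] = None        # assumed representation


@dataclass
class SpeakerTable:
    """In-memory stand-in for the data table kept in the storage module."""
    entries: Dict[str, SpeakerEntry] = field(default_factory=dict)

    def add(self, entry: SpeakerEntry) -> None:
        # Identity labels must stay unique, so reject duplicates.
        if entry.identity_label in self.entries:
            raise ValueError(f"duplicate identity label: {entry.identity_label}")
        self.entries[entry.identity_label] = entry

    def get(self, identity_label: str) -> Optional[SpeakerEntry]:
        return self.entries.get(identity_label)
```

Keying the table by identity label makes the uniqueness requirement explicit; a real storage module could equally be a database table.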
A method for converting audio and video data into a written record includes the following steps:
Step S21, data collection: capture the audio data and images of the current speaker, and record the start time of the speech;
Step S22, data recognition: process the captured audio data to convert it into text information in text format; process the captured images to recognize the expression of the current speaker and obtain that speaker's mood; process the captured audio data and/or the captured images to identify the current speaker, and assign an identity label to each speaker;
Step S23, data organization: generate the written record from the text information, the recognized start time, the identity label of the current speaker, and the mood of the current speaker.
Further, the written record is a written record in dialogue form.
Further, the method also includes step S24, data supplement and correction: supplement and correct the written record.
Further, the identity label is stored in a storage module in association with the speaker's voiceprint feature data and/or facial feature data. Before an identity label is assigned to a speaker, the storage module is first searched for a stored identity label that matches the speaker; if none is found, a new identity label is assigned to that speaker.
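The look-up-before-assign rule can be sketched as follows. The source does not say how a match is decided, so cosine similarity with a fixed threshold is used here purely as an assumption, and `find_or_assign_label` is a hypothetical helper name.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_or_assign_label(features: List[float],
                         stored: Dict[str, List[float]],
                         threshold: float = 0.9) -> str:
    """Return the stored label matching `features`, or assign a new one."""
    # First search the stored entries for a matching speaker.
    for label, known in stored.items():
        if cosine_similarity(features, known) >= threshold:
            return label
    # No match found: assign the next label ("participant A", "participant B", ...).
    # This simple lettering scheme only covers 26 speakers.
    label = f"participant {chr(ord('A') + len(stored))}"
    stored[label] = list(features)
    return label
```

The threshold value of 0.9 is arbitrary; a real system would tune it for the recognition technology in use.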
A method for converting audio and video data into a written record includes the following steps:
Step S30, preparation: start the microphone and the camera, create a speaker list, and create a file address for saving the text, where each entry in the speaker list includes a unique identity label, the speaker's voiceprint feature data, and the speaker's facial feature data;
Step S31, data capture: when a speaker starts to speak, the microphone captures the voice of the current speaker to obtain that speaker's audio data, and the start time of the audio data is recorded; at the same time, the camera captures images of the current speaker to obtain that speaker's video data, and the start time of the video data is recorded;
Step S32: analyze the audio data of the current speaker obtained by the microphone using voiceprint recognition technology, and perform voiceprint feature recognition; analyze the video data of the current speaker obtained by the camera using face recognition technology, and perform facial feature recognition;
Step S33: determine whether the voiceprint feature data and the facial feature data were recognized successfully; if the voiceprint feature data and/or the facial feature data were recognized, go to step S34; if neither was recognized, go to step S35;
Step S34: determine whether the speaker list already contains a speaker matching the voiceprint feature data and/or the facial feature data; if so, go to step S35 and fill in any missing information for that speaker; if not, add a new entry to the speaker list, save the new speaker's identity label, voiceprint feature data, and facial feature data, and go to step S35;
Step S35: analyze the audio data using speech recognition technology and convert it into text;
Step S36: determine whether the audio data was successfully converted into text; if so, go to step S37; if not, return to step S35 and continue the conversion;
Step S37: analyze the video data using expression recognition technology to obtain the mood of the current speaker;
Step S38: obtain the current date and time;
Step S39: organize the date, the time, the identity label of the speaker matched by voiceprint and/or facial feature data, the text from speech recognition, and the mood from expression recognition into a written record in dialogue form, and save it to the file address created earlier;
Step S40: return to step S31 and continue, saving what each speaker says, until the conversation ends.
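The control flow of steps S32 to S39 for a single captured utterance can be sketched as below. Everything recognizer-related is an assumption: `identify`, `transcribe`, and `read_mood` are hypothetical callables standing in for the recognition technologies, speaker matching is reduced to an exact dictionary lookup, the retry in S36 is capped at `max_attempts`, and the placeholders for an unidentified speaker or unrecognized speech are not from the source.

```python
from datetime import datetime
from typing import Callable, Dict, Optional


def process_utterance(audio: bytes,
                      video: bytes,
                      speakers: Dict[str, str],
                      identify: Callable[[bytes, bytes], Optional[str]],
                      transcribe: Callable[[bytes], Optional[str]],
                      read_mood: Callable[[bytes], Optional[str]],
                      now: Callable[[], datetime],
                      max_attempts: int = 3) -> str:
    """Run steps S32 to S39 for one utterance and return one record line."""
    # S32/S33: extract a feature key; None means recognition failed.
    key = identify(audio, video)
    label = "unknown speaker"  # assumed placeholder when S33 fails
    if key is not None:
        # S34: reuse the label of a known speaker, or add a new entry.
        label = speakers.get(key)
        if label is None:
            label = f"participant {chr(ord('A') + len(speakers))}"
            speakers[key] = label
    # S35/S36: convert speech to text, retrying a bounded number of times.
    text = None
    for _ in range(max_attempts):
        text = transcribe(audio)
        if text is not None:
            break
    if text is None:
        text = "[unrecognized speech]"  # assumed placeholder
    # S37: obtain the mood from the video data (may be absent).
    mood = read_mood(video)
    # S38: obtain the current date and time.
    stamp = now()
    # S39: organize everything into one dialogue-form line.
    mood_part = f" ({mood})" if mood else ""
    return f'{stamp:%Y-%m-%d %H:%M:%S} {label}{mood_part}: "{text}"'
```

Step S40 then amounts to calling this function in a loop for each captured utterance and appending the returned line to the file created in step S30.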
Beneficial effects of the invention:
The invention preserves the whole audio and video process in more detail and more completely, closer to the real situation. It converts audio and video data into text format for storage, which greatly reduces the cost of storage and transmission and makes later review easier, so that conference content can be browsed and located more quickly.
The invention identifies different participants using voiceprint recognition and face recognition technology, and arranges and organizes the text content obtained by speech recognition and expression recognition technology into a written record in dialogue form.
The invention provides an interface that lets users conveniently supplement and correct the record, which ensures the correctness of the written record and improves its readability.
Brief description of the drawings
Fig. 1 is a structural block diagram of the system for converting audio and video data into a written record proposed by the invention;
Fig. 2 is a schematic structural diagram of the system for converting audio and video data into a written record proposed by the invention;
Fig. 3 is a first flowchart of the method for converting audio and video data into a written record proposed by the invention;
Fig. 4 is a second flowchart of the method for converting audio and video data into a written record proposed by the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. The embodiments use video conference recordings as an example, but those skilled in the art will understand that the method can be applied to any video and is not limited to the drawings and the following embodiments.
The invention relies on well-known technologies such as speech recognition, voiceprint recognition, face recognition, and expression recognition to obtain the necessary data. These existing recognition technologies are relatively mature and can achieve "text-independent voiceprint recognition", "face tracking", "facial action recognition", "expression change recognition", and so on, so they are not described or explained in detail here. Based on these data, the proposed system and method organize audio and video data into a complete and vivid written record in dialogue form.
Embodiment 1:
As shown in Fig. 1 and Fig. 2, the proposed system for converting audio and video data into a written record includes a data collection part, a data recognition part, a data organization part, and a data supplement and correction part.
The data collection part includes data acquisition devices such as a microphone and a camera.
The microphone captures the audio data of the participant who is currently speaking. When a participant starts to speak, the microphone collects that participant's audio data and, from the intensity of the collected audio, determines whether the participant is speaking or pausing. If a pause exceeds a certain duration (for example 3 s), the participant's speech is considered finished. The start time and end time of the participant's audio data are recorded, and the audio data is sent to the data recognition part together with the start time (optionally also the end time) and the device identifier of the microphone. The device identifier of the microphone is transmitted so that different conference parties can be distinguished in a multi-party conference.
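The intensity-based end-of-speech rule can be sketched as follows. The source only states that a pause longer than a certain duration (for example 3 s) ends the speech; the per-frame intensity list, the fixed `threshold`, and the function name `segment_speech` are assumptions made for illustration.

```python
from typing import List, Tuple


def segment_speech(intensities: List[float],
                   frame_seconds: float,
                   threshold: float,
                   max_pause_seconds: float = 3.0) -> List[Tuple[float, float]]:
    """Split per-frame intensities into (start_time, end_time) utterances.

    A frame counts as speech when its intensity reaches `threshold`.
    An utterance ends once silence has lasted longer than `max_pause_seconds`.
    """
    segments = []
    start = None      # start time of the utterance in progress
    last_loud = None  # end time of the most recent speech frame
    for i, level in enumerate(intensities):
        frame_start = i * frame_seconds
        frame_end = frame_start + frame_seconds
        if level >= threshold:
            if start is None:
                start = frame_start
            last_loud = frame_end
        elif start is not None and frame_end - last_loud > max_pause_seconds:
            # The pause exceeded the limit: close the utterance at the
            # end of the last speech frame, not at the end of the pause.
            segments.append((start, last_loud))
            start = None
    if start is not None:
        segments.append((start, last_loud))
    return segments
```

For example, with one-second frames, a 3-second silence inside an utterance does not split it, while a 4-second silence does.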
The camera captures images of the participant who is currently speaking. When a participant starts to speak, the camera collects images of that participant and, from the collected images, determines whether the participant is speaking or pausing. If a pause exceeds a certain duration (for example 3 s), the participant's speech is considered finished. The start time and end time of the participant's speech are recorded, and the collected images are sent to the data recognition part together with the start time (optionally also the end time) and the device identifier of the camera. The device identifier of the camera is transmitted so that different conference parties can be distinguished in a multi-party conference.
To keep the times recorded by the microphone and the camera consistent, the microphone and the camera use the same standard time.
The data recognition part includes a speech and voiceprint recognition module and a face and expression recognition module.
The speech and voiceprint recognition module receives the audio data captured by the microphone, the start time (optionally also the end time), and the device identifier of the microphone. It processes the captured audio data using speech recognition technology to convert the audio-format data into text information in text format, and processes the captured audio data using voiceprint recognition technology to identify the participant who is currently speaking.
The face and expression recognition module receives the images captured by the camera, the start time (optionally also the end time), and the device identifier of the camera. It processes the captured images using expression recognition technology to recognize the expression of the participant who is currently speaking and obtain that participant's mood at the time, and processes the captured images using face recognition technology to identify the participant who is currently speaking.
Voiceprint recognition technology can recognize the voiceprint feature data of the participant who is currently speaking, and face recognition technology can recognize that participant's facial feature data. The speech and voiceprint recognition module can therefore identify and distinguish different participants from voiceprint feature data, and the face and expression recognition module can do so from facial feature data. In addition, the speech and voiceprint recognition module can identify and distinguish different conference parties using the device identifier of the microphone, and the face and expression recognition module can do so using the device identifier of the camera.
During processing, the speech and voiceprint recognition module and the face and expression recognition module assign a unique identity label to each participant who speaks. For example, in a single-party conference, "participant A", "participant B", "participant C", and so on can be used as identity labels. In a multi-party conference, "participant A1", "participant B2", "participant C1", and so on can be used, where the first character of the label ("A", "B", "C") denotes the conference party and the second character ("1", "2") denotes a particular participant within that party. To make it easier for the data organization part to organize the written record, the two modules should assign the same identity label to the same participant; for example, the identity labels produced by the two modules can be unified according to the received start time.
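The multi-party labelling scheme ("participant A1", "participant B2") can be sketched as a small allocator. This is an assumption-laden illustration: the class name `LabelAllocator`, the use of the device identifier as the party key, and the opaque `speaker_key` standing in for matched feature data are all hypothetical.

```python
from typing import Dict, Tuple


class LabelAllocator:
    """Assign labels of the form 'participant <party letter><number>'."""

    def __init__(self) -> None:
        self._party_letters: Dict[str, str] = {}       # device id -> party letter
        self._counts: Dict[str, int] = {}              # device id -> participants so far
        self._labels: Dict[Tuple[str, str], str] = {}  # (device id, speaker key) -> label

    def label_for(self, device_id: str, speaker_key: str) -> str:
        key = (device_id, speaker_key)
        # A participant who has already spoken keeps the same label.
        if key in self._labels:
            return self._labels[key]
        # A device identifier seen for the first time starts a new conference party.
        if device_id not in self._party_letters:
            self._party_letters[device_id] = chr(ord("A") + len(self._party_letters))
        self._counts[device_id] = self._counts.get(device_id, 0) + 1
        label = f"participant {self._party_letters[device_id]}{self._counts[device_id]}"
        self._labels[key] = label
        return label
```

Parties are lettered in the order their devices are first seen, and participants are numbered in the order they first speak within a party.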
For ease of lookup and management, a data table recording each participant's identity label, voiceprint feature data, and facial feature data can be created to hold information about the participants who speak; the data table is stored in a storage module (not shown in the figures). The voiceprint feature data and facial feature data can be obtained by voiceprint recognition and face recognition technology. For example, with face recognition based on a 21-point localization model, the features of a face can be described by these key points, and the accuracy can reach 96%.
The data organization part takes the text information, start time, participant identity label, participant mood, and other data produced by the data recognition part, organizes them in a defined format, and generates and saves a written record containing the start time, the participant's identity label, the participant's mood, and the text information.
For ease of review, the content of different participants is recorded on separate lines.
The format can be, for example:
【Date】
【Hour:Minute:Second】 【Participant】 (【Mood】): "【Text】"
【Date】
【Hour:Minute:Second】 【Participant】 (【Mood】): "【Text】"
……
In the format above, the symbols "【】" denote content obtained from the preceding parts;
the symbols "( )" denote optional content, added only when data is available;
the symbol ":" indicates that the content of the speech follows.
For example:
November 15, 2016
09:24:12 participant A: "Let us discuss problem a first"
November 15, 2016
09:24:16 participant B (angry): "Problem a has already been discussed; we should discuss problem b"
……
To simplify the record, entries with the same date can be merged. The example above can be merged into:
November 15, 2016
09:24:12 participant A: "Let us discuss problem a first"
09:24:16 participant B (angry): "Problem a has already been discussed; we should discuss problem b"
……
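The merged format above can be produced by a short formatter. This is a minimal sketch under assumptions: the `Utterance` type and `format_record` function are hypothetical names, and dates are printed in ISO form (2016-11-15) rather than the spelled-out form of the example to avoid locale-dependent output.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Utterance:
    start: datetime
    label: str
    text: str
    mood: Optional[str] = None  # optional, added only when available


def format_record(utterances: List[Utterance]) -> str:
    """Render utterances in dialogue form, printing each date only once."""
    lines = []
    current_date = None
    for u in sorted(utterances, key=lambda item: item.start):
        # Emit a date line only when the date changes.
        if u.start.date() != current_date:
            current_date = u.start.date()
            lines.append(current_date.strftime("%Y-%m-%d"))
        mood = f" ({u.mood})" if u.mood else ""
        lines.append(f'{u.start:%H:%M:%S} {u.label}{mood}: "{u.text}"')
    return "\n".join(lines)
```

Each participant's content lands on its own line, and the mood appears in parentheses only when the expression recognition step produced one.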
The data supplement and correction part is used to supplement and correct the written record saved by the data organization part, in order to improve its readability and ensure its correctness. For example, it provides an interface and prompts that let users conveniently supplement and correct the saved written record, and it records the name of the person making the supplement or correction, the time, and the content, for later reference.
Supplements can take the following form, for example: input boxes are provided for the user to enter information such as the conference topic, a summary of the issues, and the conference conclusions, so that others can quickly understand the conference content.
Corrections include, for example, fixing text errors in the written record or replacing certain information. For example, if 3 participants were recognized as speaking during the whole conference, a prompt pops up asking the user whether the identity labels "participant A, B, C" should be replaced with the participants' real names.
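Recording who made a correction, when, and what was changed can be sketched as a record that carries its own revision log. The names `Revision` and `EditableRecord` are hypothetical, and the label replacement is a plain string substitution chosen for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Revision:
    editor: str        # name of the person making the change
    time: datetime     # when the change was made
    description: str   # what was supplemented or corrected


@dataclass
class EditableRecord:
    text: str
    revisions: List[Revision] = field(default_factory=list)

    def replace_label(self, old: str, new: str, editor: str, when: datetime) -> None:
        """Replace an identity label (e.g. with a real name) and log the change."""
        # Naive substitution: "participant A" would also match inside
        # "participant A1", so a real tool should match whole labels only.
        self.text = self.text.replace(old, new)
        self.revisions.append(Revision(editor, when, f'replaced "{old}" with "{new}"'))
```

Supplements such as a conference topic or summary could be logged the same way, with the added text kept alongside the revision entry.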
Embodiment 2:
The invention also proposes a method for converting audio and video data into a written record. The flowchart of the method is shown in Fig. 3, and the method includes the following steps:
Step S21, data collection:
When a participant starts to speak, the microphone collects the audio data of that participant and, from the intensity of the collected audio, determines whether the participant is speaking or pausing. If a pause exceeds a certain duration (for example 3 s), the participant's speech is considered finished. The start time and end time of the participant's audio data are recorded, and the audio data is passed to the data recognition step together with the start time (optionally also the end time) and the device identifier of the microphone. The device identifier of the microphone is transmitted so that different conference parties can be distinguished in a multi-party conference;
The camera captures images of the participant who is currently speaking. When a participant starts to speak, the camera collects images of that participant and, from the collected images, determines whether the participant is speaking or pausing. If a pause exceeds a certain duration (for example 3 s), the participant's speech is considered finished. The start time and end time of the participant's speech are recorded, and the collected images are passed to the data recognition step together with the start time (optionally also the end time) and the device identifier of the camera. The device identifier of the camera is transmitted so that different conference parties can be distinguished in a multi-party conference.
To keep the times recorded by the microphone and the camera consistent, the microphone and the camera use the same standard time.
Step S22, data recognition:
Receive the audio data captured by the microphone, the start time (optionally also the end time), and the device identifier of the microphone. Process the captured audio data using speech recognition technology to convert the audio-format data into text information in text format, and process the captured audio data using voiceprint recognition technology to identify the participant who is currently speaking.
Receive the images captured by the camera, the start time (optionally also the end time), and the device identifier of the camera. Process the captured images using expression recognition technology to recognize the expression of the participant who is currently speaking and obtain that participant's mood at the time, and process the captured images using face recognition technology to identify the participant who is currently speaking.
Voiceprint recognition technology can recognize the voiceprint feature data of the participant who is currently speaking, and face recognition technology can recognize that participant's facial feature data. The speech and voiceprint recognition module can therefore identify and distinguish different participants from voiceprint feature data, and the face and expression recognition module can do so from facial feature data. In addition, the speech and voiceprint recognition module can identify and distinguish different conference parties using the device identifier of the microphone, and the face and expression recognition module can do so using the device identifier of the camera.
During processing, the speech and voiceprint recognition module and the face and expression recognition module assign a unique identity label to each participant who speaks. For example, in a single-party conference, "participant A", "participant B", "participant C", and so on can be used as identity labels. In a multi-party conference, "participant A1", "participant B2", "participant C1", and so on can be used, where the first character of the label ("A", "B", "C") denotes the conference party and the second character ("1", "2") denotes a particular participant within that party. To make the text information easier to organize, the two modules should assign the same identity label to the same participant; for example, the identity labels produced by the two modules can be unified according to the received start time.
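Unifying the two modules' labels by start time can be sketched as a pairing of results whose start times coincide. The source does not describe the pairing rule, so the `tolerance` window, the tuple layout, and the function name `unify_labels` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def unify_labels(audio_results: List[Tuple[float, str]],
                 video_results: List[Tuple[float, str]],
                 tolerance: float = 0.5) -> Dict[str, str]:
    """Map each video-side label to the audio-side label with a matching start time.

    Each result is (start_time_in_seconds, label). Two results are treated
    as the same utterance when their start times differ by at most `tolerance`.
    """
    mapping: Dict[str, str] = {}
    for v_start, v_label in video_results:
        for a_start, a_label in audio_results:
            if abs(a_start - v_start) <= tolerance:
                # Keep the first pairing found for a video label.
                mapping.setdefault(v_label, a_label)
                break
    return mapping
```

The resulting mapping can then be used to rewrite the face-side labels so that both modules refer to a participant by one label. This works only because the microphone and camera share the same standard time.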
For ease of lookup and management, a data table recording each participant's identity label, voiceprint feature data, and facial feature data can be created to hold information about the participants who speak. This ensures that the identity label of each participant is unique and that the voiceprint feature data and facial feature data stay consistent with the identity label of the corresponding participant. The voiceprint feature data and facial feature data can be obtained by voiceprint recognition and face recognition technology. For example, with face recognition based on a 21-point localization model, the features of a face can be described by these key points, and the accuracy can reach 96%.
Step S23, data organization:
Take the text information, start time, participant identity label, participant mood, and other data produced by the data recognition step, organize them in a defined format, and generate a written record containing the start time, the participant's identity label, the participant's mood, and the text information, saved in dialogue form.
For ease of review, the content of different participants is recorded on separate lines.
The format can be, for example:
【Date】
【Hour:Minute:Second】 【Participant】 (【Mood】): "【Text】"
【Hour:Minute:Second】 【Participant】 (【Mood】): "【Text】"
……
In the format above, the symbols "【】" denote content obtained from the preceding steps;
the symbols "( )" denote optional content, added only when data is available;
the symbol ":" indicates that the content of the speech follows.
For example:
November 15, 2016
09:24:12 Participant A: "Let us discuss issue a first"
November 15, 2016
09:24:16 Participant B (angry): "Issue a has already been discussed; we should discuss issue b"
……
To simplify the record and save storage space, entries that share the same date can be merged. For example, the record above can be merged into:
November 15, 2016
09:24:12 Participant A: "Let us discuss issue a first"
09:24:16 Participant B (angry): "Issue a has already been discussed; we should discuss issue b"
……
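The organization format and the merging of same-date entries can be sketched as below. This is only an illustration under stated assumptions: the `Utterance` type and `format_record` function are hypothetical names, and the date is printed in ISO form (`2016-11-15`) rather than spelled out, to keep the output independent of locale. The mood is added in parentheses only when it is available, as the format description requires.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Iterable, List, Optional


@dataclass
class Utterance:
    start: datetime              # start time of the speech
    participant: str             # identity label of the speaker
    text: str                    # text obtained by speech recognition
    mood: Optional[str] = None   # optional; added only when available


def format_record(utterances: Iterable[Utterance]) -> str:
    """Render utterances as a dialogue, printing each date only once."""
    ordered = sorted(utterances, key=lambda u: u.start)
    lines: List[str] = []
    # groupby needs sorted input; entries with the same date share one header.
    for day, group in groupby(ordered, key=lambda u: u.start.date()):
        lines.append(day.isoformat())
        for u in group:
            mood = f" ({u.mood})" if u.mood else ""
            lines.append(f'{u.start:%H:%M:%S} {u.participant}{mood}: "{u.text}"')
    return "\n".join(lines)


record = format_record([
    Utterance(datetime(2016, 11, 15, 9, 24, 12), "Participant A",
              "Let us discuss issue a first"),
    Utterance(datetime(2016, 11, 15, 9, 24, 16), "Participant B",
              "Issue a has already been discussed", mood="angry"),
])
print(record)
```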
Step S24, data supplementation and correction:
The written record saved in the data organization step is supplemented and corrected to improve its readability and ensure its correctness. For example, an interface and prompts are provided so that the user can conveniently supplement and correct the saved written record, and the name of the person making the change, the time of the change, and the content of the change are recorded for later reference.
The manner and content of supplementation are, for example, as follows: an input box is provided in which the user enters the meeting topic, a summary of the issues, the meeting conclusions, and similar information, so that others can quickly understand the content of the meeting.
The content of correction is, for example, fixing textual errors in the written record or replacing certain information. For example, if three participants were recognized as speaking during the whole meeting, a prompt box pops up asking the user whether the identity labels "Participant A", "Participant B", and "Participant C" should be replaced with the participants' real names.
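A correction such as replacing identity labels with real names, together with the log of who changed what and when, might look like the sketch below. The `Revision` and `WrittenRecord` types, the editor name, and the real name used in the example are all hypothetical; the patent does not specify a data model for the correction log.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Revision:
    editor: str        # who supplemented or corrected the record
    when: datetime     # when the change was made
    description: str   # what was changed


@dataclass
class WrittenRecord:
    text: str
    revisions: List[Revision] = field(default_factory=list)

    def replace_labels(self, names: Dict[str, str], editor: str, when: datetime) -> None:
        """Swap identity labels for real names and log the correction."""
        for label, real_name in names.items():
            # Word boundaries keep "Participant A" from matching "Participant A1".
            pattern = rf"\b{re.escape(label)}\b"
            self.text = re.sub(pattern, lambda _m, n=real_name: n, self.text)
        self.revisions.append(
            Revision(editor, when, f"replaced labels: {sorted(names)}")
        )


rec = WrittenRecord('09:24:12 Participant A: "Hello"')
rec.replace_labels(
    {"Participant A": "Zhang San"},          # hypothetical real name
    editor="secretary",                      # hypothetical editor
    when=datetime(2016, 11, 15, 10, 0, 0),
)
```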
Embodiment 3:
The invention also proposes a method for converting audio and video data into a written record. The flow chart of the method is shown in Fig. 4, and the method comprises the following steps:
Step S30, preparation:
Start the microphone and camera, create a participant list, and create a file address for saving the text, wherein each entry of the participant list includes the unique identity label of a participant and also that participant's voiceprint feature data and facial feature data, which are collected subsequently;
Each participant is given a unique identity label. For example, in a single-party meeting, "Participant A", "Participant B", "Participant C", and so on can be assigned to the participants as identity labels; in a multi-party meeting, "Participant A1", "Participant B2", "Participant C1", and so on can be assigned, where the first character of the label ("A", "B", "C") denotes the party to the meeting and the second character ("1", "2") denotes a particular participant within that party.
Step S31: When a participant starts to speak, the microphone captures the voice input of the participant currently speaking to obtain that participant's audio data, and the start time of the audio data is recorded; at the same time, the camera captures images of the participant currently speaking to obtain that participant's video data, and the start time of the video data is recorded;
Step S32: Voiceprint recognition technology is used to analyze and process the audio data obtained by the microphone and to perform voiceprint feature recognition; face recognition technology is used to analyze and process the video data obtained by the camera and to perform facial feature recognition;
Step S33: Determine whether voiceprint feature data has been successfully recognized and whether facial feature data has been successfully recognized. If voiceprint feature data and/or facial feature data has been successfully recognized, go to step S34; if neither voiceprint feature data nor facial feature data has been recognized, go to step S35;
Step S34: Determine whether the participant list already contains a participant matching the voiceprint feature data and/or the facial feature data. If so, go to step S35 and at the same time complete that participant's information (that is, if the participant list holds only voiceprint feature data and no facial feature data for the participant, add the facial feature data; if it holds only facial feature data and no voiceprint feature data, add the voiceprint feature data). If not, add a new entry to the participant list, save the new participant's identity label, voiceprint feature data, and facial feature data, and go to step S35;
Step S35: Speech recognition technology is used to analyze and process the audio data and convert it into text;
Step S36: Determine whether the audio data has been successfully converted into text. If so, go to step S37; if not, return to step S35 and continue the conversion;
Step S37: Expression recognition technology is used to analyze and process the video data to obtain the mood of the participant currently speaking;
Step S38: Obtain the current date and time;
Step S39: The date, the time, the identity label of the participant matched by voiceprint and/or facial feature data, the text obtained by speech recognition, and the mood obtained by expression recognition are organized into a written record in dialogue form and saved at the created file address;
The format can be, for example:
【Date】
【Hour Minute Second】【Participant】(【Mood】):“【Text】”
【Hour Minute Second】【Participant】(【Mood】):“【Text】”
……
In the above format, the symbol "【】" denotes content obtained from the preceding steps;
the symbol "()" denotes optional content, which is added only when data is available;
the symbol ":" indicates that the content of the speech follows.
For example:
November 15, 2016
09:24:12 Participant A: "Let us discuss issue a first"
November 15, 2016
09:24:16 Participant B (angry): "Issue a has already been discussed; we should discuss issue b"
……
To simplify the record and save storage space, entries that share the same date can be merged. For example, the record above can be merged into:
November 15, 2016
09:24:12 Participant A: "Let us discuss issue a first"
09:24:16 Participant B (angry): "Issue a has already been discussed; we should discuss issue b"
……
Step S40: Return to step S31 and continue, saving the content spoken by each participant, until the whole meeting ends.
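The speaker-matching branch of steps S33 and S34 can be sketched as follows. This is a simplified illustration under several assumptions: feature data is represented as short numeric tuples, the `_close` matcher uses Euclidean distance with an arbitrary threshold, and new labels are generated in the single-party style ("Participant A", "Participant B", ...). None of these choices comes from the patent, which leaves the recognition techniques unspecified.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Participant:
    label: str
    voiceprint: Optional[Sequence[float]] = None
    face: Optional[Sequence[float]] = None


def _close(a, b, threshold: float = 0.1) -> bool:
    """Hypothetical matcher: Euclidean distance under a made-up threshold."""
    return a is not None and b is not None and math.dist(a, b) < threshold


def identify(participants: List[Participant],
             voiceprint=None, face=None) -> Optional[Participant]:
    """Steps S33-S34: match the current speaker or add a new list entry."""
    if voiceprint is None and face is None:
        return None  # S33: nothing recognized; proceed directly to S35
    for p in participants:
        if _close(p.voiceprint, voiceprint) or _close(p.face, face):
            # S34: known participant; fill in whichever feature was missing.
            if p.voiceprint is None:
                p.voiceprint = voiceprint
            if p.face is None:
                p.face = face
            return p
    # S34: unknown participant; assign the next unused label.
    # (Single-letter labels only cover 26 participants in this sketch.)
    label = f"Participant {chr(ord('A') + len(participants))}"
    new = Participant(label, voiceprint, face)
    participants.append(new)
    return new


roster: List[Participant] = []
first = identify(roster, voiceprint=(0.1, 0.2))                    # new entry
again = identify(roster, voiceprint=(0.1, 0.21), face=(0.5, 0.5))  # same speaker
other = identify(roster, face=(0.9, 0.9))                          # new entry
```

In the second call the speaker is matched by voiceprint, so the previously missing facial feature data is added to the existing entry instead of creating a duplicate participant.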
Embodiments of the present invention have been described above. However, the present invention is not limited to these embodiments. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A system for converting audio and video data into a written record, characterized by comprising a data collection part, a data recognition part, and a data organization part;
the data collection part comprises an audio collection module and a video collection module;
the audio collection module is used to capture audio data of the current speaker and to record the start time of the speech;
the video collection module is used to capture images of the current speaker and to record the start time of the speech;
the data recognition part comprises a speech and voiceprint recognition module and a face and expression recognition module;
the speech and voiceprint recognition module processes the audio data captured by the audio collection module to convert it into text information in text format, and processes the audio data captured by the audio collection module to identify the current speaker and assign an identity label to each speaker;
the face and expression recognition module uses expression recognition technology to process the images captured by the video collection module, recognizes the expression of the current speaker, and obtains the mood of that speaker, and processes the images captured by the video collection module to identify the current speaker and assign an identity label to each speaker;
the data organization part generates the written record according to the text information, the identified start time, the identity label of the current speaker, and the mood of the current speaker.
2. The system according to claim 1, characterized in that the written record is a written record in dialogue form.
3. The system according to claim 1 or 2, characterized in that the system further comprises a data supplementation and correction part, which is used to supplement and correct the written record.
4. The system according to claim 1 or 2, characterized in that the audio collection module is a microphone, and/or the video collection module is a camera.
5. The system according to claim 1 or 2, characterized in that the system further comprises a storage module, which stores a data table whose records contain the identity label, voiceprint feature data, and facial feature data of each speaker.
6. A method for converting audio and video data into a written record, characterized by comprising the following steps:
Step S21, data collection: capture audio data of the current speaker and images of the current speaker, and record the start time of the speech;
Step S22, data recognition: process the captured audio data to convert it into text information in text format; process the captured images to recognize the expression of the current speaker and obtain the mood of that speaker; process the captured audio data and/or the captured images to identify the current speaker, and assign an identity label to each speaker;
Step S23, data organization: generate the written record according to the text information, the identified start time, the identity label of the current speaker, and the mood of the current speaker.
7. The method according to claim 6, characterized in that the written record is a written record in dialogue form.
8. The method according to claim 6 or 7, characterized in that the method further comprises step S24, data supplementation and correction: supplement and correct the written record.
9. The method according to claim 6 or 7, characterized in that the identity label is stored in a storage module in association with the voiceprint feature data and/or facial feature data of the speaker; before an identity label is assigned to a speaker, the storage module is first searched for a stored identity label that matches the speaker, and if none is found, an identity label is assigned to the speaker.
10. A method for converting audio and video data into a written record, characterized by comprising the following steps:
Step S30, preparation: start the microphone and camera, create a speaker list, and create a file address for saving the text, wherein each entry of the speaker list includes the unique identity label of a speaker, the voiceprint feature data of the speaker, and the facial feature data of the speaker;
Step S31, data capture: when a speaker starts to speak, the microphone captures the voice input of the speaker currently speaking to obtain that speaker's audio data, and the start time of the audio data is recorded; at the same time, the camera captures images of the speaker currently speaking to obtain that speaker's video data, and the start time of the video data is recorded;
Step S32: voiceprint recognition technology is used to analyze and process the audio data obtained by the microphone and to perform voiceprint feature recognition; face recognition technology is used to analyze and process the video data obtained by the camera and to perform facial feature recognition;
Step S33: determine whether voiceprint feature data has been successfully recognized and whether facial feature data has been successfully recognized; if voiceprint feature data and/or facial feature data has been successfully recognized, go to step S34; if neither voiceprint feature data nor facial feature data has been recognized, go to step S35;
Step S34: determine whether the speaker list already contains a speaker matching the voiceprint feature data and/or the facial feature data; if so, go to step S35 and at the same time complete that speaker's information; if not, add a new entry to the speaker list, save the new speaker's identity label, voiceprint feature data, and facial feature data, and go to step S35;
Step S35: speech recognition technology is used to analyze and process the audio data and convert it into text;
Step S36: determine whether the audio data has been successfully converted into text; if so, go to step S37; if not, return to step S35 and continue the conversion;
Step S37: expression recognition technology is used to analyze and process the video data to obtain the mood of the speaker currently speaking;
Step S38: obtain the current date and time;
Step S39: the date, the time, the identity label of the speaker matched by voiceprint and/or facial feature data, the text obtained by speech recognition, and the mood obtained by expression recognition are organized into a written record in dialogue form and saved at the created file address;
Step S40: return to step S31 and continue, saving the content spoken by each speaker, until the conversation ends.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611170040.6A CN106782545B (en) | 2016-12-16 | 2016-12-16 | A kind of system and method that audio, video data is converted to writing record |
PCT/CN2017/077035 WO2018107605A1 (en) | 2016-12-16 | 2017-03-17 | System and method for converting audio/video data into written records |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611170040.6A CN106782545B (en) | 2016-12-16 | 2016-12-16 | A kind of system and method that audio, video data is converted to writing record |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106782545A true CN106782545A (en) | 2017-05-31 |
CN106782545B CN106782545B (en) | 2019-07-16 |
Family
ID=58892306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611170040.6A Active CN106782545B (en) | 2016-12-16 | 2016-12-16 | A kind of system and method that audio, video data is converted to writing record |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106782545B (en) |
WO (1) | WO2018107605A1 (en) |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609045A (en) * | 2017-08-17 | 2018-01-19 | 深圳壹秘科技有限公司 | A kind of minutes generating means and its method |
CN107689225A (en) * | 2017-09-29 | 2018-02-13 | 福建实达电脑设备有限公司 | A kind of method for automatically generating minutes |
CN107818785A (en) * | 2017-09-26 | 2018-03-20 | 平安普惠企业管理有限公司 | A kind of method and terminal device that information is extracted from multimedia file |
CN107885723A (en) * | 2017-11-03 | 2018-04-06 | 广州杰赛科技股份有限公司 | Conversational character differentiating method and system |
CN107918771A (en) * | 2017-12-07 | 2018-04-17 | 河北工业大学 | Character recognition method and Worn type person recognition system |
CN108062359A (en) * | 2017-11-28 | 2018-05-22 | 歌尔科技有限公司 | A kind of robot interview and the method contributed to newspapers, magazines,etc. and interview and the robot contributed to newspapers, magazines,etc. |
CN108256513A (en) * | 2018-03-23 | 2018-07-06 | 中国科学院长春光学精密机械与物理研究所 | A kind of intelligent video analysis method and intelligent video record system |
CN108537508A (en) * | 2018-03-30 | 2018-09-14 | 上海爱优威软件开发有限公司 | Minutes method and system |
CN108573033A (en) * | 2018-03-27 | 2018-09-25 | 中国科学院长春光学精密机械与物理研究所 | Cyborg network of vein method for building up based on recognition of face and relevant device |
CN108712624A (en) * | 2018-08-08 | 2018-10-26 | 上海启诺信息科技有限公司 | Video recording archive devices based on writing record and method |
CN108764010A (en) * | 2018-03-23 | 2018-11-06 | 姜涵予 | Emotional state determines method and device |
CN108810446A (en) * | 2018-06-07 | 2018-11-13 | 北京智能管家科技有限公司 | A kind of label generating method of video conference, device, equipment and medium |
CN108986825A (en) * | 2018-07-02 | 2018-12-11 | 北京百度网讯科技有限公司 | Context acquisition methods and equipment based on interactive voice |
CN109118626A (en) * | 2018-08-08 | 2019-01-01 | 腾讯科技(深圳)有限公司 | Control method, device, storage medium and the electronic device of lockset |
CN109150556A (en) * | 2018-07-31 | 2019-01-04 | 何镝 | More people's teleconferences based on speech recognition record system |
CN109285548A (en) * | 2017-07-19 | 2019-01-29 | 阿里巴巴集团控股有限公司 | Information processing method, system, electronic equipment and computer storage medium |
CN109325737A (en) * | 2018-09-17 | 2019-02-12 | 态度国际咨询管理(深圳)有限公司 | A kind of enterprise intelligent virtual assistant system and its method |
CN109361825A (en) * | 2018-11-12 | 2019-02-19 | 平安科技(深圳)有限公司 | Meeting summary recording method, terminal and computer storage medium |
CN109788232A (en) * | 2018-12-18 | 2019-05-21 | 视联动力信息技术股份有限公司 | A kind of summary of meeting recording method of video conference, device and system |
CN109817245A (en) * | 2019-01-17 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Generation method, device, computer equipment and the storage medium of meeting summary |
CN109817221A (en) * | 2018-12-26 | 2019-05-28 | 出门问问信息科技有限公司 | More people's video methods, device, equipment and storage medium |
CN109920428A (en) * | 2017-12-12 | 2019-06-21 | 杭州海康威视数字技术股份有限公司 | A kind of notes input method, device, electronic equipment and storage medium |
CN109979458A (en) * | 2019-01-17 | 2019-07-05 | 平安科技(深圳)有限公司 | News interview original text automatic generation method and relevant device based on artificial intelligence |
CN110019286A (en) * | 2017-07-19 | 2019-07-16 | ***通信有限公司研究院 | A kind of expression recommended method and device based on user social contact relationship |
CN110139062A (en) * | 2019-05-09 | 2019-08-16 | 平安科技(深圳)有限公司 | A kind of creation method, device and the terminal device of video conference record |
CN110196914A (en) * | 2019-07-29 | 2019-09-03 | 上海肇观电子科技有限公司 | A kind of method and apparatus by face information input database |
CN110232925A (en) * | 2019-06-28 | 2019-09-13 | 百度在线网络技术(北京)有限公司 | Generate the method, apparatus and conference terminal of minutes |
CN110232553A (en) * | 2018-03-05 | 2019-09-13 | 柯尼卡美能达株式会社 | Meeting support system and computer-readable recording medium |
CN110278398A (en) * | 2018-03-16 | 2019-09-24 | 杭州海康威视数字技术股份有限公司 | Date storage method, video clip searching method, device and electronic equipment |
CN110297907A (en) * | 2019-06-28 | 2019-10-01 | 谭浩 | Generate method, computer readable storage medium and the terminal device of interview report |
CN110378226A (en) * | 2019-06-14 | 2019-10-25 | 平安科技(深圳)有限公司 | Image processing method, device, computer equipment and storage medium based on long-range court's trial |
WO2019227579A1 (en) * | 2018-05-29 | 2019-12-05 | 平安科技(深圳)有限公司 | Conference information recording method and apparatus, computer device, and storage medium |
CN110557678A (en) * | 2018-05-31 | 2019-12-10 | 北京百度网讯科技有限公司 | Video processing method, device and equipment |
CN110661923A (en) * | 2018-06-28 | 2020-01-07 | 视联动力信息技术股份有限公司 | Method and device for recording speech information in conference |
CN110674351A (en) * | 2019-10-09 | 2020-01-10 | 常州市小先信息技术有限公司 | Audio digital label retrieval system and method for monitoring system |
CN110895575A (en) * | 2018-08-24 | 2020-03-20 | 阿里巴巴集团控股有限公司 | Audio processing method and device |
CN111221987A (en) * | 2019-12-30 | 2020-06-02 | 秒针信息技术有限公司 | Hybrid audio tagging method and apparatus |
CN111291179A (en) * | 2018-12-06 | 2020-06-16 | 北京嘀嘀无限科技发展有限公司 | Conversation classification method and device, electronic equipment and storage medium |
CN111369384A (en) * | 2019-12-23 | 2020-07-03 | 国网河南省电力公司郑州供电公司 | Power transformation operation and maintenance hidden danger overall process control system |
WO2020154916A1 (en) * | 2019-01-29 | 2020-08-06 | 深圳市欢太科技有限公司 | Video subtitle synthesis method and apparatus, storage medium, and electronic device |
CN111708912A (en) * | 2020-05-06 | 2020-09-25 | 深圳震有科技股份有限公司 | Video conference record query processing method and device |
CN111916053A (en) * | 2020-08-17 | 2020-11-10 | 北京字节跳动网络技术有限公司 | Voice generation method, device, equipment and computer readable medium |
CN111938674A (en) * | 2020-09-07 | 2020-11-17 | 南京宇乂科技有限公司 | Emotion recognition control system for conversation |
CN112037791A (en) * | 2020-08-12 | 2020-12-04 | 广东电力信息科技有限公司 | Conference summary transcription method, apparatus and storage medium |
CN112584078A (en) * | 2019-09-27 | 2021-03-30 | 深圳市万普拉斯科技有限公司 | Video call method, video call device, computer equipment and storage medium |
CN112669847A (en) * | 2020-12-03 | 2021-04-16 | 安徽宝信信息科技有限公司 | Intelligent screen capable of being used for automatic editing and sorting of conference records |
WO2022016994A1 (en) * | 2020-07-23 | 2022-01-27 | 平安科技(深圳)有限公司 | Ai recognition-based meeting minutes generation method and apparatus, device and medium |
CN114023351A (en) * | 2021-12-17 | 2022-02-08 | 广东讯飞启明科技发展有限公司 | Speech enhancement method and system based on noisy environment |
WO2022037383A1 (en) * | 2020-08-17 | 2022-02-24 | 北京字节跳动网络技术有限公司 | Voice processing method and apparatus, electronic device, and computer readable medium |
WO2022062471A1 (en) * | 2020-09-25 | 2022-03-31 | 华为技术有限公司 | Audio data processing method, device and system |
TWI764020B (en) * | 2019-07-24 | 2022-05-11 | 圓展科技股份有限公司 | Video conference system and method thereof |
CN114630069A (en) * | 2022-02-22 | 2022-06-14 | 东南大学 | Audio and video optimization method, audio terminal and storage medium |
CN114745213A (en) * | 2022-04-11 | 2022-07-12 | 深信服科技股份有限公司 | Conference record generation method and device, electronic equipment and storage medium |
WO2022179253A1 (en) * | 2021-02-26 | 2022-09-01 | 华为技术有限公司 | Speech operation method for device, apparatus, and electronic device |
CN117577115A (en) * | 2024-01-15 | 2024-02-20 | 杭州讯意迪科技有限公司 | Intelligent paperless conference system |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920639B (en) * | 2018-07-02 | 2022-01-18 | 北京百度网讯科技有限公司 | Context obtaining method and device based on voice interaction |
CN110838291B (en) * | 2018-08-16 | 2024-06-18 | 北京搜狗科技发展有限公司 | Input method and device and electronic equipment |
CN109525800A (en) * | 2018-11-08 | 2019-03-26 | 江西国泰利民信息科技有限公司 | A kind of teleconference voice recognition data transmission method |
CN110853646B (en) * | 2019-11-20 | 2023-11-03 | 深圳前海微众银行股份有限公司 | Conference speaking role distinguishing method, device, equipment and readable storage medium |
CN111785271A (en) * | 2020-05-15 | 2020-10-16 | 深圳奥尼电子股份有限公司 | Voice recognition processing method and system and electronic equipment |
CN111860523B (en) * | 2020-07-28 | 2024-04-30 | 上海兑观信息科技技术有限公司 | Intelligent recording system and method for sound image files |
CN111899743A (en) * | 2020-07-31 | 2020-11-06 | 斑马网络技术有限公司 | Method and device for acquiring target sound, electronic equipment and storage medium |
CN111860915A (en) * | 2020-07-31 | 2020-10-30 | 贵州东冠科技有限公司 | Prison management-based family affection assisting system |
CN112185391A (en) * | 2020-09-30 | 2021-01-05 | 深圳供电局有限公司 | Automatic modification processing method for customer service record |
CN114299953B (en) * | 2021-12-29 | 2022-08-23 | 湖北微模式科技发展有限公司 | Speaker role distinguishing method and system combining mouth movement analysis |
CN116757646B (en) * | 2023-08-15 | 2023-11-10 | 成都市青羊大数据有限责任公司 | Comprehensive management system for teaching |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102298694A (en) * | 2011-06-21 | 2011-12-28 | 广东爱科数字科技有限公司 | Man-machine interaction identification system applied to remote information service |
CN103903074A (en) * | 2012-12-24 | 2014-07-02 | 华为技术有限公司 | Information processing method and information processing device for video communication |
US20150235654A1 (en) * | 2011-06-17 | 2015-08-20 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
CN104932665A (en) * | 2014-03-19 | 2015-09-23 | 联想(北京)有限公司 | Information processing method and electronic device |
CN105512348A (en) * | 2016-01-28 | 2016-04-20 | 北京旷视科技有限公司 | Method and device for processing videos and related audios and retrieving method and device |
CN106033339A (en) * | 2015-03-13 | 2016-10-19 | 联想(北京)有限公司 | Information processing method and electronic device |
- 2016-12-16: CN application CN201611170040.6A (patent/CN106782545B/en), status: Active
- 2017-03-17: WO application PCT/CN2017/077035 (patent/WO2018107605A1/en), status: Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150235654A1 (en) * | 2011-06-17 | 2015-08-20 | At&T Intellectual Property I, L.P. | Speaker association with a visual representation of spoken content |
CN102298694A (en) * | 2011-06-21 | 2011-12-28 | 广东爱科数字科技有限公司 | Man-machine interaction identification system applied to remote information service |
CN103903074A (en) * | 2012-12-24 | 2014-07-02 | 华为技术有限公司 | Information processing method and information processing device for video communication |
CN104932665A (en) * | 2014-03-19 | 2015-09-23 | 联想(北京)有限公司 | Information processing method and electronic device |
CN106033339A (en) * | 2015-03-13 | 2016-10-19 | 联想(北京)有限公司 | Information processing method and electronic device |
CN105512348A (en) * | 2016-01-28 | 2016-04-20 | 北京旷视科技有限公司 | Method and device for processing videos and related audios and retrieving method and device |
Cited By (71)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11664030B2 (en) | 2017-07-19 | 2023-05-30 | Alibaba Group Holding Limited | Information processing method, system, electronic device, and computer storage medium |
CN109285548A (en) * | 2017-07-19 | 2019-01-29 | 阿里巴巴集团控股有限公司 | Information processing method, system, electronic equipment and computer storage medium |
CN110019286A (en) * | 2017-07-19 | 2019-07-16 | ***通信有限公司研究院 | A kind of expression recommended method and device based on user social contact relationship |
CN110019286B (en) * | 2017-07-19 | 2021-10-29 | ***通信有限公司研究院 | Expression recommendation method and device based on user social relationship |
CN107609045A (en) * | 2017-08-17 | 2018-01-19 | 深圳壹秘科技有限公司 | A kind of minutes generating means and its method |
CN107818785A (en) * | 2017-09-26 | 2018-03-20 | 平安普惠企业管理有限公司 | A kind of method and terminal device that information is extracted from multimedia file |
CN107689225A (en) * | 2017-09-29 | 2018-02-13 | 福建实达电脑设备有限公司 | A kind of method for automatically generating minutes |
CN107885723B (en) * | 2017-11-03 | 2021-04-09 | 广州杰赛科技股份有限公司 | Conversation role distinguishing method and system |
CN107885723A (en) * | 2017-11-03 | 2018-04-06 | 广州杰赛科技股份有限公司 | Conversational character differentiating method and system |
CN108062359A (en) * | 2017-11-28 | 2018-05-22 | 歌尔科技有限公司 | A kind of robot interview and the method contributed to newspapers, magazines,etc. and interview and the robot contributed to newspapers, magazines,etc. |
CN108062359B (en) * | 2017-11-28 | 2021-03-16 | 歌尔科技有限公司 | Method for robot interview and draft writing and robot interview and draft writing |
CN107918771A (en) * | 2017-12-07 | 2018-04-17 | 河北工业大学 | Character recognition method and Worn type person recognition system |
CN107918771B (en) * | 2017-12-07 | 2023-11-24 | 河北工业大学 | Person identification method and wearable person identification system |
CN109920428A (en) * | 2017-12-12 | 2019-06-21 | 杭州海康威视数字技术股份有限公司 | A kind of notes input method, device, electronic equipment and storage medium |
CN110232553A (en) * | 2018-03-05 | 2019-09-13 | 柯尼卡美能达株式会社 | Meeting support system and computer-readable recording medium |
CN110278398B (en) * | 2018-03-16 | 2022-01-21 | 杭州海康威视数字技术股份有限公司 | Data storage method, video clip searching method, device and electronic equipment |
CN110278398A (en) * | 2018-03-16 | 2019-09-24 | 杭州海康威视数字技术股份有限公司 | Date storage method, video clip searching method, device and electronic equipment |
CN108764010A (en) * | 2018-03-23 | 2018-11-06 | 姜涵予 | Emotional state determines method and device |
CN108256513A (en) * | 2018-03-23 | 2018-07-06 | 中国科学院长春光学精密机械与物理研究所 | A kind of intelligent video analysis method and intelligent video record system |
CN108573033A (en) * | 2018-03-27 | 2018-09-25 | 中国科学院长春光学精密机械与物理研究所 | Cyborg network of vein method for building up based on recognition of face and relevant device |
CN108537508A (en) * | 2018-03-30 | 2018-09-14 | 上海爱优威软件开发有限公司 | Minutes method and system |
WO2019227579A1 (en) * | 2018-05-29 | 2019-12-05 | 平安科技(深圳)有限公司 | Conference information recording method and apparatus, computer device, and storage medium |
CN110557678A (en) * | 2018-05-31 | 2019-12-10 | 北京百度网讯科技有限公司 | Video processing method, device and equipment |
CN110557678B (en) * | 2018-05-31 | 2022-05-03 | 北京百度网讯科技有限公司 | Video processing method, device and equipment |
CN108810446A (en) * | 2018-06-07 | 2018-11-13 | 北京智能管家科技有限公司 | A kind of label generating method of video conference, device, equipment and medium |
CN110661923A (en) * | 2018-06-28 | 2020-01-07 | 视联动力信息技术股份有限公司 | Method and device for recording speech information in conference |
CN108986825A (en) * | 2018-07-02 | 2018-12-11 | 北京百度网讯科技有限公司 | Context acquisition methods and equipment based on interactive voice |
CN109150556A (en) * | 2018-07-31 | 2019-01-04 | 何镝 | More people's teleconferences based on speech recognition record system |
CN108712624A (en) * | 2018-08-08 | 2018-10-26 | 上海启诺信息科技有限公司 | Video recording archive devices based on writing record and method |
CN109118626A (en) * | 2018-08-08 | 2019-01-01 | 腾讯科技(深圳)有限公司 | Control method, device, storage medium and the electronic device of lockset |
CN110895575B (en) * | 2018-08-24 | 2023-06-23 | 阿里巴巴集团控股有限公司 | Audio processing method and device |
CN110895575A (en) * | 2018-08-24 | 2020-03-20 | 阿里巴巴集团控股有限公司 | Audio processing method and device |
CN109325737A (en) * | 2018-09-17 | 2019-02-12 | 态度国际咨询管理(深圳)有限公司 | A kind of enterprise intelligent virtual assistant system and its method |
CN109361825A (en) * | 2018-11-12 | 2019-02-19 | 平安科技(深圳)有限公司 | Meeting summary recording method, terminal and computer storage medium |
CN111291179A (en) * | 2018-12-06 | 2020-06-16 | 北京嘀嘀无限科技发展有限公司 | Conversation classification method and device, electronic equipment and storage medium |
CN111291179B (en) * | 2018-12-06 | 2023-12-08 | 北京嘀嘀无限科技发展有限公司 | Dialogue classification method and device, electronic equipment and storage medium |
CN109788232A (en) * | 2018-12-18 | 2019-05-21 | 视联动力信息技术股份有限公司 | A kind of summary of meeting recording method of video conference, device and system |
CN109817221B (en) * | 2018-12-26 | 2021-01-01 | 出门问问信息科技有限公司 | Multi-person video method, device, equipment and storage medium |
CN109817221A (en) * | 2018-12-26 | 2019-05-28 | 出门问问信息科技有限公司 | More people's video methods, device, equipment and storage medium |
CN109979458A (en) * | 2019-01-17 | 2019-07-05 | 平安科技(深圳)有限公司 | News interview original text automatic generation method and relevant device based on artificial intelligence |
CN109817245A (en) * | 2019-01-17 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Generation method, device, computer equipment and the storage medium of meeting summary |
WO2020154916A1 (en) * | 2019-01-29 | 2020-08-06 | 深圳市欢太科技有限公司 | Video subtitle synthesis method and apparatus, storage medium, and electronic device |
CN110139062A (en) * | 2019-05-09 | 2019-08-16 | 平安科技(深圳)有限公司 | Video conference record creating method and device and terminal equipment |
CN110139062B (en) * | 2019-05-09 | 2022-10-18 | 平安科技(深圳)有限公司 | Video conference record creating method and device and terminal equipment |
CN110378226A (en) * | 2019-06-14 | 2019-10-25 | 平安科技(深圳)有限公司 | Image processing method, device, computer equipment and storage medium based on remote court trial |
CN110232925A (en) * | 2019-06-28 | 2019-09-13 | 百度在线网络技术(北京)有限公司 | Method, apparatus and conference terminal for generating meeting minutes |
CN110297907A (en) * | 2019-06-28 | 2019-10-01 | 谭浩 | Method, computer-readable storage medium and terminal device for generating an interview report |
TWI764020B (en) * | 2019-07-24 | 2022-05-11 | 圓展科技股份有限公司 | Video conference system and method thereof |
CN110196914A (en) * | 2019-07-29 | 2019-09-03 | 上海肇观电子科技有限公司 | Method and device for inputting face information into database |
CN110196914B (en) * | 2019-07-29 | 2019-12-27 | 上海肇观电子科技有限公司 | Method and device for inputting face information into database |
US10922570B1 (en) | 2019-07-29 | 2021-02-16 | NextVPU (Shanghai) Co., Ltd. | Entering of human face information into database |
CN112584078A (en) * | 2019-09-27 | 2021-03-30 | 深圳市万普拉斯科技有限公司 | Video call method, video call device, computer equipment and storage medium |
CN112584078B (en) * | 2019-09-27 | 2022-03-18 | 深圳市万普拉斯科技有限公司 | Video call method, video call device, computer equipment and storage medium |
CN110674351A (en) * | 2019-10-09 | 2020-01-10 | 常州市小先信息技术有限公司 | Audio digital label retrieval system and method for monitoring system |
CN111369384A (en) * | 2019-12-23 | 2020-07-03 | 国网河南省电力公司郑州供电公司 | Power transformation operation and maintenance hidden danger overall process control system |
CN111221987A (en) * | 2019-12-30 | 2020-06-02 | 秒针信息技术有限公司 | Hybrid audio tagging method and apparatus |
CN111708912A (en) * | 2020-05-06 | 2020-09-25 | 深圳震有科技股份有限公司 | Video conference record query processing method and device |
WO2022016994A1 (en) * | 2020-07-23 | 2022-01-27 | 平安科技(深圳)有限公司 | Ai recognition-based meeting minutes generation method and apparatus, device and medium |
CN112037791A (en) * | 2020-08-12 | 2020-12-04 | 广东电力信息科技有限公司 | Conference summary transcription method, apparatus and storage medium |
WO2022037383A1 (en) * | 2020-08-17 | 2022-02-24 | 北京字节跳动网络技术有限公司 | Voice processing method and apparatus, electronic device, and computer readable medium |
CN111916053A (en) * | 2020-08-17 | 2020-11-10 | 北京字节跳动网络技术有限公司 | Voice generation method, device, equipment and computer readable medium |
CN111938674A (en) * | 2020-09-07 | 2020-11-17 | 南京宇乂科技有限公司 | Emotion recognition control system for conversation |
WO2022062471A1 (en) * | 2020-09-25 | 2022-03-31 | 华为技术有限公司 | Audio data processing method, device and system |
CN112669847A (en) * | 2020-12-03 | 2021-04-16 | 安徽宝信信息科技有限公司 | Intelligent screen capable of being used for automatic editing and sorting of conference records |
WO2022179253A1 (en) * | 2021-02-26 | 2022-09-01 | 华为技术有限公司 | Speech operation method for device, apparatus, and electronic device |
CN114023351A (en) * | 2021-12-17 | 2022-02-08 | 广东讯飞启明科技发展有限公司 | Speech enhancement method and system based on noisy environment |
CN114630069A (en) * | 2022-02-22 | 2022-06-14 | 东南大学 | Audio and video optimization method, audio terminal and storage medium |
CN114745213A (en) * | 2022-04-11 | 2022-07-12 | 深信服科技股份有限公司 | Conference record generation method and device, electronic equipment and storage medium |
CN114745213B (en) * | 2022-04-11 | 2024-05-28 | 深信服科技股份有限公司 | Conference record generation method and device, electronic equipment and storage medium |
CN117577115A (en) * | 2024-01-15 | 2024-02-20 | 杭州讯意迪科技有限公司 | Intelligent paperless conference system |
CN117577115B (en) * | 2024-01-15 | 2024-03-29 | 杭州讯意迪科技有限公司 | Intelligent paperless conference system |
Also Published As
Publication number | Publication date |
---|---|
WO2018107605A1 (en) | 2018-06-21 |
CN106782545B (en) | 2019-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106782545B (en) | System and method for converting audio and video data into a written record | |
CN106657865B (en) | Conference summary generation method and device and video conference system | |
US11699456B2 (en) | Automated transcript generation from multi-channel audio | |
CN108346034B (en) | Intelligent conference management method and system | |
US10034028B2 (en) | Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs | |
US6687671B2 (en) | Method and apparatus for automatic collection and summarization of meeting information | |
CN107527623B (en) | Screen transmission method and device, electronic equipment and computer readable storage medium | |
US8315866B2 (en) | Generating representations of group interactions | |
CN107911646A (en) | Method and device for sharing and generating meeting minutes in a meeting | |
CN109003608A (en) | Court trial control method, system, computer equipment and storage medium | |
CN107333090B (en) | Video conference data processing method and platform | |
CN102207844A (en) | Information processing device, information processing method and program | |
CN111415128B (en) | Method, system, device, equipment and medium for controlling conference | |
CN106157957A (en) | Audio recognition method, device and subscriber equipment | |
KR102462219B1 (en) | Method of Automatically Generating Meeting Minutes Using Speaker Diarization Technology | |
US8615153B2 (en) | Multi-media data editing system, method and electronic device using same | |
US20220013127A1 (en) | Electronic Speech to Text Court Reporting System For Generating Quick and Accurate Transcripts | |
CN113365109A (en) | Method and device for generating video subtitles, electronic equipment and storage medium | |
KR102346668B1 (en) | apparatus for interpreting conference | |
CN103594086B (en) | Speech processing system, device and method | |
CN113611308B (en) | Voice recognition method, device, system, server and storage medium | |
CN211788155U (en) | Intelligent conference recording system | |
KR101783872B1 (en) | Video Search System and Method thereof | |
JP2019176375A (en) | Moving image output apparatus, moving image output method, and moving image output program | |
KR102291113B1 (en) | Apparatus and method for producing conference record | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||