CN106782545A - System and method for converting audio and video data into a written record

Info

Publication number
CN106782545A
Authority
CN
China
Prior art keywords
data
speaker
participant
current
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611170040.6A
Other languages
Chinese (zh)
Other versions
CN106782545B (en)
Inventor
李纯冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd and Guangzhou Shirui Electronics Co Ltd
Priority to CN201611170040.6A (CN106782545B)
Priority to PCT/CN2017/077035 (WO2018107605A1)
Publication of CN106782545A
Application granted
Publication of CN106782545B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/24: Speech recognition using non-acoustical features
    • G10L15/25: Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/22: Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a system and method for converting audio and video data into a written record. The system comprises a data collection part, a data recognition part and a data organization part. The data collection part comprises an audio collection module and a video collection module. The data recognition part comprises a speech and voiceprint recognition module and a face and expression recognition module. The data organization part generates the written record from the text information, the recorded start time, the identity label of the current speaker and the emotion of the current speaker. The invention preserves the whole course of the audio and video data more completely and in greater detail, closer to the real situation. By converting the audio and video data into text format for storage, it greatly reduces the cost of storage and transmission and makes the record easier to review afterwards, so that conference content can be browsed and located more quickly.

Description

System and method for converting audio and video data into a written record
Technical field
The present invention relates to data processing technology, and in particular to a system and method for converting audio and video data into a written record.
Background
When an audio/video conference is held, the conference content is usually recorded by capturing video data with a camera and audio data with a microphone, or by capturing audio data with a microphone alone. The audio and video data, or the audio data, are saved as a multimedia file and stored on a storage device; the conference content can then be watched or listened to by playing the multimedia file. Alternatively, a dedicated note-taker can record the conference content using an input device such as a computer, or by handwriting.
Recording audio and video data with devices such as cameras and microphones requires the audio/video files to be stored on a storage device. The files occupy a large amount of storage space, so the cost is relatively high. Moreover, reviewing the conference content later by playing the multimedia file does not allow fast browsing or navigation to a specific topic, so it takes more time and content may be missed, which makes it inefficient. Recording the conference content by taking notes helps with fast browsing and with locating specific topics, but it places high demands on the note-taker: if the writing speed is well below the pace of the discussion, omissions and errors are likely, so the content is not sufficiently complete or detailed, and the scene at the time cannot be reconstructed when the record is reviewed later.
The prior art discloses a method for bidirectional, reversible speech conversion between Chinese and a foreign language in audio-visual material, with added subtitles. The method recognizes the audio-visual material, converts the speech in it into text, translates the text into a specified foreign language, and superimposes it on the picture as subtitles, stored together with the original speech or output synchronously, so that the material becomes audio-visual material with subtitles in the specified foreign language. However, that method merely recognizes speech as text and displays the translated text on the picture as subtitles; it does not organize the subtitles according to who is speaking, and so does not produce content that is closer to the real situation.
Summary of the invention
To solve the technical problems that the recorded conference content is neither complete nor detailed, and that the scene at the time cannot be reconstructed when the record is reviewed later, the invention provides a system and method for converting audio and video data into a written record. The technical solution is as follows.
A system for converting audio and video data into a written record comprises a data collection part, a data recognition part and a data organization part;
The data collection part comprises an audio collection module and a video collection module;
The audio collection module is used to capture the audio data of the current speaker and to record the start time of the speech;
The video collection module is used to capture images of the current speaker and to record the start time of the speech;
The data recognition part comprises a speech and voiceprint recognition module and a face and expression recognition module;
The speech and voiceprint recognition module processes the audio data captured by the audio collection module to convert it into text information in text format, and also processes that audio data to identify the current speaker and assign an identity label to each speaker;
The face and expression recognition module processes the images captured by the video collection module using expression recognition technology to recognize the expression of the current speaker and obtain the speaker's emotion, and also processes those images to identify the current speaker and assign an identity label to each speaker;
The data organization part generates the written record from the text information, the recorded start time, the identity label of the current speaker and the emotion of the current speaker.
Further, the written record is a written record in dialogue form.
Further, the system also comprises a data supplement and correction part, which is used to supplement and correct the written record.
Further, the audio collection module is a microphone, and/or the video collection module is a camera.
Further, the system also comprises a storage module, which stores a data table recording the identity labels, voiceprint feature data and facial feature data of the speakers.
A method for converting audio and video data into a written record comprises the following steps:
Step S21, data collection: capture the audio data of the current speaker and images of the current speaker, and record the start time of the speech;
Step S22, data recognition: process the captured audio data to convert it into text information in text format; process the captured images to recognize the expression of the current speaker and obtain the speaker's emotion; process the captured audio data and/or the captured images to identify the current speaker, and assign an identity label to each speaker;
Step S23, data organization: the data organization part generates a written record from the text information, the recorded start time, the identity label of the current speaker and the emotion of the current speaker.
Further, the written record is a written record in dialogue form.
Further, the method also comprises step S24, data supplement and correction: the written record is supplemented and corrected.
Further, the identity label is stored in a storage module in association with the speaker's voiceprint feature data and/or facial feature data. Before an identity label is assigned to a speaker, the storage module is first searched for a stored identity label that matches the speaker; if none is found, a new identity label is assigned to the speaker.
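The search-before-assign rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `SpeakerEntry` class, the `find_or_assign` function, the use of cosine similarity on feature vectors and the 0.8 threshold are all hypothetical. The source states only that stored labels are searched first and that a new label is assigned when no match is found.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional


@dataclass
class SpeakerEntry:
    """One row of the stored table: label plus optional feature vectors."""
    label: str
    voiceprint: Optional[List[float]] = None
    face: Optional[List[float]] = None


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _matches(new, stored, threshold: float) -> bool:
    return new is not None and stored is not None and _cosine(new, stored) >= threshold


def find_or_assign(table: List[SpeakerEntry], voiceprint=None, face=None,
                   threshold: float = 0.8) -> str:
    """Return the stored label matching the features, or assign a new one."""
    for entry in table:
        if _matches(voiceprint, entry.voiceprint, threshold) or \
                _matches(face, entry.face, threshold):
            # Fill in whichever feature was missing for this known speaker.
            if entry.voiceprint is None and voiceprint is not None:
                entry.voiceprint = voiceprint
            if entry.face is None and face is not None:
                entry.face = face
            return entry.label
    # No match: assign the next letter (assumes fewer than 27 speakers).
    label = "Participant " + chr(ord("A") + len(table))
    table.append(SpeakerEntry(label, voiceprint, face))
    return label
```

A real system would replace the similarity test with the scoring method of whatever voiceprint or face recognition engine is in use.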
A method for converting audio and video data into a written record comprises the following steps:
Step S30, preparation: start the microphone and camera, create a speaker list, and create the file address for saving the text, wherein each entry of the speaker list includes a unique identity label, the speaker's voiceprint feature data and the speaker's facial feature data;
Step S31, capturing data: when a speaker starts to speak, the microphone captures the speech input of the current speaker to obtain the audio data of the current speaker, and the start time of that audio data is recorded; at the same time, the camera captures images of the current speaker to obtain the video data of the current speaker, and the start time of that video data is recorded;
Step S32: analyze and process the audio data of the current speaker obtained by the microphone using voiceprint recognition technology, and perform voiceprint feature recognition; analyze and process the video data of the current speaker obtained by the camera using face recognition technology, and perform facial feature recognition;
Step S33: determine whether voiceprint feature data and facial feature data were successfully recognized; if voiceprint feature data and/or facial feature data were successfully recognized, go to step S34; if neither voiceprint feature data nor facial feature data were recognized, go to step S35;
Step S34: determine whether a speaker matching the voiceprint feature data and/or the facial feature data already exists in the speaker list; if so, go to step S35 and at the same time supplement the related information of that speaker; if not, add a new entry to the speaker list, save the identity label, voiceprint feature data and facial feature data of the new speaker, and go to step S35;
Step S35: analyze and process the audio data using speech recognition technology, and convert the audio data into text;
Step S36: determine whether the audio data was successfully converted into text; if so, go to step S37; if not, return to step S35 and continue the conversion;
Step S37: analyze and process the video data using expression recognition technology to obtain the emotion of the current speaker;
Step S38: obtain the current date and time;
Step S39: organize the date, the time, the identity label of the speaker matching the voiceprint and/or facial feature data, the text from speech recognition and the emotion from expression recognition into a written record in dialogue form, and save it to the created file address;
Step S40: return to step S31 and continue, saving the content of each speaker's speech, until the conversation ends.
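The control flow of steps S32 to S39 for a single speech turn might be sketched as below. The recognizers are passed in as plain callables because the source treats them as existing, well-known technologies; the function name `transcribe_turn`, its parameter names and the bounded retry count `max_attempts` are assumptions made for illustration (the source simply returns to step S35 until conversion succeeds).

```python
from datetime import datetime


def transcribe_turn(audio, video, *, voiceprint_of, face_of, label_for,
                    speech_to_text, emotion_of, clock=datetime.now,
                    max_attempts=3):
    """Process one captured turn; return a record dict, or None on failure."""
    # S32: feature recognition on both streams (None means "not recognized").
    voiceprint = voiceprint_of(audio)
    face = face_of(video)

    # S33/S34: look up or register the speaker only if a feature was found.
    label = None
    if voiceprint is not None or face is not None:
        label = label_for(voiceprint, face)

    # S35/S36: convert speech to text, retrying a bounded number of times.
    text = None
    for _ in range(max_attempts):
        text = speech_to_text(audio)
        if text is not None:
            break
    if text is None:
        return None

    # S37: emotion from expression recognition; S38: current date and time.
    emotion = emotion_of(video)
    stamp = clock()

    # S39: the organized record, ready to be written out in dialogue form.
    return {"time": stamp, "label": label, "emotion": emotion, "text": text}
```

Step S40 would then amount to calling this function in a loop for each captured turn until the conversation ends.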
Beneficial effects of the invention:
The invention preserves the whole course of the audio and video data more completely and in greater detail, closer to the real situation. By converting the audio and video data into text format for storage, it greatly reduces the cost of storage and transmission and makes the record easier to review afterwards, so that conference content can be browsed and located more quickly.
The invention identifies different participants using voiceprint recognition and face recognition technology, and arranges and organizes the text content obtained through speech recognition and expression recognition technology to form a written record in dialogue form.
The invention provides an interface that makes it convenient for the user to supplement and correct the record, so as to ensure the correctness of the written record and improve its readability.
Brief description of the drawings
Fig. 1 is a structural block diagram of the system for converting audio and video data into a written record proposed by the invention;
Fig. 2 is a structural schematic diagram of the system for converting audio and video data into a written record proposed by the invention;
Fig. 3 is a first flow chart of the method for converting audio and video data into a written record proposed by the invention;
Fig. 4 is a second flow chart of the method for converting audio and video data into a written record proposed by the invention.
Detailed description
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings. The embodiments use a recorded video conference as an example, but those skilled in the art will understand that the method can be applied to any video footage and is not limited to the drawings and the following embodiments.
The invention relies on well-known technologies such as speech recognition, voiceprint recognition, face recognition and expression recognition to obtain the necessary data. These recognition technologies are already relatively mature and can achieve "text-independent voiceprint recognition", "face tracking", "facial action recognition", "expression change recognition" and so on, so they are not described or explained in detail here. Based on these data, the system and method proposed by the invention organize audio and video data into a complete and vivid written record in dialogue form.
Embodiment 1:
As shown in Fig. 1 and Fig. 2, the system for converting audio and video data into a written record proposed by the invention comprises a data collection part, a data recognition part, a data organization part and a data supplement and correction part.
The data collection part includes data acquisition devices such as a microphone and a camera.
The microphone is used to capture the audio data of the participant who is currently speaking. When a participant starts to speak, the microphone collects that participant's audio data, and whether the participant is speaking or pausing is judged from the intensity of the collected audio data; if a pause exceeds a certain duration (for example 3 s), the participant's speech is considered to have ended. The start time and end time of the current speaker's audio data are recorded, and the audio data is sent to the data recognition part together with the start time (the end time may also be added) and the device identifier of the microphone. The device identifier of the microphone is transmitted so that different conference parties can be distinguished in a multi-party conference.
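The intensity-based end-of-speech rule can be illustrated with a short sketch. It assumes the audio has already been reduced to one intensity value per fixed-length frame; the function name `segment_speech`, the `threshold` parameter and the frame representation are hypothetical, and only the 3 s pause limit comes from the text.

```python
def segment_speech(levels, frame_seconds, threshold, max_pause=3.0):
    """Split per-frame intensity levels into (start, end) speech segments.

    A segment ends once the silence after the last loud frame exceeds
    max_pause seconds; shorter pauses stay inside the same segment.
    """
    segments = []
    start = None       # start time of the segment currently being built
    last_loud = None   # end time of the most recent loud frame
    for index, level in enumerate(levels):
        frame_start = index * frame_seconds
        frame_end = frame_start + frame_seconds
        if level >= threshold:
            if start is None:
                start = frame_start
            last_loud = frame_end
        elif start is not None and frame_end - last_loud > max_pause:
            segments.append((start, last_loud))
            start = None
    if start is not None:
        # The input ended while a segment was still open.
        segments.append((start, last_loud))
    return segments
```

Each returned pair corresponds to the start time and end time that the microphone would pass on with the audio data.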
The camera is used to capture images of the participant who is currently speaking. When a participant starts to speak, the camera collects images of that participant, and whether the participant is speaking or pausing is judged from the collected images; if a pause exceeds a certain duration (for example 3 s), the participant's speech is considered to have ended. The start time and end time of the current speaker's speech are recorded, and the collected images are sent to the data recognition part together with the start time (the end time may also be added) and the device identifier of the camera. The device identifier of the camera is transmitted so that different conference parties can be distinguished in a multi-party conference.
To ensure that the times recorded by the microphone and the camera are consistent, the microphone and the camera use the same standard time.
The data recognition part comprises a speech and voiceprint recognition module and a face and expression recognition module.
The speech and voiceprint recognition module receives the audio data captured by the microphone, the start time (the end time may also be added) and the device identifier of the microphone. It processes the captured audio data using speech recognition technology to convert the audio data from audio format into text information in text format, and processes the captured audio data using voiceprint recognition technology to identify the participant who is currently speaking.
The face and expression recognition module receives the images captured by the camera, the start time (the end time may also be added) and the device identifier of the camera. It processes the captured images using expression recognition technology to recognize the expression of the participant who is currently speaking and obtain that participant's emotion at the time, and processes the captured images using face recognition technology to identify the participant who is currently speaking.
Voiceprint recognition technology can recognize the voiceprint feature data of the current speaker, and face recognition technology can recognize the facial feature data of the current speaker. The speech and voiceprint recognition module can therefore identify and distinguish different participants from the voiceprint feature data, and the face and expression recognition module can do so from the facial feature data. In addition, the speech and voiceprint recognition module can use the device identifier of the microphone, and the face and expression recognition module can use the device identifier of the camera, to identify and distinguish different conference parties.
During processing, the speech and voiceprint recognition module and the face and expression recognition module assign a unique identity label to each participant who speaks. For example, in a single-party conference, "Participant A", "Participant B", "Participant C" and so on can be assigned as identity labels; in a multi-party conference, "Participant A1", "Participant B2", "Participant C1" and so on can be assigned, where the first character of the label ("A", "B", "C") denotes the conference party and the second character ("1", "2") denotes a particular participant within that party. To make it easier for the data organization part to organize the written record, the two modules should assign the same identity label to the same participant; for example, the identity labels of the participants identified by the two modules can be unified according to the received start time.
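The labelling scheme and the start-time-based unification described above can be sketched as follows. The helper names `make_label` and `unify_labels`, the zero-based indices and the 0.5 s matching tolerance are assumptions for illustration; the source says only that labels from the two modules can be unified according to the received start time.

```python
def make_label(member, party=None):
    """Build an identity label from zero-based indices.

    Single-party conference: "Participant A", "Participant B", ...
    Multi-party conference:  letter = conference party, digit = member.
    """
    if party is None:
        return f"Participant {chr(ord('A') + member)}"
    return f"Participant {chr(ord('A') + party)}{member + 1}"


def unify_labels(audio_hits, video_hits, tolerance=0.5):
    """Map video-side labels onto audio-side labels by matching start times.

    Both arguments are lists of (start_time_seconds, label). Two detections
    are treated as the same turn when their start times differ by at most
    `tolerance` seconds, which presumes both devices share one clock.
    """
    mapping = {}
    for video_start, video_label in video_hits:
        for audio_start, audio_label in audio_hits:
            if abs(audio_start - video_start) <= tolerance:
                mapping[video_label] = audio_label
                break
    return mapping
```

For instance, `make_label(0, party=2)` yields "Participant C1", the first participant of the third conference party.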
For ease of lookup and management, a data table recording the identity labels, voiceprint feature data and facial feature data of the participants can be set up to record information about the participants who speak; the data table is stored in a storage module (not shown in the figures). The voiceprint feature data and facial feature data can be obtained through voiceprint recognition and face recognition technology. For example, with face recognition technology that locates positions using a 21-point model, the features of a face can be described by these key points, and the accuracy can reach 96%.
Based on the text information, start time, participant identity label, participant emotion and other data obtained by the data recognition part, the data organization part organizes these data in a certain format, and generates and saves a written record containing the start time, the participant's identity label, the participant's emotion and the text information.
For ease of review, the content of different participants is recorded on separate lines.
The organizing format can be, for example:
【Date】
【Hour:Minute:Second】【Participant】(【Emotion】):“【Text】”
【Date】
【Hour:Minute:Second】【Participant】(【Emotion】):“【Text】”
……
In the format above, the symbol "【】" denotes content obtained from the parts described earlier;
The symbol "()" indicates that the content is optional and is added only when data is available;
The symbol ":" indicates that the content of the speech follows.
For example:
November 15, 2016
09:24:12 Participant A: "Let's discuss problem a first"
November 15, 2016
09:24:16 Participant B (angry): "Problem a has already been discussed; we should discuss problem b"
……
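A single entry in the organizing format shown above could be produced by a small helper such as the one below. The function name `format_entry` is hypothetical, and the ISO date layout is an assumption chosen to keep the sketch independent of locale; the emotion is appended in parentheses only when it is available, as the format description requires.

```python
from datetime import datetime


def format_entry(when, participant, text, emotion=None):
    """Render one dialogue entry: a date line, then time, speaker and speech."""
    mood = f" ({emotion})" if emotion else ""  # optional part of the format
    return f'{when:%Y-%m-%d}\n{when:%H:%M:%S} {participant}{mood}: "{text}"'
```

Calling it with `datetime(2016, 11, 15, 9, 24, 16)`, "Participant B", a text and the emotion "angry" gives a two-line entry whose second line starts with `09:24:16 Participant B (angry):`.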
To simplify the record, entries with the same date can be merged; for example, the record above can be merged into:
November 15, 2016
09:24:12 Participant A: "Let's discuss problem a first"
09:24:16 Participant B (angry): "Problem a has already been discussed; we should discuss problem b"
……
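Merging entries that share a date, as in the example above, might look like the following sketch. It assumes the entries are already in chronological order and are given as (date, line) pairs; the function name `merge_by_date` is hypothetical.

```python
from itertools import groupby


def merge_by_date(entries):
    """Emit each date once, followed by all consecutive lines for that date.

    `entries` is a chronologically ordered list of (date_text, line) pairs.
    """
    output = []
    for date_text, group in groupby(entries, key=lambda entry: entry[0]):
        output.append(date_text)
        output.extend(line for _, line in group)
    return "\n".join(output)
```

Because `groupby` only groups adjacent items, the chronological ordering is what guarantees that each date appears exactly once.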
The data supplement and correction part is used to supplement and correct the written record saved by the data organization part, so as to improve the readability of the written record and ensure its correctness. For example, it provides an interface and prompts that make it convenient for the user to supplement and correct the saved written record, and it records the name of the person making the supplement or correction, the time of the supplement or correction, and its content, for later reference.
The manner and content of supplementation are, for example: providing input boxes in which the user enters information such as the conference topic, a summary of the issues and the conference conclusions, so that others can quickly understand the conference content;
The content of correction is, for example, correcting textual errors in the written record, or replacing certain information. For example: 3 participants were recognized as speaking over the whole conference; a prompt box then pops up asking the user whether the identity labels "Participant A, B, C" should be replaced with the participants' real names.
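The label-replacement correction, together with the logging of who corrected what and when, can be sketched as below. The function name `rename_participants`, the log structure and the parameter names are assumptions; note that this simple version would also replace a label that happens to appear inside the quoted speech itself.

```python
import re
from datetime import datetime


def rename_participants(record, names, editor, log, now=None):
    """Replace identity labels with real names and log the correction.

    `names` maps labels to real names; `log` is a list that receives one
    entry with the editor's name, the time and the content of the change.
    """
    if not names:
        return record
    # Longest labels first, so "Participant A1" wins over "Participant A".
    labels = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(label) for label in labels))
    updated = pattern.sub(lambda match: names[match.group(0)], record)
    log.append({"editor": editor, "time": now or datetime.now(),
                "change": dict(names)})
    return updated
```

The log list plays the role of the correction history that the text says should be kept for later reference.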
Embodiment 2:
The invention also proposes a method for converting audio and video data into a written record. The flow chart of the method is shown in Fig. 3, and the method comprises the following steps:
Step S21, data collection:
When a participant starts to speak, the microphone collects the audio data of the participant who is currently speaking, and whether the participant is speaking or pausing is judged from the intensity of the collected audio data; if a pause exceeds a certain duration (for example 3 s), the participant's speech is considered to have ended. The start time and end time of the current speaker's audio data are recorded, and the audio data is sent to the data recognition step together with the start time (the end time may also be added) and the device identifier of the microphone. The device identifier of the microphone is transmitted so that different conference parties can be distinguished in a multi-party conference;
The camera is used to capture images of the participant who is currently speaking. When a participant starts to speak, the camera collects images of that participant, and whether the participant is speaking or pausing is judged from the collected images; if a pause exceeds a certain duration (for example 3 s), the participant's speech is considered to have ended. The start time and end time of the current speaker's speech are recorded, and the collected images are sent to the data recognition step together with the start time (the end time may also be added) and the device identifier of the camera. The device identifier of the camera is transmitted so that different conference parties can be distinguished in a multi-party conference.
To ensure that the times recorded by the microphone and the camera are consistent, the microphone and the camera use the same standard time.
Step S22, data recognition:
Receive the audio data captured by the microphone, the start time (the end time may also be added) and the device identifier of the microphone; process the captured audio data using speech recognition technology to convert the audio data from audio format into text information in text format, and process the captured audio data using voiceprint recognition technology to identify the participant who is currently speaking.
Receive the images captured by the camera, the start time (the end time may also be added) and the device identifier of the camera; process the captured images using expression recognition technology to recognize the expression of the participant who is currently speaking and obtain that participant's emotion at the time, and process the captured images using face recognition technology to identify the participant who is currently speaking.
Voiceprint recognition technology can recognize the voiceprint feature data of the current speaker, and face recognition technology can recognize the facial feature data of the current speaker. The speech and voiceprint recognition module can therefore identify and distinguish different participants from the voiceprint feature data, and the face and expression recognition module can do so from the facial feature data. In addition, the speech and voiceprint recognition module can use the device identifier of the microphone, and the face and expression recognition module can use the device identifier of the camera, to identify and distinguish different conference parties.
During processing, the speech and voiceprint recognition module and the face and expression recognition module assign a unique identity label to each participant who speaks. For example, in a single-party conference, "Participant A", "Participant B", "Participant C" and so on can be assigned as identity labels; in a multi-party conference, "Participant A1", "Participant B2", "Participant C1" and so on can be assigned, where the first character of the label ("A", "B", "C") denotes the conference party and the second character ("1", "2") denotes a particular participant within that party. To make it easier to organize the text information, the two modules should assign the same identity label to the same participant; for example, the identity labels of the participants identified by the two modules can be unified according to the received start time.
For ease of lookup and management, a data table recording the identity labels, voiceprint feature data and facial feature data of the participants can be set up to record information about the participants who speak, ensuring that the identity label of each participant is unique and that the voiceprint feature data and facial feature data correspond consistently to the identity label of the respective participant. The voiceprint feature data and facial feature data can be obtained through voiceprint recognition and face recognition technology. For example, with face recognition technology that locates positions using a 21-point model, the features of a face can be described by these key points, and the accuracy can reach 96%.
Step S23, data organization:
Based on the text information, start time, participant identity label, participant emotion and other data obtained in the data recognition step, these data are organized in a certain format to generate a written record containing the start time, the participant's identity label, the participant's emotion and the text information, which is saved in dialogue form.
For ease of review, the content of different participants is recorded on separate lines.
The organizing format can be, for example:
【Date】
【Hour:Minute:Second】【Participant】(【Emotion】):“【Text】”
【Hour:Minute:Second】【Participant】(【Emotion】):“【Text】”
……
In the format above, the symbol "【】" denotes content obtained from the preceding steps;
The symbol "()" indicates that the content is optional and is added only when data is available;
The symbol ":" indicates that the content of the speech follows.
For example:
On November 15th, 2016
09:24:12 participant A:" our first call for Votes a "
On November 15th, 2016
09:24:16 participant B (anger):" problem a was discussed, it should call for Votes b "
……
To simplify the record and save storage space, entries with the same date can be merged. For example, the record above can be merged into:
November 15, 2016
09:24:12 Participant A: “Let us discuss problem a first”
09:24:16 Participant B (anger): “Problem a has already been discussed; we should discuss problem b”
……
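The organization format and the merging of entries that share a date can be sketched as below. This is a minimal sketch under stated assumptions: the `Utterance` type, the function names and the ISO date heading are hypothetical choices made for illustration, not part of the described method.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Utterance:
    start: datetime
    participant: str
    text: str
    mood: Optional[str] = None  # optional: written only when data is available


def format_line(u: Utterance) -> str:
    # The mood is wrapped in parentheses and omitted entirely when absent.
    mood = f" ({u.mood})" if u.mood else ""
    return f'{u.start:%H:%M:%S} {u.participant}{mood}: "{u.text}"'


def format_record(utterances: List[Utterance]) -> str:
    # A date heading is emitted once per date, which merges same-day entries.
    lines: List[str] = []
    last_date = None
    for u in sorted(utterances, key=lambda x: x.start):
        if u.start.date() != last_date:
            last_date = u.start.date()
            lines.append(f"{u.start:%Y-%m-%d}")
        lines.append(format_line(u))
    return "\n".join(lines)
```

Each speech goes on its own line, so the output keeps the dialogue form while writing every date only once.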
Step S24, data supplementation and correction:
The written record saved in the data organization step is supplemented and corrected to improve its readability and ensure its correctness. For example, an interface and prompts are provided so that the user can conveniently supplement and correct the saved written record, and the name of the person making the supplement or correction, the time of the supplement or correction, and its content are recorded for later reference.
The manner and content of supplementation are, for example: input boxes are provided in which the user enters the meeting topic, a summary of the issues, the meeting conclusions and similar information, so that others can quickly understand the content of the meeting;
The content of a correction is, for example, text errors in the written record, or the replacement of certain information. For example: three participants were identified as speaking during the whole meeting, so a prompt box pops up asking the user whether the identity labels "Participant A, B, C" should be replaced with the participants' real names.
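A minimal sketch of how supplements and corrections might be logged is given below. All names (`Amendment`, `WrittenRecord`, `supplement`, `rename`) are hypothetical, and the plain string replacement is a simplification chosen for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Amendment:
    editor: str        # who made the supplement or correction
    time: datetime     # when it was made
    description: str   # what was changed


@dataclass
class WrittenRecord:
    lines: List[str]
    topic: str = ""
    conclusion: str = ""
    history: List[Amendment] = field(default_factory=list)

    def supplement(self, editor: str, topic: str, conclusion: str) -> None:
        # Store user-entered summary information and log who added it.
        self.topic, self.conclusion = topic, conclusion
        self.history.append(
            Amendment(editor, datetime.now(), "added topic and conclusion"))

    def rename(self, editor: str, real_names: Dict[str, str]) -> None:
        # Caveat: plain replacement also touches a label that appears inside
        # the spoken text or as a prefix of a longer label (e.g. "A" in "A1").
        for label, name in real_names.items():
            self.lines = [line.replace(label, name) for line in self.lines]
        self.history.append(
            Amendment(editor, datetime.now(), f"renamed {sorted(real_names)}"))
```

Keeping the history as an append-only list means that earlier amendments stay available for consultation after later ones are made.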
Embodiment 3:
The invention also proposes a method for converting audio and video data into a written record. The flow chart of the method is shown in Fig. 4, and the method comprises the following steps:
Step S30, preparation:
Start the microphone and camera, create a participant list, and create a file address for saving the text. The participant list includes each participant's unique identity label, as well as the participant's voiceprint feature data and facial feature data, which are collected subsequently;
Each participant is assigned a unique identity label. For example: in a single-party meeting, "Participant A", "Participant B", "Participant C" and so on can be assigned to participants as identity labels; in a multi-party meeting, "Participant A1", "Participant B2", "Participant C1" and so on can be assigned as identity labels, where the first character "A", "B" or "C" in the label denotes the conference party and the second character "1" or "2" denotes a particular participant within that party.
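The labelling scheme can be sketched as follows. The function and class names are hypothetical, and limiting the single-party case to 26 letters is an assumption of this sketch rather than something the description states.

```python
from itertools import count
from string import ascii_uppercase
from typing import Dict, Iterator


def single_party_labels() -> Iterator[str]:
    """Yield "Participant A", "Participant B", ... (26 labels at most)."""
    for letter in ascii_uppercase:
        yield f"Participant {letter}"


class MultiPartyLabeler:
    """Build labels such as "Participant A1": party letter plus member number."""

    def __init__(self) -> None:
        self._counters: Dict[str, Iterator[int]] = {}

    def next_label(self, party: str) -> str:
        # Each party keeps its own counter, starting at 1.
        counter = self._counters.setdefault(party, count(1))
        return f"Participant {party}{next(counter)}"
```

Because every party has a separate counter, two labels can never coincide as long as the party letters themselves are distinct.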
Step S31: When a participant starts speaking, the microphone captures the voice input of the participant currently speaking to obtain that participant's audio data, and the start time of the audio data is recorded; at the same time, the camera captures images of the participant currently speaking to obtain that participant's video data, and the start time of the video data is recorded;
Step S32: The audio data of the participant currently speaking, obtained by the microphone, is analyzed and processed using voiceprint recognition technology, and voiceprint feature recognition is performed; the video data of the participant currently speaking, obtained by the camera, is analyzed and processed using face recognition technology, and facial feature recognition is performed;
Step S33: Determine whether voiceprint feature data has been successfully recognized and whether facial feature data has been successfully recognized. If voiceprint feature data and/or facial feature data has been successfully recognized, proceed to step S34; if neither voiceprint feature data nor facial feature data has been recognized, proceed to step S35;
Step S34: Determine whether the participant list already contains a participant matching the voiceprint feature data and/or the facial feature data. If so, proceed to step S35 and at the same time complete that participant's information (that is, if the participant list holds only voiceprint feature data and no facial feature data, add the facial feature data; if it holds only facial feature data and no voiceprint feature data, add the voiceprint feature data). If not, add a new entry to the participant list, save the new participant's identity label, voiceprint feature data and facial feature data, and proceed to step S35;
Step S35: Analyze and process the audio data using speech recognition technology to convert the audio data into text;
Step S36: Determine whether the audio data has been successfully converted into text. If so, proceed to step S37; if not, return to step S35 and continue the conversion;
Step S37: Analyze and process the video data using expression recognition technology to obtain the mood of the participant currently speaking;
Step S38: Obtain the current date and time;
Step S39: Organize the date, the time, the identity label of the participant matched by the voiceprint and/or facial feature data, the text obtained by speech recognition and the mood obtained by expression recognition into a written record in dialogue form, and save it at the created file address;
The format may be, for example:
【Date】
【Hour Minute Second】【Participant】(【Mood】):“【Text】”
【Hour Minute Second】【Participant】(【Mood】):“【Text】”
……
In the above format, the symbol "【】" denotes content obtained from the preceding steps;
the symbol "()" indicates that the content is optional and is added only when data is available;
the symbol ":" indicates that what follows is the content of the speech.
For example:
November 15, 2016
09:24:12 Participant A: “Let us discuss problem a first”
November 15, 2016
09:24:16 Participant B (anger): “Problem a has already been discussed; we should discuss problem b”
……
To simplify the record and save storage space, entries with the same date can be merged. For example, the record above can be merged into:
November 15, 2016
09:24:12 Participant A: “Let us discuss problem a first”
09:24:16 Participant B (anger): “Problem a has already been discussed; we should discuss problem b”
……
Step S40: Return to step S31 and continue, saving the content of the different participants' speeches, until the whole meeting ends.
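The decision logic of steps S33 to S39 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: feature data is represented by plain strings compared for equality (a real system would compare feature vectors by similarity), the recognizer is passed in as a callable, the retry in step S36 is capped at `max_attempts` to avoid an endless loop, and the fallback name "Unknown participant" is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional

Features = Optional[str]  # stand-in for real voiceprint or facial feature data


@dataclass
class Entry:
    label: str
    voiceprint: Features = None
    face: Features = None


def identify(entries: List[Entry], voiceprint: Features, face: Features) -> Optional[str]:
    """Steps S33 and S34: match the speaker, or enroll a new list entry."""
    if voiceprint is None and face is None:
        return None  # S33: nothing recognized, go straight to S35
    for e in entries:
        if (voiceprint is not None and e.voiceprint == voiceprint) or \
           (face is not None and e.face == face):
            # S34: complete whichever feature the entry was missing.
            e.voiceprint = e.voiceprint or voiceprint
            e.face = e.face or face
            return e.label
    # S34: no match, so add a new entry (labels A, B, C, ... for this sketch).
    label = f"Participant {chr(ord('A') + len(entries))}"
    entries.append(Entry(label, voiceprint, face))
    return label


def transcribe(audio: bytes, recognize: Callable[[bytes], Optional[str]],
               max_attempts: int = 3) -> Optional[str]:
    """Steps S35 and S36: retry speech recognition until it yields text."""
    for _ in range(max_attempts):
        text = recognize(audio)
        if text:
            return text
    return None  # assumption: give up after max_attempts


def make_line(now: datetime, label: Optional[str], text: str,
              mood: Optional[str]) -> str:
    """Steps S38 and S39: build one dialogue line from the collected parts."""
    who = label or "Unknown participant"
    mood_part = f" ({mood})" if mood else ""
    return f'{now:%H:%M:%S} {who}{mood_part}: "{text}"'
```

Step S40 then amounts to calling these three functions in a loop for every captured speech until the meeting ends.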
Embodiments of the present invention have been described above. However, the present invention is not limited to the above embodiments. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A system for converting audio and video data into a written record, characterized by comprising a data collection part, a data identification part and a data organization part;
the data collection part comprises an audio collection module and a video collection module;
the audio collection module is used to capture audio data of the current speaker and to record the start time of the speech;
the video collection module is used to capture images of the current speaker and to record the start time of the speech;
the data identification part comprises a speech and voiceprint recognition module and a face and expression recognition module;
the speech and voiceprint recognition module processes the audio data captured by the audio collection module to convert it into text information in text format, and processes the audio data captured by the audio collection module to identify the current speaker and assign an identity label to each speaker;
the face and expression recognition module processes the images captured by the video collection module using expression recognition technology to recognize the expression of the current speaker and obtain the speaker's mood, and processes the images captured by the video collection module to identify the current speaker and assign an identity label to each speaker;
the data organization part generates a written record from the text information, the start time, the identity label of the current speaker and the mood of the current speaker.
2. The system according to claim 1, characterized in that the written record is a written record in dialogue form.
3. The system according to claim 1 or 2, characterized in that the system further comprises a data supplementation and correction part, which is used to supplement and correct the written record.
4. The system according to claim 1 or 2, characterized in that the audio collection module is a microphone, and/or the video collection module is a camera.
5. The system according to claim 1 or 2, characterized in that the system further comprises a storage module, which stores a data table whose records contain the speaker's identity label, voiceprint feature data and facial feature data.
6. A method for converting audio and video data into a written record, characterized by comprising the following steps:
Step S21, data collection: capture audio data of the current speaker and images of the current speaker, and record the start time of the speech;
Step S22, data identification: process the captured audio data to convert it into text information in text format; process the captured images to recognize the expression of the current speaker and obtain the speaker's mood; process the captured audio data and/or the captured images to identify the current speaker and assign an identity label to each speaker;
Step S23, data organization: the data organization part generates a written record from the text information, the start time, the identity label of the current speaker and the mood of the current speaker.
7. The method according to claim 6, characterized in that the written record is a written record in dialogue form.
8. The method according to claim 6 or 7, characterized in that the method further comprises step S24, data supplementation and correction: the written record is supplemented and corrected.
9. The method according to claim 6 or 7, characterized in that the identity label is stored in a storage module in association with the speaker's voiceprint feature data and/or facial feature data; before an identity label is assigned to a speaker, the storage module is first searched for a stored identity label matching that speaker, and if none is found, an identity label is assigned to the speaker.
10. A method for converting audio and video data into a written record, characterized by comprising the following steps:
Step S30, preparation: start the microphone and camera, create a speaker list, and create a file address for saving the text, wherein each item of the speaker list includes the speaker's unique identity label, the speaker's voiceprint feature data and the speaker's facial feature data;
Step S31, data capture: when a speaker starts speaking, the microphone captures the voice input of the speaker currently speaking to obtain that speaker's audio data, and the start time of the audio data is recorded; at the same time, the camera captures images of the speaker currently speaking to obtain that speaker's video data, and the start time of the video data is recorded;
Step S32: analyze and process the audio data of the speaker currently speaking, obtained by the microphone, using voiceprint recognition technology, and perform voiceprint feature recognition; analyze and process the video data of the speaker currently speaking, obtained by the camera, using face recognition technology, and perform facial feature recognition;
Step S33: determine whether voiceprint feature data has been successfully recognized and whether facial feature data has been successfully recognized; if voiceprint feature data and/or facial feature data has been successfully recognized, proceed to step S34; if neither voiceprint feature data nor facial feature data has been recognized, proceed to step S35;
Step S34: determine whether the speaker list already contains a speaker matching the voiceprint feature data and/or the facial feature data; if so, proceed to step S35 and at the same time complete that speaker's information; if not, add a new entry to the speaker list, save the new speaker's identity label, voiceprint feature data and facial feature data, and proceed to step S35;
Step S35: analyze and process the audio data using speech recognition technology to convert the audio data into text;
Step S36: determine whether the audio data has been successfully converted into text; if so, proceed to step S37; if not, return to step S35 and continue the conversion;
Step S37: analyze and process the video data using expression recognition technology to obtain the mood of the speaker currently speaking;
Step S38: obtain the current date and time;
Step S39: organize the date, the time, the identity label of the speaker matched by the voiceprint and/or facial feature data, the text obtained by speech recognition and the mood obtained by expression recognition into a written record in dialogue form, and save it at the created file address;
Step S40: return to step S31 and continue, saving the content of the different speakers' speeches, until the conversation ends.
CN201611170040.6A 2016-12-16 2016-12-16 A kind of system and method that audio, video data is converted to writing record Active CN106782545B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201611170040.6A CN106782545B (en) 2016-12-16 2016-12-16 A kind of system and method that audio, video data is converted to writing record
PCT/CN2017/077035 WO2018107605A1 (en) 2016-12-16 2017-03-17 System and method for converting audio/video data into written records

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611170040.6A CN106782545B (en) 2016-12-16 2016-12-16 A kind of system and method that audio, video data is converted to writing record

Publications (2)

Publication Number Publication Date
CN106782545A true CN106782545A (en) 2017-05-31
CN106782545B CN106782545B (en) 2019-07-16

Family

ID=58892306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611170040.6A Active CN106782545B (en) 2016-12-16 2016-12-16 A kind of system and method that audio, video data is converted to writing record

Country Status (2)

Country Link
CN (1) CN106782545B (en)
WO (1) WO2018107605A1 (en)





CN114745213A (en) * 2022-04-11 2022-07-12 深信服科技股份有限公司 Conference record generation method and device, electronic equipment and storage medium
CN114745213B (en) * 2022-04-11 2024-05-28 深信服科技股份有限公司 Conference record generation method and device, electronic equipment and storage medium
CN117577115A (en) * 2024-01-15 2024-02-20 杭州讯意迪科技有限公司 Intelligent paperless conference system
CN117577115B (en) * 2024-01-15 2024-03-29 杭州讯意迪科技有限公司 Intelligent paperless conference system

Also Published As

Publication number Publication date
WO2018107605A1 (en) 2018-06-21
CN106782545B (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN106782545B (en) System and method for converting audio and video data into written records
CN106657865B (en) Conference summary generation method and device and video conference system
US11699456B2 (en) Automated transcript generation from multi-channel audio
CN108346034B (en) Intelligent conference management method and system
US10034028B2 (en) Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs
US6687671B2 (en) Method and apparatus for automatic collection and summarization of meeting information
CN107527623B (en) Screen transmission method and device, electronic equipment and computer readable storage medium
US8315866B2 (en) Generating representations of group interactions
CN107911646A (en) Method and device for sharing and generating meeting minutes in a conference
CN109003608A (en) Court trial control method, system, computer equipment and storage medium
CN107333090B (en) Video conference data processing method and platform
CN102207844A (en) Information processing device, information processing method and program
CN111415128B (en) Method, system, device, equipment and medium for controlling conference
CN106157957A (en) Speech recognition method, device and user equipment
KR102462219B1 (en) Method of Automatically Generating Meeting Minutes Using Speaker Diarization Technology
US8615153B2 (en) Multi-media data editing system, method and electronic device using same
US20220013127A1 (en) Electronic Speech to Text Court Reporting System For Generating Quick and Accurate Transcripts
CN113365109A (en) Method and device for generating video subtitles, electronic equipment and storage medium
KR102346668B1 (en) Apparatus for interpreting conference
CN103594086B (en) Speech processing system, device and method
CN113611308B (en) Voice recognition method, device, system, server and storage medium
CN211788155U (en) Intelligent conference recording system
KR101783872B1 (en) Video Search System and Method thereof
JP2019176375A (en) Moving image output apparatus, moving image output method, and moving image output program
KR102291113B1 (en) Apparatus and method for producing conference record

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant