CN114762039A - Conference data processing method and related device - Google Patents
Conference data processing method and related device
- Publication number
- CN114762039A (application CN201980102782.0A)
- Authority
- CN
- China
- Prior art keywords
- audio
- conference
- voiceprint
- additional information
- clip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
Concept | Category | Sections | Count |
---|---|---|---|
processing method | Methods | title, claims, abstract, description | 16 |
information processing | Effects | claims, abstract, description | 329 |
method | Methods | claims, abstract, description | 159 |
segmentation | Effects | claims, description | 40 |
separation method | Methods | claims, description | 39 |
communication | Methods | claims, description | 25 |
computer program | Methods | claims, description | 19 |
facial effect | Effects | claims, description | 4 |
processing | Methods | description | 76 |
diagram | Methods | description | 16 |
confirmation | Methods | description | 7 |
beneficial effect | Effects | description | 6 |
localization | Effects | description | 5 |
technology | Methods | description | 4 |
training | Methods | description | 3 |
biological transmission | Effects | description | 2 |
detection method | Methods | description | 2 |
extracts | Substances | description | 2 |
fragment | Substances | description | 2 |
accumulation | Methods | description | 1 |
function | Effects | description | 1 |
imaging method | Methods | description | 1 |
optical effect | Effects | description | 1 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/41—Electronic components, circuits, software, systems or apparatus used in telephone systems using speaker recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/55—Aspects of automatic or semi-automatic exchanges related to network data storage and management
- H04M2203/552—Call annotations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/60—Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
- H04M2203/6054—Biometric subscriber identification
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Telephonic Communication Services (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
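Embodiments of the present invention provide a conference data processing method and a related device. The method is applied to a conference system and includes the following: during a conference, a conference terminal collects audio clips of a first conference site according to sound source direction; the terminal generates first additional information corresponding to each of the collected audio clips; and the terminal sends, to a conference information processing device, the conference audio recorded during the conference together with the first additional information corresponding to the audio clips. The conference information processing device segments the conference audio into multiple audio clips and attaches corresponding second additional information to each, where the second additional information for each audio clip includes information used to determine the identity of the clip's speaker and identification information of the corresponding audio clip. The conference information processing device uses the first additional information and the second additional information to generate a correspondence between participants and utterances. With the embodiments of this application, the correspondence between utterances and speakers during a conference can be determined more accurately.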
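The final step in the abstract, combining the terminal's per-clip records with the processing device's per-clip records to obtain a participant-to-utterance mapping, might be sketched as below. This is only an illustration under stated assumptions, not the claimed method: the class names `FirstInfo` and `SecondInfo`, their fields, the use of a shared `clip_id` as the join key, and the rule that an unidentified clip inherits the speaker already identified at the same sound-source direction are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstInfo:
    """Per-clip record from the conference terminal (hypothetical fields)."""
    clip_id: str        # assumed identifier shared with the processing device
    direction_deg: int  # sound-source direction at which the clip was collected


@dataclass(frozen=True)
class SecondInfo:
    """Per-clip record from the processing device (hypothetical fields)."""
    clip_id: str
    speaker: Optional[str]  # identity hint, e.g. a voiceprint match; None if unknown


def map_speakers(first, second):
    """Join both record sets on clip_id and return {speaker: [clip_id, ...]}."""
    direction_of = {f.clip_id: f.direction_deg for f in first}

    # Pass 1: remember which speaker was identified at each direction.
    speaker_at = {}
    for s in second:
        direction = direction_of.get(s.clip_id)
        if s.speaker is not None and direction is not None:
            speaker_at.setdefault(direction, s.speaker)

    # Pass 2: assign each clip, falling back to the speaker seen at its direction.
    result = defaultdict(list)
    for s in second:
        fallback = speaker_at.get(direction_of.get(s.clip_id), "unknown")
        result[s.speaker or fallback].append(s.clip_id)
    return dict(result)


first = [FirstInfo("c1", 30), FirstInfo("c2", 120), FirstInfo("c3", 30)]
second = [SecondInfo("c1", "Alice"), SecondInfo("c2", "Bob"), SecondInfo("c3", None)]
mapping = map_speakers(first, second)
```

In this toy input, clip `c3` has no identity of its own but was collected at the same direction as `c1`, so it is attributed to the same speaker. A real system would need to handle participants who move and overlapping speech, which this sketch ignores.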
Description
PCT national-phase application; the description has been published.
Claims (45)
- PCT national-phase application; the claims have been published.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/130978 WO2021134720A1 (zh) | 2019-12-31 | 2019-12-31 | Conference data processing method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114762039A (zh) | 2022-07-15 |
Family
ID=76686340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201980102782.0A Pending CN114762039A (zh) | Conference data processing method and related device | 2019-12-31 | 2019-12-31 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220335949A1 (zh) |
EP (1) | EP4068282A4 (zh) |
CN (1) | CN114762039A (zh) |
WO (1) | WO2021134720A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
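JP2022062874A (ja) * | 2020-10-09 | 2022-04-21 | ヤマハ株式会社 | Speaker prediction method, speaker prediction device, and communication system |
CN115396627A (zh) * | 2022-08-24 | 2022-11-25 | 易讯科技股份有限公司 | Positioning management method and system for screen-recorded video conferences |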
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7117157B1 (en) * | 1999-03-26 | 2006-10-03 | Canon Kabushiki Kaisha | Processing apparatus for determining which person in a group is speaking |
CN102572372B (zh) * | 2011-12-28 | 2018-10-16 | 中兴通讯股份有限公司 | Method and apparatus for extracting meeting minutes |
CN102968991B (zh) * | 2012-11-29 | 2015-01-21 | 华为技术有限公司 | Classification method, device, and system for voice conference minutes |
US20190303879A1 (en) * | 2018-04-02 | 2019-10-03 | Ca, Inc. | Meeting recording software |
US10867610B2 (en) * | 2018-05-04 | 2020-12-15 | Microsoft Technology Licensing, Llc | Computerized intelligent assistant for conferences |
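CN108922538B (zh) * | 2018-05-29 | 2023-04-07 | 平安科技(深圳)有限公司 | Conference information recording method and apparatus, computer device, and storage medium |
CN109388701A (zh) * | 2018-08-17 | 2019-02-26 | 深圳壹账通智能科技有限公司 | Meeting record generation method, apparatus, device, and computer storage medium |
CN110232925A (zh) * | 2019-06-28 | 2019-09-13 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating meeting records, and conference terminal |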
2019
- 2019-12-31: WO application PCT/CN2019/130978, published as WO2021134720A1 (zh); status unknown
- 2019-12-31: CN application CN201980102782.0A, published as CN114762039A (zh); active, pending
- 2019-12-31: EP application EP19958182.8A, published as EP4068282A4 (en); active, pending
2022
- 2022-06-29: US application US17/852,800, published as US20220335949A1 (en); active, pending
Also Published As
Publication number | Publication date |
---|---|
EP4068282A1 (en) | 2022-10-05 |
EP4068282A4 (en) | 2022-11-30 |
US20220335949A1 (en) | 2022-10-20 |
WO2021134720A1 (zh) | 2021-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3963576B1 (en) | | Speaker attributed transcript generation |
JP2022532313A (ja) | | Customized output to optimize for user preferences in a distributed system |
US9064160B2 (en) | | Meeting room participant recogniser |
US10923139B2 (en) | | Systems and methods for processing meeting information obtained from multiple sources |
US20190215464A1 (en) | | Systems and methods for decomposing a video stream into face streams |
US11138980B2 (en) | | Processing overlapping speech from distributed devices |
KR101636716B1 (ko) | | Video conferencing apparatus and method for distinguishing speakers |
JP6999734B2 (ja) | | Speaker diarization method and apparatus based on audiovisual data |
US10812921B1 (en) | | Audio stream processing for distributed device meeting |
WO2019184650A1 (zh) | | Subtitle generation method and terminal |
JP2007528031A (ja) | | Techniques for separating and evaluating audio and video source data |
US20220335949A1 (en) | | Conference Data Processing Method and Related Device |
CN111883168A (zh) | | Speech processing method and apparatus |
JP2005055668A (ja) | | Speech processing device |
CN110196914B (zh) | | Method and apparatus for entering face information into a database |
US20210174791A1 (en) | | Systems and methods for processing meeting information obtained from multiple sources |
US9165182B2 (en) | | Method and apparatus for using face detection information to improve speaker segmentation |
CN114333853A (zh) | | Audio data processing method, device, and system |
JP5030868B2 (ja) | | Conference audio recording system |
Hung et al. | | Towards audio-visual on-line diarization of participants in group meetings |
CN113611308A (zh) | | Speech recognition method, apparatus, system, server, and storage medium |
CN114764690A (zh) | | Method, apparatus, and system for intelligently producing meeting minutes |
CN113542604A (zh) | | Video focusing method and apparatus |
RU2821283C2 (ru) | | Customized output optimized for user preferences in a distributed system |
CN117392995A (zh) | | Multimodal speaker diarization method, apparatus, device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||