CN116033206A - Intelligent marking method, system and medium for converting voice into text - Google Patents

Info

Publication number
CN116033206A
Authority
CN
China
Prior art keywords
marking
text
information
user
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211496746.7A
Other languages
Chinese (zh)
Inventor
詹梓彦
刘兆武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Craftsman Network Technology Co ltd
Original Assignee
Shenzhen Craftsman Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Craftsman Network Technology Co ltd filed Critical Shenzhen Craftsman Network Technology Co ltd
Priority to CN202211496746.7A priority Critical patent/CN116033206A/en
Publication of CN116033206A publication Critical patent/CN116033206A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Document Processing Apparatus (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an intelligent marking method, system and medium for converting voice into text. The method comprises the following steps: when a media stream playing picture is captured, converting the audio and video file into text information, and presenting the text information to the user on a text information display interface in synchronization with the audio and video file of the media stream playing picture; marking the text information by adopting a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Compared with the prior art, the method combines the media stream playing picture (such as a live broadcast or an audio and video file) with the voice-to-text part and the marking part, so that marks can be made intelligently according to the playing progress or the text. This makes it convenient for the user or other collaborators to check or audit the content and improves the efficiency of auditing media stream content for compliance.

Description

Intelligent marking method, system and medium for converting voice into text
Technical Field
The invention relates to the technical field of voice processing, in particular to an intelligent marking method, system and medium for converting voice into characters.
Background
Currently, most voice-to-text products for live broadcast and playback do not provide intelligent marking, so the user has to spend time reviewing the content again on every pass, which is inefficient.
Disclosure of Invention
The invention mainly aims to provide an intelligent marking method, system and medium for converting voice into text, which aim to intelligently mark according to playing progress or text, facilitate checking or auditing by a user and improve auditing efficiency of compliance content of a media stream.
In order to achieve the above purpose, the invention provides an intelligent marking method for converting voice into characters, which comprises the following steps:
when capturing a media stream playing picture, converting an audio and video file into text information, and presenting the text information to a user on a text information display interface in synchronization with the audio and video file of the media stream playing picture;
marking the text information by adopting a preset marking strategy to obtain marking information;
and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface to the user.
The further technical scheme of the invention is that the step of marking the text information by adopting a preset marking strategy to obtain marking information comprises the following steps:
searching a preset word stock;
analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock;
if the problematic vocabulary is found, the problematic vocabulary is intelligently marked, and marking information is obtained.
The further technical scheme of the invention is that the step of marking the text information by adopting a preset marking strategy to obtain marking information comprises the following steps:
capturing the operation behaviors of the user, and marking the text information according to the operation behaviors of the user to obtain marking information.
The further technical scheme of the invention is that the operation behavior of the user is to click the playing progress of the audio and video file on the media stream playing picture, the operation behavior of the user is captured, the text information is marked according to the operation behavior of the user, and the step of obtaining the marking information comprises the following steps:
when capturing the operation behavior of clicking the playing progress of the audio and video file on the media stream playing picture by the user, marking the corresponding text information according to the playing progress of the media stream playing picture to obtain marking information.
The further technical scheme of the invention is that the operation behavior of the user is text content framed on a display interface of the text information, the operation behavior of the user is captured, the text information is marked according to the operation behavior of the user, and the step of obtaining marking information comprises the following steps:
when capturing the operation behavior of the user for selecting text content in the text information display interface, extracting the corresponding text and playing time from the text information for marking, and obtaining marking information.
According to a further technical scheme of the invention, the step of synchronously displaying the marking information, the text information and the audio/video file back to the user on a marking information display interface according to the marking point selected by the user comprises the following steps:
and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface by adopting selection and range technology.
In order to achieve the above object, the present invention further provides an intelligent marking system for converting voice into text, the system comprising a memory, a processor, and an intelligent marking program for converting voice into text stored on the memory, wherein the intelligent marking program for converting voice into text is executed by the processor to perform the following steps:
when capturing a media stream playing picture, converting an audio and video file into text information, and presenting the text information to a user on a text information display interface in synchronization with the audio and video file of the media stream playing picture;
marking the text information by adopting a preset marking strategy to obtain marking information;
and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface to the user.
The invention further adopts the technical scheme that the intelligent marking program for converting voice into characters also executes the following steps when being run by the processor:
searching a preset word stock;
analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock;
if the problematic vocabulary is found, the problematic vocabulary is intelligently marked, and marking information is obtained.
The invention further adopts the technical scheme that the intelligent marking program for converting voice into characters also executes the following steps when being run by the processor:
capturing the operation behaviors of the user, and marking the text information according to the operation behaviors of the user to obtain marking information.
To achieve the above object, the present invention also proposes a computer-readable storage medium storing a voice-to-text smart marking program which, when executed by a processor, performs the steps of the method as described above.
The intelligent marking method, system and medium for converting voice into text have the following beneficial effects: with the above technical scheme, when a media stream playing picture is captured, the audio and video file is converted into text information, and the text information is presented to the user on a text information display interface in synchronization with the audio and video file of the media stream playing picture; the text information is marked by adopting a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, the marking information is displayed back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Marks can thus be made intelligently according to the playing progress or the text, which makes it convenient for the user or other collaborators to check or audit the content of the media stream and improves the auditing efficiency of the content of the media stream.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a first embodiment of a voice-to-text intelligent marking method according to the present invention;
FIG. 2 is an effect diagram showing the marked information in the intelligent marking method for converting voice into characters;
FIG. 3 is another effect diagram of the display of the marking information in the intelligent marking method for converting voice into text according to the invention;
FIG. 4 is a flowchart of a second embodiment of the voice-to-text intelligent marking method according to the present invention;
FIG. 5 is a flowchart of a third embodiment of the voice-to-text intelligent marking method of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides an intelligent marking method for converting voice into text, which combines a media stream playing picture (such as live broadcast, audio and video files and the like) with a voice converting text part and a marking part, so that a user can intelligently mark according to the playing progress or text, thereby facilitating the examination or auditing of the user or other cooperators and improving the auditing efficiency of the compliance content of the media stream.
As shown in fig. 1 to 3, a first embodiment of the intelligent marking method for converting voice into text of the present invention includes the following steps:
Step S10, when capturing the media stream playing picture, converting the audio and video file into text information, and presenting the text information to the user on the text information display interface in synchronization with the audio and video file of the media stream playing picture.
And S20, marking the text information by adopting a preset marking strategy to obtain marking information.
In this embodiment, when the preset marking policy is adopted to mark the text information, two schemes may be adopted.
One scheme is to automatically and intelligently mark the text information to obtain marked information. Specifically, in this embodiment, a word stock is preset, after an audio and video file is converted into text information, the preset word stock is queried, the text information is analyzed, the vocabulary in the text information is matched with the vocabulary in the word stock, and if a problematic vocabulary is found, the problematic vocabulary is intelligently marked, so as to obtain marking information.
Alternatively, in this embodiment, the text information may be marked according to the operation behavior of the user, so as to obtain marking information. The operation behavior of the user may be a behavior of clicking the playing progress of the audio/video file on the media stream playing screen, or a behavior of framing text content on the display interface of the text information.
When capturing the operation behavior of clicking the playing progress of the audio and video file on the media stream playing picture by the user, the embodiment marks the corresponding text information according to the playing progress of the media stream playing picture to obtain the mark information. When capturing the operation behavior of the user for selecting text content in the text information display interface, extracting the corresponding text and playing time from the text information for marking, and obtaining marking information.
And step S30, synchronizing and displaying the marking information with the text information and the audio/video file on a marking information display interface according to the marking point selected by the user.
Specifically, in this embodiment, step S30 specifically includes: and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface by adopting selection and range technology.
According to the technical scheme, when a media stream playing picture is captured, the audio and video file is converted into text information, and the text information is presented to the user on a text information display interface in synchronization with the audio and video file of the media stream playing picture; the text information is marked by adopting a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, the marking information is displayed back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Marks can thus be made intelligently according to the playing progress or the text, which makes it convenient for the user or other collaborators to check or audit the content of the media stream and improves the auditing efficiency of the content of the media stream.
Further, referring to fig. 4, a second embodiment of the intelligent marking method for converting voice into text according to the first embodiment shown in fig. 1 is provided, and the difference between the present embodiment and the first embodiment shown in fig. 1 is that in the present embodiment, the step S20 of marking the text information by using a preset marking policy includes:
step S201, searching a preset word stock.
Step S202, analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock.
And step S203, if the problematic vocabulary is found, performing intelligent marking on the problematic vocabulary to obtain marking information.
In this embodiment, during the process of converting text information, each vocabulary in the text information is analyzed, the vocabulary in the text information is matched with the vocabulary in the word bank, if a problematic vocabulary is found, the problematic vocabulary is intelligently marked, and marking information is obtained, so that automatic intelligent marking is realized, and the auditing efficiency of media stream content is improved.
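Steps S201 to S203 can be illustrated with a minimal sketch. The patent does not specify how the word stock is stored or how matching is performed, so everything below is an assumption made for illustration: the names Segment, Mark and auto_mark are hypothetical, the word stock is a plain list of strings, and matching is exact substring search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One piece of transcribed text with its playback window in seconds."""
    start: float
    end: float
    text: str


@dataclass(frozen=True)
class Mark:
    """A mark produced by the automatic audit (hypothetical structure)."""
    play_time: float    # start time of the segment containing the word
    segment_index: int  # which segment the word was found in
    offset: int         # character offset of the word inside the segment text
    word: str           # the word-stock entry that matched


def auto_mark(segments, word_stock):
    """Scan every segment for every word-stock entry and return the marks."""
    marks = []
    for index, segment in enumerate(segments):
        for word in word_stock:
            if not word:
                continue  # an empty entry would match everywhere; skip it
            position = segment.text.find(word)
            while position != -1:
                marks.append(Mark(segment.start, index, position, word))
                position = segment.text.find(word, position + len(word))
    # Return marks in playback order, then by position inside the segment.
    return sorted(marks, key=lambda m: (m.play_time, m.offset))


transcript = [
    Segment(0.0, 4.0, "Welcome to the live stream"),
    Segment(4.0, 9.5, "This product is a guaranteed cure"),
    Segment(9.5, 12.0, "guaranteed results for everyone"),
]
marks = auto_mark(transcript, ["guaranteed", "cure"])
for mark in marks:
    print(f"{mark.play_time:5.1f}s segment {mark.segment_index} offset {mark.offset}: {mark.word}")
```

Keeping both the playing time and the character offset in each mark is a design choice of this sketch: it lets the same mark locate a position in the audio and video file and a position in the text.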
Further, referring to fig. 5, a third embodiment of the intelligent voice-to-text marking method according to the present invention is provided based on the first embodiment shown in fig. 1, and the difference between the present embodiment and the first embodiment shown in fig. 1 is that in the present embodiment, the step S20 of marking the text information by using a preset marking policy includes:
step S204, capturing the operation behaviors of the user, and marking the text information according to the operation behaviors of the user to obtain marking information.
Specifically, as an implementation manner, the operation behavior of the user is clicking the playing progress of the audio/video file on the media stream playing screen, step S204, capturing the operation behavior of the user, marking the text information according to the operation behavior of the user, and obtaining the marking information includes:
step S2041, when capturing the operation behavior of clicking the playing progress of the audio/video file on the media stream playing screen by the user, marking the corresponding text information according to the playing progress of the media stream playing screen, to obtain marking information.
Note that this scheme needs to handle the case where the voice has no matching text, for example when the audio and video file has no sound, or nobody is speaking, at the clicked playing progress.
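Step S2041 and the no-matching-text case can be sketched as a lookup from a playing time to the transcript segment spoken at that time. This is only an illustration under assumptions: the patent does not define the data layout, so Segment, ProgressMark and mark_at_progress are hypothetical names, and the segments are assumed to be sorted by start time and non-overlapping.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One piece of transcribed text with its playback window in seconds."""
    start: float
    end: float
    text: str


@dataclass(frozen=True)
class ProgressMark:
    """A mark placed at a playing time (hypothetical structure)."""
    play_time: float
    segment_index: Optional[int]  # None when nothing was spoken at play_time
    text: str


def mark_at_progress(segments, play_time):
    """Mark the segment spoken at play_time; segments are sorted by start."""
    starts = [segment.start for segment in segments]
    index = bisect_right(starts, play_time) - 1  # last segment starting at or before play_time
    if index >= 0 and segments[index].start <= play_time < segments[index].end:
        return ProgressMark(play_time, index, segments[index].text)
    # Silence or no speech: keep the time so the mark is still usable,
    # but record that there is no text to attach.
    return ProgressMark(play_time, None, "")


transcript = [
    Segment(0.0, 4.0, "Welcome to the live stream"),
    Segment(6.0, 9.5, "This product is a guaranteed cure"),
]
spoken = mark_at_progress(transcript, 7.2)   # falls inside the second segment
silent = mark_at_progress(transcript, 5.0)   # falls in the gap between segments
print(spoken)
print(silent)
```

Returning a mark with no text, rather than rejecting the click, is one possible way to handle silence; the source only states that the case needs attention.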
As another embodiment, the operation behavior of the user is to frame text content on the display interface of the text information, and the step S204 of capturing the operation behavior of the user, marking the text information according to the operation behavior of the user, and obtaining the marked information includes:
step S2042, when capturing the operation behavior of the user for selecting the text content on the display interface of the text information, extracting the corresponding text and playing time from the text information for marking, and obtaining the marking information.
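Step S2042 can be sketched as mapping a selected character range back to the segments it touches, which yields both the selected text and the playing time. The sketch assumes the display interface shows the segments concatenated in order and reports the selection as two character offsets; Segment, SelectionMark and mark_selection are hypothetical names that do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One piece of transcribed text with its playback window in seconds."""
    start: float
    end: float
    text: str


@dataclass(frozen=True)
class SelectionMark:
    """A mark built from selected text (hypothetical structure)."""
    play_time: float  # start time of the first segment touched by the selection
    end_time: float   # end time of the last segment touched by the selection
    text: str         # the selected text itself


def mark_selection(segments, sel_start, sel_end):
    """Build a mark from character offsets into the concatenated transcript."""
    if not 0 <= sel_start < sel_end:
        raise ValueError("selection must be a non-empty forward range")
    touched = []
    cursor = 0
    for segment in segments:
        seg_start, seg_end = cursor, cursor + len(segment.text)
        # Two half-open ranges overlap when each starts before the other ends.
        if seg_start < sel_end and sel_start < seg_end:
            touched.append(segment)
        cursor = seg_end
    if not touched:
        raise ValueError("selection lies outside the transcript")
    full_text = "".join(segment.text for segment in segments)
    return SelectionMark(touched[0].start, touched[-1].end, full_text[sel_start:sel_end])


transcript = [
    Segment(0.0, 4.0, "Welcome everyone. "),
    Segment(4.0, 9.5, "Today we review the contract. "),
    Segment(9.5, 12.0, "Thanks for watching."),
]
full = "".join(segment.text for segment in transcript)
begin = full.find("review")
mark = mark_selection(transcript, begin, begin + len("review the contract"))
print(mark)
```

A selection that spans several segments takes its playing time from the first segment it touches, so jumping to the mark starts playback where the selected passage begins.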
By the intelligent marking method for converting voice into text, the media stream playing picture (such as live broadcast, audio and video files and the like) and the voice converting text part and the marking part are combined together, so that the following functions can be provided:
1. the user can select the text, mark the corresponding playing time and record the selected text content;
2. the user can mark the current playing progress;
3. the background carries out intelligent auditing on the converted text and returns marking information of the problematic text;
4. according to the marking information, the user can jump to the corresponding playing progress (not available for live broadcast) and relocate to the corresponding text.
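The fourth function, jumping from a mark back to the playing progress and the text, can be sketched as follows. This is an assumption-laden illustration: Player is a stand-in that only tracks a position and a live flag, not a real media player API, and Mark and jump_to_mark are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """Stored marking information (hypothetical structure)."""
    play_time: float    # playing progress in seconds
    segment_index: int  # index of the text segment the mark belongs to
    note: str


class Player:
    """Stand-in for a media player; tracks only position and live state."""

    def __init__(self, is_live):
        self.is_live = is_live
        self.position = 0.0

    def seek(self, seconds):
        if self.is_live:
            raise RuntimeError("cannot seek a live stream")
        self.position = seconds


def jump_to_mark(player, mark):
    """Seek recorded media to the mark and return the segment to highlight."""
    if not player.is_live:
        player.seek(mark.play_time)
    # The text can be relocated in both cases, live or recorded.
    return mark.segment_index


mark = Mark(42.5, 7, "problematic wording")
recorded = Player(is_live=False)
live = Player(is_live=True)
recorded_target = jump_to_mark(recorded, mark)
live_target = jump_to_mark(live, mark)
print(recorded.position, recorded_target)
print(live.position, live_target)
```

For a live stream the sketch leaves the playing position untouched and only returns the text location, which mirrors the exclusion of live broadcast in the list above.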
The intelligent marking method for converting voice into text has the following beneficial effects: with the above technical scheme, when a media stream playing picture is captured, the audio and video file is converted into text information, and the text information is presented to the user on a text information display interface in synchronization with the audio and video file of the media stream playing picture; the text information is marked by adopting a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, the marking information is displayed back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Marks can thus be made intelligently according to the playing progress or the text, which makes it convenient for the user or other collaborators to check or audit the content of the media stream and improves the auditing efficiency of the content of the media stream.
In order to achieve the above object, the present invention further provides an intelligent marking system for converting voice into text, the system comprising a memory, a processor, and an intelligent marking program for converting voice into text stored on the memory, wherein the intelligent marking program for converting voice into text is executed by the processor to perform the following steps:
when capturing the media stream playing picture, converting the audio and video file into text information, and presenting the text information to a user on the text information display interface in synchronization with the audio and video file of the media stream playing picture.
And marking the text information by adopting a preset marking strategy to obtain marking information.
And according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface to the user.
Further, the intelligent marking program for converting the voice into the text also executes the following steps when being executed by the processor:
searching a preset word stock.
And analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock.
If the problematic vocabulary is found, the problematic vocabulary is intelligently marked, and marking information is obtained.
Further, the intelligent marking program for converting the voice into the text also executes the following steps when being executed by the processor:
capturing the operation behaviors of the user, and marking the text information according to the operation behaviors of the user to obtain marking information.
The intelligent marking system for converting voice into text has the following beneficial effects: with the above technical scheme, when a media stream playing picture is captured, the audio and video file is converted into text information, and the text information is presented to the user on a text information display interface in synchronization with the audio and video file of the media stream playing picture; the text information is marked by adopting a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, the marking information is displayed back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Marks can thus be made intelligently according to the playing progress or the text, which makes it convenient for the user or other collaborators to check or audit the content of the media stream and improves the auditing efficiency of the content of the media stream.
To achieve the above objective, the present invention further provides a computer readable storage medium, where a voice-to-text intelligent marking program is stored, and when the voice-to-text intelligent marking program is executed by a processor, the steps of the method described in the above embodiments are executed, which is not described herein again.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the specification and drawings of the present invention or direct/indirect application in other related technical fields are included in the scope of the present invention.

Claims (10)

1. An intelligent marking method for converting voice into characters is characterized by comprising the following steps:
when capturing a media stream playing picture, converting an audio and video file into text information, and presenting the text information to a user on a text information display interface in synchronization with the audio and video file of the media stream playing picture;
marking the text information by adopting a preset marking strategy to obtain marking information;
and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface to the user.
2. The intelligent marking method for converting voice into text according to claim 1, wherein the step of marking the text information by using a preset marking strategy to obtain marking information comprises the steps of:
searching a preset word stock;
analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock;
if the problematic vocabulary is found, the problematic vocabulary is intelligently marked, and marking information is obtained.
3. The intelligent marking method for converting voice into text according to claim 1, wherein the step of marking the text information by using a preset marking strategy to obtain marking information comprises the steps of:
capturing the operation behaviors of the user, and marking the text information according to the operation behaviors of the user to obtain marking information.
4. The intelligent marking method for converting voice into text according to claim 3, wherein the operation behavior of the user is clicking the playing progress of the audio/video file on the media stream playing picture, the capturing operation behavior of the user, marking the text information according to the operation behavior of the user, and the step of obtaining the marking information includes:
when capturing the operation behavior of clicking the playing progress of the audio and video file on the media stream playing picture by the user, marking the corresponding text information according to the playing progress of the media stream playing picture to obtain marking information.
5. The intelligent marking method for converting voice into text according to claim 3, wherein the operation behavior of the user is to frame text content on a display interface of the text information, the capturing operation behavior of the user, marking the text information according to the operation behavior of the user, and the step of obtaining marking information includes:
when capturing the operation behavior of the user for selecting text content in the text information display interface, extracting the corresponding text and playing time from the text information for marking, and obtaining marking information.
6. The intelligent marking method for converting voice into text according to claim 1, wherein the step of displaying the marking information back to the user in synchronization with the text information and the audio/video file on the marking information display interface according to the marking point selected by the user comprises the steps of:
and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface by adopting selection and range technology.
7. An intelligent voice-to-text marking system, comprising a memory, a processor, and an intelligent voice-to-text marking program stored on the memory, wherein the intelligent voice-to-text marking program is executed by the processor to perform the steps of:
when capturing a media stream playing picture, converting an audio and video file into text information, and presenting the text information to a user on a text information display interface in synchronization with the audio and video file of the media stream playing picture;
marking the text information by adopting a preset marking strategy to obtain marking information;
and according to the mark points selected by the user, synchronizing and displaying the mark information with the text information and the audio/video file on a mark information display interface to the user.
8. The intelligent voice-to-text tagging system of claim 7, wherein the intelligent voice-to-text tagging program, when executed by the processor, further performs the steps of:
searching a preset word stock;
analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock;
if the problematic vocabulary is found, the problematic vocabulary is intelligently marked, and marking information is obtained.
9. The intelligent voice-to-text tagging system of claim 7, wherein the intelligent voice-to-text tagging program, when executed by the processor, further performs the steps of:
capturing the operation behaviors of the user, and marking the text information according to the operation behaviors of the user to obtain marking information.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a voice-to-text smart marking program which, when executed by a processor, performs the steps of the method according to any one of claims 1 to 6.
CN202211496746.7A 2022-11-24 2022-11-24 Intelligent marking method, system and medium for converting voice into text Pending CN116033206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211496746.7A CN116033206A (en) 2022-11-24 2022-11-24 Intelligent marking method, system and medium for converting voice into text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211496746.7A CN116033206A (en) 2022-11-24 2022-11-24 Intelligent marking method, system and medium for converting voice into text

Publications (1)

Publication Number Publication Date
CN116033206A (en) 2023-04-28

Family

ID=86069853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211496746.7A Pending CN116033206A (en) 2022-11-24 2022-11-24 Intelligent marking method, system and medium for converting voice into text

Country Status (1)

Country Link
CN (1) CN116033206A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
CN110719518A (en) * 2018-07-12 2020-01-21 阿里巴巴集团控股有限公司 Multimedia data processing method, device and equipment
CN112231498A (en) * 2020-09-29 2021-01-15 北京字跳网络技术有限公司 Interactive information processing method, device, equipment and medium
CN112380365A (en) * 2020-11-18 2021-02-19 北京字跳网络技术有限公司 Multimedia subtitle interaction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination