CN116033206A

CN116033206A - Intelligent marking method, system and medium for converting voice into text

Info

Publication number: CN116033206A
Application number: CN202211496746.7A
Authority: CN
Inventors: 詹梓彦; 刘兆武
Original assignee: Shenzhen Craftsman Network Technology Co ltd
Current assignee: Shenzhen Craftsman Network Technology Co ltd
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-04-28

Abstract

The invention discloses an intelligent marking method, system and medium for converting voice into text. The method comprises: when a media stream playing picture is captured, converting the audio/video file into text information and presenting the text information to the user on a text information display interface in synchronization with the audio/video file shown in the media stream playing picture; marking the text information using a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Compared with the prior art, the method combines the media stream playing picture (for example, a live broadcast or an audio/video file) with a voice-to-text part and a marking part, so that marks can be made intelligently according to the playing progress or the text. This makes it convenient for the user or other collaborators to review or audit the content, and improves the efficiency of auditing media streams for compliant content.

Description

Intelligent marking method, system and medium for converting voice into text

Technical Field

The invention relates to the technical field of voice processing, and in particular to an intelligent marking method, system and medium for converting voice into text.

Background

Currently, most voice-to-text products for live broadcasts and playback provide no intelligent marking, which is inefficient because the user has to spend time re-reviewing the content every time.

Disclosure of Invention

The main object of the invention is to provide an intelligent marking method, system and medium for converting voice into text that mark content intelligently according to the playing progress or the text, make it convenient for the user to review or audit the content, and improve the efficiency of auditing media streams for compliant content.

To achieve the above object, the invention provides an intelligent marking method for converting voice into text, comprising the following steps:

when a media stream playing picture is captured, converting the audio/video file into text information, and presenting the text information to the user on a text information display interface in synchronization with the audio/video file in the media stream playing picture;

marking the text information using a preset marking strategy to obtain marking information;

and, according to the mark points selected by the user, displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file.

In a further technical solution of the invention, the step of marking the text information using a preset marking strategy to obtain marking information comprises:

searching a preset word stock;

analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock;

if a problematic word is found, marking the problematic word intelligently to obtain marking information.

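In a further technical solution of the invention, the step of marking the text information using a preset marking strategy to obtain marking information comprises: capturing the user's operation behavior, and marking the text information according to that behavior to obtain marking information.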

In a further technical solution of the invention, the user's operation behavior is clicking the playing progress of the audio/video file in the media stream playing picture, and the step of capturing the user's operation behavior and marking the text information according to it to obtain marking information comprises:

when the user's click on the playing progress of the audio/video file in the media stream playing picture is captured, marking the text information corresponding to the playing progress of the media stream playing picture to obtain marking information.

In a further technical solution of the invention, the user's operation behavior is framing (selecting) text content on the text information display interface, and the step of capturing the user's operation behavior and marking the text information according to it to obtain marking information comprises:

when the user's selection of text content on the text information display interface is captured, extracting the corresponding text and its playing time from the text information and marking them to obtain marking information.

In a further technical solution of the invention, the step of displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file, according to the mark point selected by the user, comprises:

according to the mark point selected by the user, displaying the marking information back to the user on the marking information display interface in synchronization with the text information and the audio/video file by using selection and range techniques.

To achieve the above object, the invention further provides an intelligent marking system for converting voice into text, comprising a memory, a processor, and an intelligent voice-to-text marking program stored in the memory and executable on the processor; when executed by the processor, the program performs the steps of the intelligent marking method described above.

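In a further technical solution of the invention, when run by the processor, the intelligent voice-to-text marking program further performs the following steps:

searching a preset word stock; analyzing the text information and matching the vocabulary in the text information with the vocabulary in the word stock; and, if a problematic word is found, marking it intelligently to obtain marking information.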

To achieve the above object, the invention also provides a computer-readable storage medium storing an intelligent voice-to-text marking program which, when executed by a processor, performs the steps of the method described above.

The intelligent marking method, system and medium for converting voice into text have the following beneficial effects: when a media stream playing picture is captured, the audio/video file is converted into text information, which is presented to the user on a text information display interface in synchronization with the audio/video file in the media stream playing picture; the text information is marked using a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, the marking information is displayed back to the user on a marking information display interface in synchronization with the text information and the audio/video file. Marks can thus be made intelligently according to the playing progress or the text, the user or other collaborators can conveniently review or audit the media stream content, and auditing efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a first embodiment of the intelligent marking method for converting voice into text according to the present invention;

FIG. 2 is a diagram showing an example display of the marking information in the intelligent marking method for converting voice into text according to the present invention;

FIG. 3 is a diagram showing another example display of the marking information in the intelligent marking method for converting voice into text according to the present invention;

FIG. 4 is a flowchart of a second embodiment of the intelligent marking method for converting voice into text according to the present invention;

FIG. 5 is a flowchart of a third embodiment of the intelligent marking method for converting voice into text according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention provides an intelligent marking method for converting voice into text that combines the media stream playing picture (for example, a live broadcast or an audio/video file) with a voice-to-text part and a marking part, so that the user can mark intelligently according to the playing progress or the text. This makes it convenient for the user or other collaborators to review or audit the content, and improves the efficiency of auditing media streams for compliant content.

As shown in FIGS. 1 to 3, a first embodiment of the intelligent marking method for converting voice into text of the present invention comprises the following steps:

Step S10: when the media stream playing picture is captured, converting the audio/video file into text information, and presenting the text information to the user on the text information display interface in synchronization with the audio/video file in the media stream playing picture.
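As a minimal illustrative sketch (not the patented implementation), the synchronized presentation in step S10 can be modelled by assuming the speech recognizer returns timestamped segments. The `Segment` type and the `active_segment` helper are hypothetical names; the helper finds which segment is being spoken at the current playback time so the text panel can highlight it.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical ASR output: one utterance with start/end times in seconds."""
    start: float
    end: float
    text: str


def active_segment(segments: List[Segment], t: float) -> Optional[int]:
    """Return the index of the segment being spoken at playback time t, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return i
    return None  # t lies in a gap between utterances or outside the transcript


transcript = [
    Segment(0.0, 2.5, "Welcome to the stream."),
    Segment(3.0, 6.0, "Today we review the new product."),
]
print(active_segment(transcript, 4.2))  # 1
print(active_segment(transcript, 2.8))  # None (gap between utterances)
```

Rebuilding the `starts` list on every call keeps the sketch short; a player that calls this on every time update would more likely cache it.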

Step S20: marking the text information using a preset marking strategy to obtain marking information.

In this embodiment, either of two schemes may be used to mark the text information with the preset marking strategy.

In the first scheme, the text information is marked automatically and intelligently. Specifically, a word stock is preset in this embodiment; after the audio/video file is converted into text information, the preset word stock is queried, the text information is analyzed, and the vocabulary in the text information is matched with the vocabulary in the word stock. If a problematic word is found, it is marked intelligently to obtain marking information.

In the second scheme, the text information is marked according to the user's operation behavior to obtain marking information. The operation behavior may be clicking the playing progress of the audio/video file in the media stream playing picture, or framing (selecting) text content on the text information display interface.

When the user's click on the playing progress of the audio/video file in the media stream playing picture is captured, this embodiment marks the text information corresponding to the playing progress to obtain marking information. When the user's selection of text content on the text information display interface is captured, the corresponding text and its playing time are extracted from the text information and marked to obtain marking information.

Step S30: according to the mark point selected by the user, displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file.

Specifically, in this embodiment, step S30 comprises: according to the mark point selected by the user, displaying the marking information back to the user on the marking information display interface in synchronization with the text information and the audio/video file by using selection and range techniques.

With the above technical solution, the audio/video file is converted into text information when the media stream playing picture is captured and is presented on the text information display interface in synchronization with playback; the text information is marked using a preset marking strategy; and the marking information for the mark point selected by the user is displayed back in synchronization with the text information and the audio/video file. The user can therefore mark intelligently according to the playing progress or the text, the user or other collaborators can conveniently review or audit the media stream content, and auditing efficiency is improved.

Further, referring to FIG. 4, a second embodiment of the intelligent marking method for converting voice into text of the present invention is proposed based on the first embodiment shown in FIG. 1. It differs from the first embodiment in that step S20, marking the text information using a preset marking strategy, comprises:

Step S201: searching a preset word stock.

Step S202: analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock.

Step S203: if a problematic word is found, marking it intelligently to obtain marking information.

In this embodiment, while the audio/video file is being converted into text information, each word in the text information is analyzed and matched with the vocabulary in the word stock; any problematic word that is found is marked intelligently to obtain marking information. Marking is thus automatic, which improves the efficiency of auditing media stream content.
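Steps S201 to S203 can be sketched as a naive word-stock scan. This is an illustration under stated assumptions: `Segment`, `Mark`, and `mark_problem_terms` are hypothetical, the sample terms are invented for the example, and plain substring search stands in for whatever matching the actual system uses (it has the advantage of working on Chinese text without word segmentation). Each mark records its segment's start time so the reviewer can later jump back to it.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical ASR output: one utterance with start/end times in seconds."""
    start: float
    end: float
    text: str


@dataclass(frozen=True)
class Mark:
    """Hypothetical marking record produced by the automatic scheme."""
    segment_index: int
    char_start: int  # offset of the match inside the segment text
    char_end: int
    term: str
    time: float      # segment start time, used later to jump back


def mark_problem_terms(segments: List[Segment], word_stock: Iterable[str]) -> List[Mark]:
    """Match every segment against the word stock and mark each occurrence.

    Plain substring search is used, so unsegmented Chinese text also works;
    overlapping matches of different terms are all reported.
    """
    terms = sorted({w for w in word_stock if w})  # drop empty entries, fix order
    marks: List[Mark] = []
    for idx, seg in enumerate(segments):
        for term in terms:
            pos = seg.text.find(term)
            while pos != -1:
                marks.append(Mark(idx, pos, pos + len(term), term, seg.start))
                pos = seg.text.find(term, pos + len(term))
    # Present marks in reading order.
    marks.sort(key=lambda m: (m.segment_index, m.char_start))
    return marks


segments = [
    Segment(0.0, 3.0, "this product cures everything"),
    Segment(3.5, 6.0, "guaranteed results, guaranteed"),
]
for m in mark_problem_terms(segments, ["guaranteed", "cures everything"]):
    print(m.segment_index, m.char_start, m.term, m.time)
```

For a large word stock, a multi-pattern matcher such as Aho-Corasick would avoid rescanning each segment once per term; the loop above favours readability.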

Further, referring to FIG. 5, a third embodiment of the intelligent marking method for converting voice into text of the present invention is proposed based on the first embodiment shown in FIG. 1. It differs from the first embodiment in that step S20, marking the text information using a preset marking strategy, comprises:

Step S204: capturing the user's operation behavior, and marking the text information according to that behavior to obtain marking information.

Specifically, in one implementation, the user's operation behavior is clicking the playing progress of the audio/video file in the media stream playing picture, and step S204 comprises:

Step S2041: when the user's click on the playing progress of the audio/video file in the media stream playing picture is captured, marking the text information corresponding to the playing progress to obtain marking information.

Note that this scheme must also handle cases where the speech does not match the text, for example when there is no sound or no one is speaking at the clicked playing progress.
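Step S2041, together with the no-speech case just described, can be sketched as follows. The sketch assumes timestamped transcript segments; `Segment`, `ProgressMark`, and `mark_at_progress` are hypothetical names. When the clicked time falls in silence, this version keeps a time-only mark rather than borrowing a neighbouring sentence, which is one possible design choice, not necessarily the one the patent uses.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical ASR output: one utterance with start/end times in seconds."""
    start: float
    end: float
    text: str


@dataclass(frozen=True)
class ProgressMark:
    time: float                   # playback position the user clicked
    segment_index: Optional[int]  # None when nothing was being said
    text: str                     # transcript text at that moment ("" for silence)


def mark_at_progress(segments: List[Segment], t: float) -> ProgressMark:
    """Mark the transcript text playing at time t.

    If no segment covers t (silence, music, no speaker), the mark keeps only
    the time so that speech and text are never paired incorrectly.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return ProgressMark(t, i, segments[i].text)
    return ProgressMark(t, None, "")


segments = [Segment(0.0, 2.0, "hello"), Segment(5.0, 8.0, "price is low")]
print(mark_at_progress(segments, 6.0))
print(mark_at_progress(segments, 3.0))
```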

In another implementation, the user's operation behavior is framing (selecting) text content on the text information display interface, and step S204 comprises:

Step S2042: when the user's selection of text content on the text information display interface is captured, extracting the corresponding text and its playing time from the text information and marking them to obtain marking information.
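Step S2042 can be sketched by assuming the text panel shows the transcript segments one per line and reports the user's selection as plain character offsets. In a browser those offsets would typically be derived from a DOM selection range, a step omitted here. `Segment`, `SEP`, `build_display`, and `mark_selection` are hypothetical names; the function returns the selected text plus the playing-time span of the segments it touches.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical ASR output: one utterance with start/end times in seconds."""
    start: float
    end: float
    text: str


SEP = "\n"  # hypothetical: segments are shown one per line in the text panel


def build_display(segments: List[Segment]) -> Tuple[str, List[int]]:
    """Concatenate segment texts and record each segment's starting character offset."""
    offsets, pos = [], 0
    for seg in segments:
        offsets.append(pos)
        pos += len(seg.text) + len(SEP)
    return SEP.join(s.text for s in segments), offsets


def mark_selection(segments: List[Segment], sel_start: int, sel_end: int) -> dict:
    """Turn a [sel_start, sel_end) selection into a mark holding the
    selected text and the playing-time span it covers."""
    display, offsets = build_display(segments)
    if not 0 <= sel_start < sel_end <= len(display):
        raise ValueError("empty or out-of-range selection")
    first = bisect_right(offsets, sel_start) - 1  # segment containing the first char
    last = bisect_right(offsets, sel_end - 1) - 1  # segment containing the last char
    return {
        "text": display[sel_start:sel_end],
        "start_time": segments[first].start,
        "end_time": segments[last].end,
    }


segments = [Segment(0.0, 2.0, "hello world"), Segment(2.5, 5.0, "buy now")]
print(mark_selection(segments, 12, 15))  # selects "buy" in the second line
```

A selection that spans two lines, such as offsets 6 to 15, yields the time span of both segments, which matches the idea of recording the playing time of whatever text was framed.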

By combining the media stream playing picture (for example, a live broadcast or an audio/video file) with the voice-to-text part and the marking part, the intelligent marking method for converting voice into text provides the following functions:

1. The user can select text; the corresponding playing time is marked and the selected text content is recorded.

2. The user can mark the current playing progress.

3. The back end audits the converted text intelligently and returns marking information for problematic text.

4. Using the marking information, the user can jump to the corresponding playing progress (except for live broadcasts) and relocate to the corresponding text.
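Function 4 can be sketched as resolving a stored mark into a playback position and a text span to scroll to. The sketch assumes each mark stores a playing time and the character span of its text; `Mark`, `JumpTarget`, `jump_to_mark`, and the `preroll` parameter are hypothetical. Live broadcasts are treated as non-seekable, matching the exclusion in item 4, so only the text is relocated.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mark:
    """Hypothetical stored mark: playing time plus the marked text's span."""
    time: float      # seconds into the media
    char_start: int  # span of the marked text in the transcript panel
    char_end: int


@dataclass(frozen=True)
class JumpTarget:
    seek_to: Optional[float]    # None when playback cannot be repositioned
    highlight: Tuple[int, int]  # text span to scroll to and highlight


def jump_to_mark(mark: Mark, is_live: bool, duration: Optional[float] = None,
                 preroll: float = 0.0) -> JumpTarget:
    """Resolve a selected mark into a playback position and a text span.

    Live broadcasts are not seekable, so only the text is relocated.
    `preroll` (hypothetical) optionally starts playback a little earlier to
    give the reviewer context; the result is clamped to zero and, when the
    duration is known, to the media duration.
    """
    highlight = (mark.char_start, mark.char_end)
    if is_live:
        return JumpTarget(None, highlight)
    seek = max(0.0, mark.time - preroll)
    if duration is not None:
        seek = min(seek, duration)
    return JumpTarget(seek, highlight)


m = Mark(time=42.0, char_start=120, char_end=135)
print(jump_to_mark(m, is_live=False, preroll=1.0))  # JumpTarget(seek_to=41.0, highlight=(120, 135))
print(jump_to_mark(m, is_live=True))
```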


The invention further provides an intelligent marking system for converting voice into text, comprising a memory, a processor, and an intelligent voice-to-text marking program stored in the memory and executable on the processor. When executed by the processor, the program performs the following steps:

when the media stream playing picture is captured, converting the audio/video file into text information, and presenting the text information to the user on the text information display interface in synchronization with the audio/video file in the media stream playing picture;

marking the text information using a preset marking strategy to obtain marking information.

Further, when executed by the processor, the intelligent voice-to-text marking program also performs the following steps:

searching a preset word stock;

analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock.

The intelligent marking system for converting voice into text has the same beneficial effects as the method: the user can mark intelligently according to the playing progress or the text, the user or other collaborators can conveniently review or audit the media stream content, and auditing efficiency is improved.

To achieve the above object, the invention further provides a computer-readable storage medium storing an intelligent voice-to-text marking program; when executed by a processor, the program performs the steps of the method described in the above embodiments, which are not repeated here.

The foregoing describes only preferred embodiments of the invention and does not limit its scope. All equivalent structural changes made using the specification and drawings of the invention, and all direct or indirect applications in other related technical fields, are likewise included within the scope of protection of the invention.

Claims

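1. An intelligent marking method for converting voice into text, characterized by comprising the following steps: when a media stream playing picture is captured, converting the audio/video file into text information, and presenting the text information to the user on a text information display interface in synchronization with the audio/video file in the media stream playing picture; marking the text information using a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file.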

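2. The intelligent marking method for converting voice into text according to claim 1, wherein the step of marking the text information using a preset marking strategy to obtain marking information comprises: searching a preset word stock; analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock; and, if a problematic word is found, marking it intelligently to obtain marking information.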

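3. The intelligent marking method for converting voice into text according to claim 1, wherein the step of marking the text information using a preset marking strategy to obtain marking information comprises: capturing the user's operation behavior, and marking the text information according to that behavior to obtain marking information.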

4. The intelligent marking method for converting voice into text according to claim 3, wherein the user's operation behavior is clicking the playing progress of the audio/video file in the media stream playing picture, and the step of capturing the user's operation behavior and marking the text information according to it to obtain marking information comprises: when the user's click on the playing progress is captured, marking the text information corresponding to the playing progress of the media stream playing picture to obtain marking information.

5. The intelligent marking method for converting voice into text according to claim 3, wherein the user's operation behavior is framing text content on the text information display interface, and the step of capturing the user's operation behavior and marking the text information according to it to obtain marking information comprises: when the user's selection of text content on the text information display interface is captured, extracting the corresponding text and its playing time from the text information and marking them to obtain marking information.

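6. The intelligent marking method for converting voice into text according to claim 1, wherein the step of displaying the marking information back to the user on the marking information display interface in synchronization with the text information and the audio/video file, according to the mark point selected by the user, comprises: according to the mark point selected by the user, displaying the marking information back to the user on the marking information display interface in synchronization with the text information and the audio/video file by using selection and range techniques.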

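7. An intelligent voice-to-text marking system, comprising a memory, a processor, and an intelligent voice-to-text marking program stored in the memory and executable on the processor, wherein the program, when executed by the processor, performs the following steps: when a media stream playing picture is captured, converting the audio/video file into text information, and presenting the text information to the user on a text information display interface in synchronization with the audio/video file in the media stream playing picture; marking the text information using a preset marking strategy to obtain marking information; and, according to the mark points selected by the user, displaying the marking information back to the user on a marking information display interface in synchronization with the text information and the audio/video file.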

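8. The intelligent voice-to-text marking system according to claim 7, wherein the intelligent voice-to-text marking program, when executed by the processor, further performs the following steps: searching a preset word stock; analyzing the text information, and matching the vocabulary in the text information with the vocabulary in the word stock; and, if a problematic word is found, marking it intelligently to obtain marking information.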

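9. The intelligent voice-to-text marking system according to claim 7, wherein the intelligent voice-to-text marking program, when executed by the processor, further performs the following step: capturing the user's operation behavior, and marking the text information according to that behavior to obtain marking information.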

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an intelligent voice-to-text marking program which, when executed by a processor, performs the steps of the method according to any one of claims 1 to 6.