CN105913838B - Audio management apparatus and method

Audio management apparatus and method

Info

Publication number
CN105913838B
CN105913838B (application CN201610339908.4A)
Authority
CN
China
Prior art keywords
information
text
audio file
audio
label
Prior art date
Legal status
Active
Application number
CN201610339908.4A
Other languages
Chinese (zh)
Other versions
CN105913838A (en)
Inventor
王荣洋
Current Assignee
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Nubia Technology Co Ltd
Priority: CN201610339908.4A
Publication of CN105913838A (application)
Application granted
Publication of CN105913838B (grant)
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses an audio management apparatus, comprising: a speech recognition module for performing speech recognition on an audio file to obtain the text corresponding to the audio file and the time association information between the audio file and the text; an information extraction module for extracting the labeling information of the text according to a pre-configured recognition model; and a label acquisition module for obtaining the labels of the audio file according to the labeling information of the text and the time association information between the audio file and the text. The invention also discloses an audio management method. The invention realizes automatic recognition and intelligent addition of labels to audio files, so that the user does not need to add and edit labels manually, which improves the user experience.

Description

Audio management apparatus and method
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to an audio management apparatus and method.
Background
When a user records with a mobile phone, a voice recorder, or other terminal device, labels can be added to the recording file during recording to mark key points or divide it into paragraphs. Later, when the user plays back the recording file, the labels provide prompts about the recorded content.
For example, during a meeting the content usually covers several definite topics, and when a different topic comes up for discussion, the user can use the labeling function of the recording application to add a label at that time point of the recording file. When the recording file is played back, the user can then tell from the label at each time point what topic is being discussed.
However, recording labels are currently added manually by the user. Some important labels may be missed during recording, and if labels are added after the recording is finished, the user must search for the right position in the recording again, which wastes the user's effort. Moreover, when the user feels that the recording needs a mark and adds a label manually, the user's attention is divided between listening and labeling, so important parts of the conversation may be missed, which harms the user experience.
Disclosure of Invention
The invention mainly aims to provide an audio management apparatus and method, so as to solve the technical problem that voice labels cannot be added intelligently.
To achieve the above object, the present invention provides an audio management apparatus, comprising:
the voice recognition module is used for carrying out voice recognition on an audio file to obtain a text corresponding to the audio file and time association information of the audio file and the text;
the information extraction module is used for extracting the labeling information of the text according to a pre-configured recognition model;
and the label acquisition module is used for acquiring the label of the audio file according to the labeling information of the text and the time correlation information of the audio file and the text.
In one embodiment, the information extraction module comprises:
the information extraction unit is used for respectively extracting the labeling information corresponding to each sentence of text information in the text according to a pre-configured recognition model;
the deduplication unit is used for performing deduplication processing on the labeling information of the text information;
and the integration unit is used for acquiring the deduplicated labeling information as the labeling information of the text.
In one embodiment, the audio management apparatus further comprises:
and the association module is used for establishing the association relationship between the label and the audio file and linking the label to the time period or the time point corresponding to the audio file.
In one embodiment, the audio management apparatus further comprises:
and the model configuration module is used for training to obtain the recognition model according to the pre-configured training corpus and the feature template.
In one embodiment, the model configuration module comprises:
the preprocessing unit is used for preprocessing the pre-configured training corpus to acquire the correct labeling information of the training corpus;
and the configuration unit is used for performing feature extraction training on the preprocessed training corpus according to a pre-configured feature template and the correct labeling information, so as to obtain the model parameters and establish the recognition model.
In addition, to achieve the above object, the present invention further provides an audio management method, including:
performing voice recognition on an audio file to obtain a text corresponding to the audio file and time associated information of the audio file and the text;
extracting the labeling information of the text according to a pre-configured recognition model;
and acquiring the label of the audio file according to the labeling information of the text and the time correlation information of the audio file and the text.
In one embodiment, the step of extracting the label information of the text according to the preconfigured recognition model includes:
respectively extracting the labeling information corresponding to each sentence of text information in the text according to a pre-configured recognition model;
performing deduplication processing on the labeling information of the text information;
and acquiring the deduplicated labeling information as the labeling information of the text.
In one embodiment, after the step of obtaining the tag of the audio file according to the labeling information of the text and the time-related information of the audio file and the text, the method further includes:
and establishing an incidence relation between the label and the audio file, and linking the label to a time period or a time point corresponding to the audio file.
In one embodiment, before the steps of performing speech recognition on an audio file, acquiring a text corresponding to the audio file, and time-related information between the audio file and the text, the method further includes:
and training to obtain the recognition model according to the preset training corpus and the feature template.
In an embodiment, the step of obtaining the recognition model by training according to a pre-configured corpus and a feature template includes:
preprocessing a pre-configured training corpus to acquire correct labeling information of the training corpus;
and according to a pre-configured feature template and the correct labeling information, performing feature extraction training on the preprocessed training corpus to obtain the model parameters, and establishing the recognition model.
The invention provides an audio management apparatus and method, in which a voice recognition module performs voice recognition on an audio file to obtain the text corresponding to the audio file and the time association information between the audio file and the text; an information extraction module then extracts the labeling information of the text according to a pre-configured recognition model; and a label acquisition module obtains the labels of the audio file according to the labeling information of the text and the time association information between the audio file and the text. By performing voice recognition on the audio file to obtain the corresponding text, labels are added to the audio file according to the extracted labeling information of the text; and because the time association information between the audio file and the text is obtained, each label is added at the corresponding time period or time point of the audio file, which ensures that the labels are positioned accurately. The invention thus realizes automatic recognition and intelligent addition of labels to audio files, without requiring the user to add and edit labels manually, and improves the user experience.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an alternative mobile terminal for implementing various embodiments of the present invention;
FIG. 2 is a diagram of a wireless communication system for the mobile terminal shown in FIG. 1;
FIG. 3 is a functional block diagram of an audio management device according to a first embodiment of the present invention;
FIG. 4 is a functional block diagram of an audio management device according to a second embodiment of the present invention;
FIG. 5 is a functional block diagram of an audio management device according to a third embodiment of the present invention;
FIG. 6 is a functional block diagram of an audio management device according to a fourth embodiment of the present invention;
FIG. 7 is a functional block diagram of a fifth embodiment of an audio management device according to the present invention;
FIG. 8 is a flowchart illustrating a first embodiment of an audio management method according to the present invention;
FIG. 9 is a flowchart illustrating a second embodiment of an audio management method according to the present invention;
FIG. 10 is a flowchart illustrating a third embodiment of an audio management method according to the present invention;
FIG. 11 is a flowchart illustrating a fourth embodiment of an audio management method according to the present invention;
FIG. 12 is a flowchart illustrating a fifth embodiment of an audio management method according to the present invention;
FIG. 13 is a schematic diagram of an audio file tag according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of an audio file tag according to an embodiment of the present invention;
fig. 15 is a schematic view of an application scenario of a recognition model for training extraction of an evaluation object according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A mobile terminal implementing various embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The mobile terminal may be implemented in various forms. For example, the terminal described in the present invention may include a mobile terminal such as a mobile phone, a smart phone, a voice recorder, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, etc., and a fixed terminal such as a digital TV, a desktop computer, etc. In the following, it is assumed that the terminal is a mobile terminal. However, those skilled in the art will understand that, except for elements specifically intended for mobile use, the configuration according to the embodiments of the present invention can also be applied to fixed terminals.
Fig. 1 is a schematic diagram of an alternative mobile terminal hardware architecture for implementing various embodiments of the present invention.
The mobile terminal 100 may include an a/V (audio/video) input unit 110, a user input unit 120, an output unit 130, a memory 140, a controller 150, and a power supply unit 160, etc. Fig. 1 illustrates a mobile terminal having various components, but it is to be understood that not all illustrated components are required to be implemented. More or fewer components may alternatively be implemented. Elements of the mobile terminal will be described in detail below.
The a/V input unit 110 is used to receive an audio or video signal. The a/V input unit 110 may include a microphone 111, which may receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, or the like, and process it into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the mobile communication module 112. The microphone 111 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated while receiving and transmitting the audio signal.
The user input unit 120 may generate key input data to control various operations of the mobile terminal according to a command input by a user. The user input unit 120 allows a user to input various types of information, and may include a keyboard, dome sheet, touch pad (e.g., a touch-sensitive member that detects changes in resistance, pressure, capacitance, and the like due to being touched), scroll wheel, joystick, and the like. In particular, when the touch panel is superimposed on the display unit 131 in the form of a layer, a touch screen may be formed.
The output unit 130 may include a display unit 131, an audio output module 132, and the like.
The display unit 131 may display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 131 may display a User Interface (UI) or a Graphical User Interface (GUI) related to a call or other communication (e.g., text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capturing mode, the display unit 131 may display a captured image and/or a received image, a UI or GUI showing a video or an image and related functions, and the like.
Meanwhile, when the display unit 131 and the touch panel are stacked on each other in the form of layers to form a touch screen, the display unit 131 may function as an input device and an output device. The display unit 131 may include at least one of a Liquid Crystal Display (LCD), a thin film transistor LCD (TFT-LCD), an Organic Light Emitting Diode (OLED) display, a flexible display, a three-dimensional (3D) display, and the like. Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as transparent displays, and a typical transparent display may be, for example, a TOLED (transparent organic light emitting diode) display or the like. Depending on the particular desired implementation, the mobile terminal 100 may include two or more display units (or other display devices), for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen may be used to detect a touch input pressure as well as a touch input position and a touch input area.
The audio output module 132 may convert audio data received by the wireless communication unit 110 or stored in the memory 140 into an audio signal and output as sound when the mobile terminal is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output module 132 may provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output module 132 may include a speaker, a buzzer, and the like.
The memory 140 may store software programs and the like for processing and controlling operations performed by the controller 150, or may temporarily store data (e.g., a phonebook, messages, still images, videos, and the like) that has been or is to be output. Also, the memory 140 may store data regarding various ways of vibration and audio signals output when a touch is applied to the touch screen.
The memory 140 may include at least one type of storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and so on. Also, the mobile terminal 100 may cooperate with a network storage device that performs a storage function of the memory 140 through a network connection.
The controller 150 generally controls the overall operation of the mobile terminal. For example, the controller 150 performs control and processing related to voice calls, data communications, video calls, and the like. The controller 150 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.
The power supply unit 160 receives external power or internal power and provides appropriate power required to operate the respective elements and components under the control of the controller 150.
The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, an electronic unit designed to perform the functions described herein, and in some cases, such embodiments may be implemented in the controller 150. For a software implementation, the implementation such as a process or a function may be implemented with a separate software module that allows performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in memory 140 and executed by controller 150.
Up to this point, mobile terminals have been described in terms of their functionality. Hereinafter, for the sake of brevity, a slide-type mobile terminal will be described as an example among various types of mobile terminals, such as folder-type, bar-type, swing-type, and slide-type mobile terminals. However, the present invention is not limited to slide-type mobile terminals and can be applied to any type of mobile terminal.
The mobile terminal 100 as shown in fig. 1 may be configured to operate with communication systems such as wired and wireless communication systems and satellite-based communication systems that transmit data via frames or packets.
A communication system in which a mobile terminal according to the present invention is operable will now be described with reference to fig. 2.
Such communication systems may use different air interfaces and/or physical layers. For example, the air interface used by the communication system includes, for example, Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), and Universal Mobile Telecommunications System (UMTS) (in particular, Long Term Evolution (LTE)), global system for mobile communications (GSM), and the like. By way of non-limiting example, the following description relates to a CDMA communication system, but such teachings are equally applicable to other types of systems.
Referring to fig. 2, the CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of Base Stations (BSs) 270, Base Station Controllers (BSCs) 275, and a Mobile Switching Center (MSC) 280. The MSC 280 is configured to interface with a Public Switched Telephone Network (PSTN) 290. The MSC 280 is also configured to interface with the BSCs 275, which may be coupled to the base stations 270 via backhaul lines. The backhaul lines may be constructed according to any of several known interfaces including, for example, E1/T1, ATM, IP, PPP, Frame Relay, HDSL, ADSL, or xDSL. It will be understood that a system as shown in fig. 2 may include multiple BSCs 275.
Each BS 270 may serve one or more sectors (or regions), each sector covered by an omnidirectional antenna or an antenna pointed in a particular direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be configured to support multiple frequency allocations, with each frequency allocation having a particular spectrum (e.g., 1.25 MHz, 5 MHz, etc.).
The intersection of a sector and a frequency allocation may be referred to as a CDMA channel. The BS 270 may also be referred to as a Base Transceiver Subsystem (BTS) or by other equivalent terminology. In such a case, the term "base station" may be used to refer generically to a single BSC 275 and at least one BS 270. A base station may also be referred to as a "cell". Alternatively, each sector of a particular BS 270 may be referred to as a cell site.
As shown in fig. 2, a Broadcast Transmitter (BT)295 transmits a broadcast signal to the mobile terminal 100 operating within the system. In fig. 2, several Global Positioning System (GPS) satellites 300 are shown. The satellite 300 assists in locating at least one of the plurality of mobile terminals 100.
In fig. 2, a plurality of satellites 300 are depicted, but it is to be understood that useful positioning information may be obtained with any number of satellites. Other techniques that can track the location of the mobile terminal may be used instead of or in addition to GPS tracking techniques. In addition, at least one GPS satellite 300 may selectively or additionally process satellite DMB transmission.
As a typical operation of the wireless communication system, the BS270 receives reverse link signals from various mobile terminals 100. The mobile terminal 100 is generally engaged in conversations, messaging, and other types of communications. Each reverse link signal received by a particular base station 270 is processed within the particular BS 270. The obtained data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions including coordination of soft handoff procedures between BSs 270. The BSCs 275 also route the received data to the MSC280, which provides additional routing services for interfacing with the PSTN 290. Similarly, the PSTN290 interfaces with the MSC280, the MSC interfaces with the BSCs 275, and the BSCs 275 accordingly control the BS270 to transmit forward link signals to the mobile terminal 100.
Based on the above mobile terminal hardware structure and communication system, various embodiments of the present invention are proposed.
Referring to fig. 3, a first embodiment of the audio management apparatus according to the present invention provides an audio management apparatus, including:
the voice recognition module 10 is configured to perform voice recognition on an audio file, and acquire a text corresponding to the audio file and time-related information between the audio file and the text.
According to the invention, by extracting the labeling information of the audio file, labels are added to the audio file automatically, which reduces manual intervention, makes the management of voice labels more intelligent and automated, and greatly improves the user experience.
The audio management device may be deployed in a mobile terminal, a server, and the like, and performs audio management on various audio files such as recording.
Specifically, as an implementation manner, firstly, the voice recognition module 10 performs voice recognition on an audio file, that is, converts voice information in the audio file into text information, and an obtained text is a text corresponding to the audio file.
It should be noted that, in the process of speech recognition, the speech information in the audio file may be divided in units of sentences; since the audio file contains one or more sentences of speech information, the recognized text contains one or more corresponding sentences of text information. Alternatively, the speech information in the audio file may be divided into time periods of a preset length; since the audio file then contains one or more segments of speech information, the recognized text contains one or more corresponding segments of text information.
After segmenting the voice information according to the preset unit, the voice recognition module 10 records the time start point and the time end point of each segment of voice information in the audio file to obtain the time information of each segment of voice information, that is, the audio file is divided into different time segments by taking a section as a unit.
Meanwhile, in the process of voice recognition, the associated information of each section of voice information and the corresponding text information obtained by recognition in the audio file is stored, namely the corresponding relation between each section of voice information and each section of text information is obtained.
Then, according to the time information of each segment of speech information and the correspondence between each segment of speech information and each segment of text information, the speech recognition module 10 establishes, along the time axis of the audio file, an association between each time period of the audio file and each segment of text information in the text. Each time period of the audio file thus has corresponding text information, and every time point within the same time period corresponds to the same text information as that time period.
Thus, the speech recognition module 10 obtains the time association of the audio file with the text.
According to the time association relation between the audio file and the text, text information corresponding to each time point or time period in the audio file can be obtained.
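For illustration, the following is a minimal sketch of the time association structure this module produces, assuming a generic speech recognizer; the `asr_transcribe` helper and the field names are hypothetical stand-ins, not part of the disclosed apparatus:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # time start point of this speech segment in the audio file (seconds)
    end: float    # time end point of this speech segment (seconds)
    text: str     # text information recognized from this segment

def build_time_association(audio_path: str) -> list[Segment]:
    """Perform speech recognition segment by segment and record, for each
    recognized segment, its time start point and end point in the audio file.
    The returned list is the time association information between the audio
    file and the text."""
    # asr_transcribe is a hypothetical stand-in for any ASR engine that
    # yields (start_time, end_time, recognized_text) triples per segment.
    return [Segment(start, end, text) for start, end, text in asr_transcribe(audio_path)]

def text_at(segments: list[Segment], t: float) -> str | None:
    """Text information corresponding to a given time point of the audio file."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.text
    return None
```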
And the information extraction module 20 is configured to extract the labeling information of the text according to a preconfigured recognition model.
After obtaining the text corresponding to the audio file and the time association relationship between the audio file and the text, the information extraction module 20 extracts the label information of the text according to the pre-configured recognition model.
In one embodiment, the information extraction module 20 is configured with a recognition model in advance, and is used for extracting target feature information of a text and labeling the text.
Specifically, the information extraction module 20 uses each segment of text information in the text as a corpus, inputs it into the recognition model, and recognizes and extracts the target feature information of each segment of text information in the text as the labeling information.
It should be noted that the target feature information extracted by the information extraction module 20 is feature information of a preset type; for example, the evaluation object in a text may be extracted as the target feature information, or the emotional evaluation in the text may be extracted as the target feature information.
In this embodiment, the target feature information is taken to be the evaluation object as an example. The evaluation object is the subject discussed in a piece of text, such as the screen, battery, buttons, or application software of a mobile phone in a product discussion, or the director, actors, or producer in a discussion of a movie. Extracting the evaluation object therefore has great commercial value. For example:
The screen resolution of this mobile phone is low, which is somewhat disappointing.
In this example sentence, the subject under discussion is the screen resolution, so the evaluation object of the sentence is "screen resolution".
After extracting the evaluation object of each section of the text information, the information extraction module 20 takes the evaluation object as the target feature information and labels the corresponding text information.
For example, for the text information "The screen resolution of this mobile phone is low, which is somewhat disappointing", the target feature information is the screen resolution, so its labeling information is "screen resolution".
Therefore, the information extraction module 20 obtains the target feature information of each section of text information in the text, that is, obtains the label information of the text.
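A minimal sketch of this extraction step, continuing the earlier one and assuming the recognition model exposes a method that returns the evaluation object of a sentence (the `extract_evaluation_object` interface is hypothetical):

```python
def extract_labeling_info(segments, recognition_model):
    """For each segment of text information, extract its target feature
    information (here: the evaluation object) with the pre-configured
    recognition model and use it as the labeling information."""
    annotations = []
    for seg in segments:
        # e.g. returns "screen resolution" for the example sentence above
        label = recognition_model.extract_evaluation_object(seg.text)
        annotations.append((seg, label))
    return annotations
```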
And the tag obtaining module 30 is configured to obtain a tag of the audio file according to the labeling information of the text and the time-related information of the audio file and the text.
After obtaining the label information of the text, the label obtaining module 30 obtains the label of the audio file according to the label information of the text and the time-related information of the audio file and the text.
Specifically, as an implementation manner, according to the time association between the text and the audio file, the tag obtaining module 30 takes the labeling information of each segment of text information in the text as a tag and attaches it to the corresponding time period of the audio file, keeping the tag of the speech information in each time period of the audio file consistent with the labeling information of the corresponding text information.
Therefore, the tag obtaining module 30 obtains a tag for each time period of the audio file, and the tag at every time point within the same time period is the same as the tag of that time period.
It should be noted that, the user can perform operations such as editing, saving, and deleting on the acquired audio file tag as needed.
Then, the tag obtaining module 30 may establish a tag list according to each tag of the audio file and the corresponding time period information, referring to fig. 13, so that the user can know the evaluation object of each time period of the current audio file according to the tag list.
Referring to fig. 14, in the process of playing the current audio file, the tag obtaining module 30 may display tags near a time axis or a playing progress bar of the audio file, so that a user can know evaluation objects of each time segment of the current audio file according to each tag.
In this embodiment, the voice recognition module 10 performs voice recognition on the audio file to obtain the text corresponding to the audio file and the time association information between the audio file and the text; the information extraction module 20 then extracts the labeling information of the text according to the pre-configured recognition model; and the tag obtaining module 30 obtains the tags of the audio file according to the labeling information of the text and the time association information between the audio file and the text. By performing voice recognition on the audio file to obtain the corresponding text, tags are added to the audio file according to the extracted labeling information of the text; and because the time association information between the audio file and the text is obtained, each tag is added at the corresponding time period or time point of the audio file, ensuring that the tags are positioned accurately. This embodiment thus realizes automatic recognition and intelligent addition of tags to audio files, without requiring the user to add and edit tags manually, which improves the user experience.
Further, referring to fig. 4, a second embodiment of the audio management apparatus according to the present invention provides an audio management apparatus, based on the embodiment shown in fig. 3, where the information extraction module 20 includes:
and the information extraction unit 21 is configured to extract, according to a pre-configured recognition model, label information corresponding to each sentence of text information in the text.
On the basis of the first embodiment of the audio management device, this embodiment extracts the labeling information with the sentence as the unit, so the language structure is more consistent, repeated labeling information can be removed reasonably and effectively during deduplication, and the resulting labeling information of the text is more accurate and better matches the semantics of the audio.
Specifically, after segmenting the voice information in the audio file by taking a sentence as a unit, the voice recognition module 10 records a time start point and a time end point of each sentence of voice information in the audio file to obtain the time information of each sentence of voice information, that is, the audio file is divided into different time periods by taking a sentence as a unit.
Meanwhile, in the process of voice recognition, the associated information of each sentence of voice information in the audio file and the corresponding text information obtained by recognition is stored, namely the corresponding relation between each sentence of voice information and each sentence of text information is obtained.
Then, according to the time information of each sentence of speech information and the correspondence between each sentence of speech information and each sentence of text information, the speech recognition module 10 establishes, along the time axis of the audio file, an association between each time period of the audio file and each sentence of text information in the text. Each time period of the audio file thus has corresponding text information, and every time point within the same time period corresponds to the same text information as that time period.
Therefore, the obtained time association relationship between the audio file and the text comprises the corresponding relationship between each time period or time point in the audio file and the text information in the text.
Then, the information extraction unit 21 inputs the text information of each sentence as a corpus into the recognition model, and recognizes and extracts the target feature information of each sentence of the text information.
In this embodiment, the target feature information is again taken to be the evaluation object as an example.
According to the recognition model, the information extraction unit 21 extracts the evaluation object of each sentence of text information in the obtained text as the labeling information corresponding to that sentence.
And a duplicate removal unit 22, configured to perform duplicate removal processing on the labeling information of the text information.
After acquiring the labeling information corresponding to each sentence of text information in the text, the deduplication unit 22 performs deduplication processing on the labeling information of the text information.
Specifically, as an embodiment, if two adjacent sentences of text information have the same evaluation object, that is, the same labeling information, the deduplication unit 22 merges the two sentences of text information and correspondingly merges their labeling information;
if two adjacent sentences of text information have different evaluation objects, that is, different labeling information, the deduplication unit 22 stores the labeling information of the two sentences separately.
Thus, the deduplication unit 22 realizes deduplication processing of labeling information of text information.
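A minimal sketch of this deduplication step, reusing the hypothetical `(segment, label)` pairs from the earlier sketches: adjacent sentences with identical labeling information are merged into one time period.

```python
def deduplicate(annotations):
    """Merge adjacent sentences whose labeling information (evaluation
    object) is identical, combining their time periods and text so the
    audio file does not end up with adjacent repeated tags."""
    merged = []
    for seg, label in annotations:
        if merged and merged[-1][1] == label:
            prev_seg, _ = merged[-1]
            prev_seg.end = seg.end           # extend the previous time period
            prev_seg.text += " " + seg.text  # concatenate the text information
        else:
            merged.append((seg, label))
    return merged
```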
And the integrating unit 23 is configured to obtain the deduplicated labeling information as the labeling information of the text.
After the labeling information of the text information has been deduplicated, the integrating unit 23 obtains the deduplicated labeling information as the labeling information of the current text.
Then, according to the correspondence between each sentence of deduplicated text information and the audio file, the label obtaining module 30 attaches the labeling information of each sentence of text information in the text as a tag to the corresponding time period in the audio file, keeping the tag of the speech information in each time period of the audio file consistent with the labeling information of the corresponding text information.
Therefore, the tag obtaining module 30 obtains a tag for each time period of the audio file, and the tag at every time point within the same time period is the same as the tag of that time period.
For example, for a sound recording file of a mobile phone release meeting, according to this embodiment, the voice recognition module 10 converts the sound recording file into a text, and the information extraction unit 21 extracts labeled information for each sentence of text information in the text, so as to obtain an evaluation object of each sentence of text information. Then, the deduplication unit 22 performs deduplication processing on the label information of the text information, merges adjacent text information and corresponding label information that are the same as the evaluation object, and the integration unit 23 obtains label information of the text.
Suppose the evaluation object, that is, the labeling information, of the text corresponding to minutes 0-5 of the recording file is 'the screen of the mobile phone'; that of the text corresponding to minutes 5-10 is 'the camera of the mobile phone'; and that of the text corresponding to minutes 10-15 is 'the price of the mobile phone'.
Then, the tag obtaining module 30 obtains the tag of the audio file according to the labeling information of the text and the time correlation information of the sound recording file and the text, and then:
the label of the recording file for 0-5 minutes is 'screen of mobile phone';
the label of the recording file for 5-10 minutes is 'camera of mobile phone';
the label of the recording file for 10-15 minutes is "price of mobile phone".
The sound recording file has the above labels, so that the user can know what the subject of each section of the sound recording file is, and if the subject is the subject in which the user is interested, the user naturally focuses attention.
In this embodiment, the information extraction unit 21 extracts the labeling information corresponding to each sentence of text information in the text according to a pre-configured recognition model; the deduplication unit 22 performs deduplication processing on the labeling information of the text information; and the integrating unit 23 obtains the deduplicated labeling information as the labeling information of the text. By extracting the labeling information of each sentence of text information separately, with the sentence as the unit, and then deduplicating it, adjacent repeated labeling information in the text is effectively removed, adjacent repeated tags in the audio file are removed correspondingly, and adjacent time periods of the audio file with the same tag are merged. This reduces tag repetition, makes the tags of the audio file more concise and orderly, and improves the user experience.
Further, referring to fig. 5, a third embodiment of the audio management device according to the present invention provides an audio management device, based on the embodiment shown in fig. 3 or fig. 4 (taking fig. 3 as an example in this embodiment), the audio management device further includes:
and the association module 40 is configured to establish an association relationship between the tag and the audio file, and link the tag to a time period or a time point corresponding to the audio file.
In this embodiment, after obtaining the tags of the audio file, the association module 40 establishes an association relationship between the tags and the audio file, and links each tag to the corresponding time period or time point of the audio file. On the basis of the first or second embodiment of the audio management device, this embodiment makes it possible to jump, via a tag, to the corresponding time point or time period of the audio file for playback, which gives the tags in the audio file more practical significance and improves the user experience.
Specifically, as an implementation manner, the association module 40 uses the labeled information of each section of text information in the text as a label according to the time association relationship between the text and the audio file, establishes the association relationship between the label and the corresponding time period after labeling the time period corresponding to the audio file, and links each label to the corresponding time period.
As another embodiment, the association module 40 may further respectively establish an association relationship between each tag and any time point in the time period corresponding to the audio file, and link each tag to any time point in the corresponding time period.
For example, each tag is linked to the start time point of the time period corresponding to the audio file.
Therefore, from each tag, the corresponding time period in the audio file can be reached and the audio within that time period played; or,
from each tag link, the corresponding time point of the audio file can be reached and playback started there.
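As an illustrative sketch of both directions of this linkage, reusing the tag records from the earlier sketch (the `player.seek` interface is a hypothetical audio-player API):

```python
def jump_to_tag(player, tags, chosen_tag):
    """Tag -> audio: selecting a tag seeks playback to the start time point
    of the time period that the tag is linked to."""
    for t in tags:
        if t["tag"] == chosen_tag:
            player.seek(t["start"])
            return t
    return None

def tag_at(tags, position):
    """Audio -> tag: when the user drags the progress bar, find the tag
    associated with the current time point."""
    for t in tags:
        if t["start"] <= position < t["end"]:
            return t["tag"]
    return None
```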
It should be noted that, after the association module 40 establishes the association relationship between the audio file and the tag, the user may adjust the correspondence between the audio file and the tag as needed, and adjust the time point or the time period corresponding to the tag.
Referring to fig. 13, a user may link to a time period corresponding to an audio file to play according to each tag in the tag list; or,
referring to fig. 14, a user may jump to a corresponding time point for playing according to a time axis of an audio file or a tag near a playing progress bar. When the user adjusts the playing progress bar, the association module 40 finds and displays the tag corresponding to the current time point according to the time point of the progress bar and the association relationship between the tag and the audio file.
In this embodiment, after obtaining the tags of the audio file, the association module 40 establishes an association relationship between the tags and the audio file and links the tags to the corresponding time periods or time points of the audio file. Because this association is established and each tag is linked to its time period or time point, playback can jump to the corresponding position from a tag, and the corresponding tag can be found from any time point of the audio file. This enriches the functions of audio file tags, makes operation more convenient for the user, and improves the user experience.
Further, referring to fig. 6, a fourth embodiment of the audio management device according to the present invention provides an audio management device, based on the embodiments shown in fig. 3, fig. 4, or fig. 5 (this embodiment takes fig. 5 as an example), the audio management device further includes:
and the model configuration module 50 is configured to train to obtain the recognition model according to the pre-configured training corpus and the feature template.
In this embodiment, the model configuration module 50 configures training corpora and feature templates in advance, and obtains a recognition model through training, which is used for acquiring text labeling information. On the basis of the first, second and third embodiments of the audio management device of the present invention, the embodiment can adjust the model parameters according to the actual needs by training the recognition model, so that the obtained recognition model can more accurately extract the labeling information of the text, and the accuracy of the audio file label is improved.
Specifically, as an embodiment, the model configuration module 50 configures a preset number of training corpora in advance; the training corpora are texts used for training. The model configuration module 50 removes neutral corpora that have no explicit evaluation object, obtaining the available training corpora.
Then, the model configuration module 50 obtains the evaluation objects of the available training corpora respectively as the corresponding correct labeling information.
Then, the model configuration module 50 extracts the evaluation object of each available training corpus as the labeling information by using the pre-configured feature template, and trains the pre-configured training model to obtain the recognition model.
It should be noted that the preconfigured training model may be a Support Vector Machine (SVM) model, a Conditional Random Field (CRF) model, and the like, and may be flexibly set as needed.
The recognition model obtained by training the model configuration module 50 can be used to extract the labeling information of the text.
In this embodiment, the model configuration module 50 trains the recognition model according to the pre-configured training corpus and feature template. Because the recognition model is trained from a pre-configured training corpus and feature template, it extracts the labeling information of the text more accurately, which improves the accuracy of the audio file tags as well; accuracy is thus ensured even though the tags are obtained automatically, improving the user experience.
Further, referring to fig. 7, a fifth embodiment of the audio management apparatus according to the present invention provides an audio management apparatus, based on the above-mentioned embodiment shown in fig. 6, where the model configuration module 50 includes:
the preprocessing unit 51 is configured to preprocess a pre-configured corpus, and acquire correct labeling information of the corpus.
On the basis of the fourth embodiment of the audio management device of the present invention, the correct labeling information of the training corpus is pre-configured, and the model parameters are corrected during the training process, so that the recognition model obtained by training can more accurately extract labeling information that meets the actual requirements of the user.
Specifically, as an embodiment, first, the preprocessing unit 51 preprocesses the corpus, performs subjective detection on the corpus, and removes the neutral corpus without the evaluation object to obtain the usable corpus.
Then, the preprocessing unit 51 performs part-of-speech tagging and dependency relationship analysis on the available corpus, analyzes the sentence structure of the available corpus, and performs word segmentation on the available corpus.
Meanwhile, the preprocessing unit 51 feeds back each available corpus to the tester, and obtains the correct labeling information of each corpus input by the tester.
And the configuration unit 52 is configured to perform feature extraction training on the preprocessed training corpus according to a pre-configured feature template and the correct labeling information, obtain the model parameters, and establish the recognition model.
After the training corpus has been preprocessed and its correct labeling information acquired, the configuration unit 52 performs feature extraction training on the preprocessed training corpus according to a pre-configured feature template to obtain the model parameters, and establishes the recognition model from those parameters.
Specifically, as an implementation manner, the pre-configured feature template includes features of multiple sentence structure templates, and is used for extracting features of the corpus.
The configuration unit 52 performs feature extraction training on the preprocessed available training corpus by using the pre-configured feature template and training model, and corrects the training process according to the correct labeling information of the available training corpus.
Thus, the configuration unit 52 obtains model parameters of the training model, i.e. weights of the features.
Then, the configuration unit 52 establishes the recognition model from the training model and the model parameters.
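For illustration, a minimal training sketch using a CRF (one of the model types the patent mentions); the `sklearn_crfsuite` toolkit, the toy feature template, and the BIO label scheme are assumptions, not the patent's own configuration:

```python
import sklearn_crfsuite  # assumed third-party CRF toolkit; any CRF library would do

def word_features(sentence, i):
    """A toy feature template: emit features of the token and its neighbors.
    The patent's template would also draw on part-of-speech tags and
    dependency relations produced during preprocessing."""
    feats = {"word": sentence[i], "is_first": i == 0, "is_last": i == len(sentence) - 1}
    if i > 0:
        feats["prev_word"] = sentence[i - 1]
    if i + 1 < len(sentence):
        feats["next_word"] = sentence[i + 1]
    return feats

def train_recognition_model(corpora, correct_labels):
    """corpora: tokenized available training corpora; correct_labels: the
    pre-obtained correct labeling information per token, e.g. 'B-EVAL' /
    'I-EVAL' inside an evaluation object and 'O' elsewhere."""
    X = [[word_features(s, i) for i in range(len(s))] for s in corpora]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X, correct_labels)  # feature extraction training yields the model parameters
    return crf
```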
Referring to fig. 15, an example will be described in which a recognition model for training and extracting an evaluation target is used as an application scenario.
First, a certain amount of original corpora is configured. The larger the original corpus, the more accurate the recognition model obtained by training.
Then, the original corpus is preprocessed, such as subjectivity detection, part-of-speech analysis, dependency relationship analysis and the like, so as to obtain the available corpus and a correct evaluation object of the available corpus.
And then, using a pre-configured feature template and a training model to perform feature extraction training on the available training corpus, and correcting by using a correct evaluation object of the available training corpus in the training process to obtain an optimal model parameter.
Then, a recognition model is established according to the training model and the model parameters.
Then, the recognition model is subjected to a performance test: a certain amount of test corpora are input for feature extraction, yielding the evaluation object, i.e., the labeling information, of each test corpus. Since the correct evaluation objects of the test corpora were acquired in advance, the recognition accuracy of the current recognition model can be obtained by comparing the extracted evaluation objects against the correct ones.
If the recognition accuracy of the current recognition model does not reach the expected value, the model can be corrected by retraining it with additional feature templates, improving its recognition accuracy.
Thereby, the configuration of the recognition model is achieved.
In this embodiment, the preprocessing unit 51 preprocesses the pre-configured corpus to obtain the correct labeling information of the corpus; the configuration unit 52 performs feature extraction training on the preprocessed corpus according to the pre-configured feature templates and the correct labeling information to obtain model parameters, and establishes a recognition model. By performing feature extraction training with the feature template and the training corpus, optimal model parameters are obtained and the recognition model is established, improving the accuracy with which the recognition model extracts labeling information.
Referring to fig. 8, a first embodiment of the audio management method according to the present invention provides an audio management method, which can be implemented by the first embodiment of the audio management apparatus. The audio management method comprises the following steps:
step S10, performing voice recognition on the audio file, and acquiring a text corresponding to the audio file and time association information of the audio file and the text.
In the invention, by extracting labeling information from the audio file, tags are added to the audio file automatically, which reduces manual intervention, makes voice-tag management more intelligent and automatic, and greatly improves the user experience.
The embodiment of the invention can be applied to audio management of various audio files such as recording and the like, and the embodiment takes the recording file recorded by the mobile terminal as the audio file for illustration.
Specifically, as an implementation manner, firstly, the mobile terminal performs voice recognition on an audio file, that is, converts voice information in the audio file into text information, and an obtained text is a text corresponding to the audio file.
It should be noted that, in the process of speech recognition, the speech information in the audio file may be divided in units of sentences; since the audio file includes one or more sentences of speech information, the recognized text then includes one or more sentences of corresponding text information. Alternatively, the voice information may be divided in units of time periods of a preset length; since the audio file then comprises one or more sections of voice information, the recognized text comprises one or more sections of corresponding text information.
After segmenting the voice information according to the preset unit, the mobile terminal records the time starting point and the time ending point of each segment of voice information in the audio file to obtain the time information of each segment of voice information, namely dividing the audio file into different time periods by taking a small segment as a unit.
Meanwhile, in the process of voice recognition, the associated information of each section of voice information and the corresponding text information obtained by recognition in the audio file is stored, namely the corresponding relation between each section of voice information and each section of text information is obtained.
Then, based on the time axis of the audio file, the mobile terminal establishes the association between each time period of the audio file and each piece of text information in the text, according to the time information of each piece of voice information and the voice-to-text correspondence. Each time period of the audio file thus has corresponding text information, and every time point within a period shares that period's text.
Thereby, the time association relation between the audio file and the text is obtained.
According to the time association relation between the audio file and the text, text information corresponding to each time point or time period in the audio file can be obtained.
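For concreteness, the time association information can be pictured as a table of time periods, each carrying the text recognized for that period; the structure and names below are illustrative, not mandated by the invention.

```python
# Sketch of time-association information as a segment table.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio file
    end: float
    text: str     # recognized text for this time period

segments = [
    Segment(0.0, 300.0, "手机屏幕分辨率低，有点失望。"),
    Segment(300.0, 600.0, "手机的摄像头成像不错。"),
]

def text_at(segments, t):
    """Every time point in a period shares that period's text, as the
    embodiment describes; return the text for time point t."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.text
    return None

print(text_at(segments, 123.0))  # -> the text of the 0-300 s period
```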
And step S20, extracting the labeling information of the text according to the pre-configured recognition model.
After the text corresponding to the audio file and the time association relation between the audio file and the text are obtained, the mobile terminal extracts the labeling information of the text according to a pre-configured recognition model.
In one embodiment, the mobile terminal is configured in advance with a recognition model, which is used for extracting the target characteristic information of a text and labeling the text.
Specifically, each piece of text information in the text is used as a corpus and input into the recognition model, which recognizes and extracts the target characteristic information of each piece of text information as the labeling information.
The extracted target feature information is feature information of a preset type; for example, an evaluation object in the text, or a sentiment evaluation in the text, may be extracted as the target feature information.
This embodiment is described with the evaluation object as the target feature information. The evaluation object is the subject discussed in a piece of opinion text: the screen, battery, buttons, or application software in a product discussion, or the director, actors, and producers in a movie review. Extracting the evaluation object therefore carries great commercial value. For example:
the mobile phone has low screen resolution and is somewhat disappointed.
In this example sentence, the subject in question is the screen resolution, so the evaluation object of the sentence is "screen resolution".
After extracting the evaluation object of each section of the text information, the mobile terminal takes the evaluation object as target characteristic information and marks the corresponding text information.
For example, for the text information "the screen resolution of this mobile phone is low; I am somewhat disappointed", the target feature information is the screen resolution, so its labeling information is "screen resolution".
Therefore, target characteristic information of each section of character information in the text is obtained, namely the labeling information of the text is obtained.
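Continuing the training sketch given for the device embodiment above (reusing its token_features function and fitted crf model, both illustrative), the extraction step can be sketched as follows: each segment's tokens are run through the model and the evaluation object is read off the B/I-ASP tags.

```python
# Evaluation-object extraction sketch; token_features and crf come from
# the earlier (illustrative) CRF training sketch.
def extract_evaluation_object(crf, tokens):
    """tokens: list of (word, pos) pairs for one piece of text information."""
    feats = [token_features(tokens, i) for i in range(len(tokens))]
    tags = crf.predict_single(feats)
    aspect = [word for (word, _), tag in zip(tokens, tags)
              if tag in ("B-ASP", "I-ASP")]
    return "".join(aspect) or None  # e.g. "屏幕分辨率"

# Each segment's evaluation object becomes its labeling information.
labeling_info = [extract_evaluation_object(crf, sent) for sent in train_tokens]
```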
And step S30, acquiring the label of the audio file according to the labeling information of the text and the time correlation information of the audio file and the text.
After the label information of the text is obtained, the mobile terminal obtains the label of the audio file according to the label information of the text and the time associated information of the audio file and the text.
Specifically, as an implementation manner, the mobile terminal applies the labeling information of each piece of text information as a tag to the corresponding time period of the audio file, according to the time association between the text and the audio file, keeping the tag of each time period's voice information consistent with the labeling information of the corresponding text information.
Thus, the tag of each time period of the audio file is obtained, and every time point within a period carries that period's tag.
It should be noted that, the user can perform operations such as editing, saving, and deleting on the acquired audio file tag as needed.
Then, the mobile terminal may establish a tag list according to each tag of the audio file and the corresponding time period information, and refer to fig. 13, so that the user can know the evaluation object of each time period of the current audio file according to the tag list.
Referring to fig. 14, in the process of playing the current audio file, the mobile terminal may display the tags near a time axis or a playing progress bar of the audio file, so that the user can know the evaluation objects of the current audio file in each time period according to the tags.
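The tag list of fig. 13 and the per-period tags of fig. 14 can be pictured with a structure like the following; the field names and time periods are illustrative assumptions.

```python
# Tag-list sketch: one entry per time period of the audio file.
segment_tags = [
    ((0, 300), "手机的屏幕"),      # screen of the mobile phone
    ((300, 600), "手机的摄像头"),  # camera of the mobile phone
    ((600, 900), "手机的价格"),    # price of the mobile phone
]

def tag_list(segment_tags):
    """Build the displayable tag list: tag text plus its start/end times."""
    return [{"tag": tag, "start": s, "end": e} for (s, e), tag in segment_tags]

for entry in tag_list(segment_tags):
    print(f'{entry["start"] // 60}-{entry["end"] // 60} min: {entry["tag"]}')
```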
In the embodiment, a text corresponding to an audio file and time-associated information of the audio file and the text are acquired by performing voice recognition on the audio file; then, the labeling information of the text is extracted according to a pre-configured recognition model; and then, the tags of the audio file are acquired according to the labeling information of the text and the time-associated information. By performing voice recognition on the audio file to obtain the corresponding text, tags are added to the audio file from the extracted labeling information; and because the time-associated information of the audio file and the text is acquired, each tag is added at the corresponding time period or time point of the audio file, ensuring the accuracy of tag positions. This embodiment thus realizes automatic, intelligent recognition and tagging of audio files: the user no longer needs to add and edit tags manually, which improves the user experience.
Further, referring to fig. 9, a second embodiment of the audio management method according to the present invention provides an audio management method, which can be implemented by the second embodiment of the audio management apparatus. Based on the above-mentioned embodiment shown in fig. 8, the step S20 includes:
and step S21, extracting the labeling information corresponding to each sentence of text information according to the pre-configured recognition model.
On the basis of the first embodiment of the audio management method, this embodiment extracts labeling information sentence by sentence, so that the language structure is more consistent and repeated labeling information can be removed reasonably and effectively during de-duplication; the resulting text labeling information is more accurate and better matches the semantics of the audio.
Specifically, after segmenting the voice information in the audio file by taking a sentence as a unit, the mobile terminal records a time starting point and a time ending point of each sentence of voice information in the audio file to obtain the time information of each sentence of voice information, that is, the audio file is divided into different time periods by taking the sentence as the unit.
Meanwhile, in the process of voice recognition, the associated information of each sentence of voice information in the audio file and the corresponding text information obtained by recognition is stored, namely the corresponding relation between each sentence of voice information and each sentence of text information is obtained.
Then, based on the time axis of the audio file, the mobile terminal establishes the association between each time period of the audio file and each sentence of text information in the text, according to the time information of each sentence of voice information and the correspondence between each sentence of voice information and each sentence of text information. Each time period of the audio file thus has corresponding text information, and every time point within a period shares that period's text.
Therefore, the obtained time association relationship between the audio file and the text comprises the corresponding relationship between each time period or time point in the audio file and the text information in the text.
Then, the mobile terminal takes each sentence of character information in the text information as a corpus, inputs the recognition model, and recognizes and extracts the target characteristic information of each sentence of character information in the text.
In this embodiment, the evaluation object is again taken as the target feature information.
And the mobile terminal extracts an evaluation object of each sentence of character information in the text according to the identification model and uses the evaluation object as the marking information corresponding to each sentence of character information.
And step S22, carrying out duplication elimination processing on the labeling information of the character information.
After the labeling information corresponding to each sentence of text information in the text is obtained, the labeling information is de-duplicated.
Specifically, as an implementation manner, if the evaluation objects of two adjacent sentences of text information are the same, that is, their labeling information is identical, the two sentences are merged and their labeling information is merged accordingly;
if the evaluation objects of two adjacent sentences differ, that is, their labeling information differs, the labeling information of each sentence is stored separately.
Therefore, the de-duplication processing of the labeling information of the character information is realized.
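A minimal sketch of this de-duplication rule: adjacent sentences whose labeling information is identical are merged into one time period, and differing neighbors are kept separate (names and data are illustrative).

```python
# De-duplication sketch: merge adjacent sentences with identical labels.
def dedup(sentence_labels):
    """sentence_labels: ordered list of (label, start, end) per sentence."""
    merged = []
    for label, start, end in sentence_labels:
        if merged and merged[-1][0] == label:
            prev_label, prev_start, _ = merged[-1]
            merged[-1] = (prev_label, prev_start, end)  # extend the period
        else:
            merged.append((label, start, end))
    return merged

print(dedup([("屏幕", 0, 10), ("屏幕", 10, 25), ("价格", 25, 40)]))
# -> [('屏幕', 0, 25), ('价格', 25, 40)]
```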
And step S23, acquiring the label information after the deduplication processing as the label information of the text.
After de-duplication, the de-duplicated labeling information is acquired as the labeling information of the current text.
Then, according to the correspondence between each sentence of de-duplicated text information and the audio file, the mobile terminal applies the labeling information of each sentence as a tag to the corresponding time period of the audio file, keeping the tag of each time period's voice information consistent with the labeling information of the corresponding text information.
Thus, the tag of each time period of the audio file is obtained, and every time point within a period carries that period's tag.
For example, for a recording of a conference about a mobile phone, according to this embodiment, the recording is first converted into a text and labeling information is extracted for each sentence of text information, yielding the evaluation object of each sentence. The labeling information is then de-duplicated, merging adjacent text information with the same evaluation object together with the corresponding labeling information, to obtain the labeling information of the text.
Suppose the evaluation object, i.e., the labeling information, of the text corresponding to minutes 0-5 of the recording is "the screen of the mobile phone"; that of minutes 5-10 is "the camera of the mobile phone"; and that of minutes 10-15 is "the price of the mobile phone".
Then, according to the labeling information of the text and the time-associated information of the recording file and the text, the tags of the audio file are acquired as follows:
the label of the recording file for 0-5 minutes is 'screen of mobile phone';
the label of the recording file for 5-10 minutes is 'camera of mobile phone';
the label of the recording file for 10-15 minutes is "price of mobile phone".
With these tags, the user can tell what the subject of each section of the recording is, and naturally focus attention on the subjects of interest.
In this embodiment, labeling information corresponding to each sentence of text information in a text is extracted according to a pre-configured recognition model; the labeling information is de-duplicated; and the de-duplicated labeling information is acquired as the labeling information of the text. By extracting labeling information sentence by sentence and then de-duplicating it, adjacent repeated labeling information in the text is removed effectively, adjacent repeated tags in the audio file are removed correspondingly, and adjacent time periods sharing the same tag are merged. This reduces tag redundancy, makes the tags of the audio file more concise and orderly, and improves the user experience.
Further, referring to fig. 10, a third embodiment of the audio management method according to the present invention provides an audio management method, which can be implemented by the third embodiment of the audio management apparatus. Based on the embodiment shown in fig. 8 or fig. 9 (this embodiment takes fig. 8 as an example), after the step of S30, the method further includes:
step S40, establishing the incidence relation between the label and the audio file, and linking the label to the time period or the time point corresponding to the audio file.
In this embodiment, after the tag of the audio file is acquired, the mobile terminal establishes an association relationship between the tag and the audio file, and links the tag to a time period or a time point corresponding to the audio file. On the basis of the first or second embodiment of the audio management method, this embodiment enables playback to jump, via a tag, to the corresponding time point or time period of the audio file, giving the tags in the audio file more practical significance and improving the user experience.
Specifically, as an implementation manner, the mobile terminal, according to the time association between the text and the audio file, applies the labeling information of each piece of text information as a tag to the corresponding time period of the audio file, establishes the association between each tag and its time period, and links each tag to that time period.
As another embodiment, the mobile terminal may further respectively establish an association relationship between each tag and any time point in a time period corresponding to the audio file, and link each tag to any time point in the corresponding time period.
For example, each tag is linked to the start time point of the time period corresponding to the audio file.
Therefore, from each tag, the corresponding time period in the audio file can be reached and the audio of that period played; or,
from each tag, the corresponding time point of the audio file can be reached and playback started there.
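The two linking variants can be sketched as follows; the player interface is a hypothetical stand-in for the platform's media player, and the tag-to-time mapping reuses the illustrative segment_tags structure from the tag-list sketch above.

```python
# Tag-link sketch: jump to a tag's time period, or look up the tag for a
# time point when the user drags the progress bar. `player` is hypothetical.
tag_links = {"手机的屏幕": 0.0, "手机的摄像头": 300.0}  # tag -> start time (s)

def play_from_tag(player, tag):
    """Seek to the time point linked to the tag and start playback."""
    t = tag_links.get(tag)
    if t is not None:
        player.seek(t)   # hypothetical media-player API
        player.play()

def tag_at(segment_tags, t):
    """Inverse lookup: the tag displayed for the current time point."""
    for (start, end), tag in segment_tags:
        if start <= t < end:
            return tag
    return None
```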
It should be noted that after the association relationship between the audio file and the tag is established, the user may adjust the correspondence between the audio file and the tag as needed, and adjust the time point or the time period corresponding to the tag.
Referring to fig. 13, a user may link to a time period corresponding to an audio file to play according to each tag in the tag list; or,
referring to fig. 14, a user may jump to a corresponding time point for playing according to a time axis of an audio file or a tag near a playing progress bar. When the user adjusts the playing progress bar, the mobile terminal finds the label corresponding to the current time point according to the time point of the progress bar and the association relation between the label and the audio file, and displays the label.
In this embodiment, after the tags of the audio file are acquired, an association between each tag and the audio file is established, and the tag is linked to the corresponding time period or time point. By establishing this association and linking the tags to their time periods or time points, playback can jump to the corresponding position from a tag, and the corresponding tag can be obtained from any time point of the audio file. This enriches the functions of audio file tags, makes user operation more convenient, and improves the user experience.
Further, referring to fig. 11, a fourth embodiment of the audio management method according to the present invention provides an audio management method, which can be implemented by the fourth embodiment of the audio management apparatus. Based on the embodiment shown in fig. 10, before the step S10, the method further includes:
and step S50, training to obtain the recognition model according to the pre-configured training corpus and the feature template.
In this embodiment, a training corpus and a feature template are configured in advance, and a recognition model is obtained through training and used for acquiring text labeling information. On the basis of the first embodiment, the second embodiment and the third embodiment of the audio management method, the embodiment can adjust the model parameters according to actual needs by training the recognition model, so that the obtained recognition model can more accurately extract the labeling information of the text, and the accuracy of the audio file label is improved.
Specifically, as an implementation manner, a preset number of training corpora, i.e., texts used for training, are configured in advance. Neutral corpora without a clear evaluation object are removed from the training corpora to obtain the available training corpora.
Then, the evaluation objects of the available training corpora are respectively obtained as the corresponding correct marking information.
And then, extracting the evaluation object of each available training corpus as the labeling information by using a pre-configured characteristic template, and training a pre-configured training model to obtain the recognition model.
It should be noted that the preconfigured training model may be a Support Vector Machine (SVM) model, a Conditional Random Field (CRF) model, and the like, and may be flexibly set as needed.
The trained recognition model can be used for extracting the labeling information of the text.
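The embodiment leaves the choice of training model open; as a hedged illustration of the SVM alternative, the sketch below trains a per-token classifier with scikit-learn, so that sequence context enters only through the feature template (all data is illustrative).

```python
# SVM alternative sketch: per-token classification with scikit-learn,
# in place of the CRF; DictVectorizer turns feature dicts into vectors.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Flattened (feature-dict, tag) pairs from the same illustrative sentence.
X_flat = [{"word": "手机", "pos": "n"}, {"word": "屏幕", "pos": "n"},
          {"word": "分辨率", "pos": "n"}, {"word": "低", "pos": "a"}]
y_flat = ["O", "B-ASP", "I-ASP", "O"]

svm_model = make_pipeline(DictVectorizer(), LinearSVC())
svm_model.fit(X_flat, y_flat)
print(svm_model.predict([{"word": "屏幕", "pos": "n"}]))  # likely ['B-ASP']
```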
In this embodiment, the recognition model is obtained by training according to a pre-configured training corpus and a feature template. According to the embodiment, the recognition model is obtained through the pre-configuration of the training corpus and the feature template, so that the obtained recognition model can more accurately extract the label information of the text, the extraction accuracy of the label information of the text is improved, the accuracy of the label of the audio file is also improved, the accuracy of the label is ensured while the label is automatically obtained, and the user experience is improved.
Further, referring to fig. 12, a fifth embodiment of the audio management method according to the present invention provides an audio management method, which can be implemented by the fifth embodiment of the audio management apparatus. Based on the above-mentioned embodiment shown in fig. 11, the step S50 includes:
and step S51, preprocessing the pre-configured corpus to obtain the correct labeling information of the corpus.
On the basis of the fourth embodiment of the audio management method, correct labeling information of the training corpus is configured in advance, and model parameters are corrected in the training process, so that the identification model obtained by training can more accurately extract labeling information meeting the actual requirements of the user.
Specifically, as an implementation manner, first, the corpus is preprocessed: subjectivity detection is performed, and neutral corpora without an evaluation object are removed to obtain the available corpora.
Then, part-of-speech tagging and dependency relationship analysis are performed on the available training corpora, sentence structures of the available training corpora are analyzed, and word segmentation is performed on the available training corpora.
Meanwhile, feeding back each available corpus to the tester to obtain the correct labeling information of each corpus input by the tester.
And step S52, according to the pre-configured feature template and the correct marking information, performing feature extraction training on the preprocessed training corpus to obtain model parameters, and establishing an identification model.
After preprocessing the training corpus and acquiring its correct labeling information, feature extraction training is performed on the preprocessed training corpus according to a pre-configured feature template to obtain model parameters, and a recognition model is established according to the model parameters.
Specifically, as an implementation manner, the pre-configured feature template includes features of multiple sentence structure templates, and is used for extracting features of the corpus.
And performing feature extraction training on the preprocessed available training corpus by using a pre-configured feature template and a training model, and correcting the training process according to correct labeling information of the available training corpus.
Thus, model parameters of the training model, namely the weight of each feature, are obtained.
And then, establishing to obtain a recognition model according to the training model and the model parameters.
Referring to fig. 15, an example is described taking, as the application scenario, the training of a recognition model for extracting evaluation objects.
First, a certain amount of original corpora are configured: the larger the original corpus, the more accurate the trained recognition model.
Then, the original corpus is preprocessed, such as subjectivity detection, part-of-speech analysis, dependency relationship analysis and the like, so as to obtain the available corpus and a correct evaluation object of the available corpus.
And then, using a pre-configured feature template and a training model to perform feature extraction training on the available training corpus, and correcting by using a correct evaluation object of the available training corpus in the training process to obtain an optimal model parameter.
Then, a recognition model is established according to the training model and the model parameters.
Then, the recognition model is subjected to a performance test: a certain amount of test corpora are input for feature extraction, yielding the evaluation object, i.e., the labeling information, of each test corpus. Since the correct evaluation objects of the test corpora were acquired in advance, the recognition accuracy of the current recognition model can be obtained by comparing the extracted evaluation objects against the correct ones.
If the recognition accuracy of the current recognition model does not reach the expected value, the model can be corrected by retraining it with additional feature templates, improving its recognition accuracy.
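The accuracy check itself is straightforward; a sketch, assuming parallel lists of extracted and pre-acquired correct evaluation objects:

```python
# Performance-test sketch: recognition accuracy over the test corpora.
def recognition_accuracy(extracted, correct):
    """extracted/correct: parallel lists of evaluation objects."""
    hits = sum(1 for e, c in zip(extracted, correct) if e == c)
    return hits / len(correct) if correct else 0.0

acc = recognition_accuracy(["屏幕分辨率", "价格"], ["屏幕分辨率", "电池"])
print(f"accuracy = {acc:.0%}")  # 50%: below expectation, so retrain with
                                # additional feature templates
```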
Thereby, the configuration of the recognition model is achieved.
In this embodiment, the pre-configured corpus is preprocessed to obtain its correct labeling information; then, according to the pre-configured feature template and the correct labeling information, feature extraction training is performed on the preprocessed training corpus to obtain model parameters, and a recognition model is established. By performing feature extraction training with the feature template and the training corpus, optimal model parameters are obtained and the recognition model is established, improving the accuracy with which the recognition model extracts labeling information.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only an alternative embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. An audio management apparatus, characterized in that the audio management apparatus comprises:
the voice recognition module is used for segmenting an audio file according to a preset unit, performing voice recognition on the segmented audio file and acquiring a text corresponding to the audio file and time associated information of the audio file and the text;
the information extraction module is used for extracting target characteristic information of each section of character information in the text as labeling information according to a pre-configured recognition model;
the label obtaining module is used for obtaining a label of the audio file according to the labeling information of the text and the time correlation information of the audio file and the text;
wherein the information extraction module comprises:
the information extraction unit is used for respectively extracting target characteristic information corresponding to each sentence of character information in the text according to a pre-configured identification model to be used as the marking information of each sentence of character information;
the duplication removing unit is used for carrying out duplication removing processing on the labeling information of the character information;
and the integration unit is used for acquiring the marked information subjected to the duplicate removal processing as the marked information of the text.
2. The audio management device of claim 1, wherein the audio management device further comprises:
and the association module is used for establishing the association relationship between the label and the audio file and linking the label to the time period or the time point corresponding to the audio file.
3. The audio management device of claim 2, wherein the audio management device further comprises:
and the model configuration module is used for training to obtain the recognition model according to the pre-configured training corpus and the feature template.
4. The audio management device of claim 3, wherein the model configuration module comprises:
the preprocessing unit is used for preprocessing the pre-configured training corpus to acquire correct labeling information of the training corpus;
and the configuration unit is used for performing feature extraction training on the preprocessed training corpus according to a pre-configured feature template and the correct marking information to obtain model parameters and establish an identification model.
5. An audio management method, characterized in that the audio management method comprises the steps of:
segmenting an audio file according to a preset unit, and performing voice recognition on the segmented audio file to acquire a text corresponding to the audio file and time associated information of the audio file and the text;
extracting target characteristic information of each section of character information in the text as labeling information according to a pre-configured recognition model;
acquiring a label of the audio file according to the labeling information of the text and the time correlation information of the audio file and the text;
the step of extracting target feature information of each section of text information in the text as labeling information according to a pre-configured recognition model comprises the following steps:
respectively extracting target characteristic information corresponding to each sentence of character information in the text according to a pre-configured identification model, and using the target characteristic information as the marking information of each sentence of character information;
carrying out duplication elimination processing on the labeling information of the character information;
and acquiring the marked information subjected to the duplicate removal processing as the marked information of the text.
6. The audio management method according to claim 5, wherein after the step of obtaining the label of the audio file according to the labeling information of the text and the time-related information of the audio file and the text, the audio management method further comprises:
and establishing an incidence relation between the label and the audio file, and linking the label to a time period or a time point corresponding to the audio file.
7. The audio management method according to claim 6, wherein before the steps of segmenting the audio file according to the preset unit, performing speech recognition on the segmented audio file, and acquiring the text corresponding to the audio file and the time-related information between the audio file and the text, the audio management method further comprises:
and training to obtain the recognition model according to the preset training corpus and the feature template.
8. The audio management method according to claim 7, wherein the step of training the recognition model according to the pre-configured corpus and feature templates comprises:
preprocessing a pre-configured training corpus to acquire correct labeling information of the training corpus;
and according to a pre-configured feature template and the correct marking information, performing feature extraction training on the preprocessed training corpus to obtain model parameters, and establishing a recognition model.