CN110895654A - Segmentation method, segmentation system and non-transitory computer readable medium - Google Patents

Segmentation method, segmentation system and non-transitory computer readable medium

Info

Publication number: CN110895654A
Authority: CN (China)
Prior art keywords: caption, paragraph, sentence, segmentation, sentences
Legal status: Granted; Active
Application number: CN201910105172.8A
Other languages: Chinese (zh)
Other versions: CN110895654B (en)
Inventors: 蓝国诚, 詹诗涵
Current Assignee: Delta Electronics Inc
Original Assignee: Delta Electronics Inc
Application filed by Delta Electronics Inc
Priority to SG10201905236WA
Publication of CN110895654A
Application granted; publication of CN110895654B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles
    • G06F16/437Administration of user profiles, e.g. generation, initialisation, adaptation, distribution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to a segmentation method, a segmentation system, and a non-transitory computer readable medium. The segmentation method includes the following steps: receiving subtitle information, where the subtitle information includes a plurality of caption sentences; selecting caption sentences according to a set value and grouping the selected caption sentences into a first paragraph; performing a common segmentation vocabulary determination on a first caption sentence, where the first caption sentence is one of the caption sentences; and generating a second paragraph or merging the first caption sentence into the first paragraph according to the result of the common segmentation vocabulary determination.

Description

Segmentation method, segmentation system and non-transitory computer readable medium
Technical Field
The present disclosure relates to a segmentation method, a segmentation system and a non-transitory computer readable medium, and more particularly, to a segmentation method, a segmentation system and a non-transitory computer readable medium for subtitles.
Background
An online learning platform is a network service that stores learning materials on a server so that users can connect to the server through the Internet and browse the materials at any time. Existing online learning platforms provide learning materials of various types, including videos, audio, presentations, documents, and forums.
Because the amount of learning material stored on an online learning platform is huge, the text of the material needs to be automatically segmented and paragraph keywords need to be established for the user's convenience. Therefore, how to process a learning video according to the differences between its contents, so as to segment similar topics within the video and label them with keywords, is a problem to be solved in the field.
Disclosure of Invention
A first aspect of the present disclosure provides a segmentation method. The segmentation method includes the following steps: receiving subtitle information, where the subtitle information includes a plurality of caption sentences; selecting caption sentences according to a set value and grouping the selected caption sentences into a first paragraph; performing a common segmentation vocabulary determination on a first caption sentence, where the first caption sentence is one of the caption sentences; and generating a second paragraph or merging the first caption sentence into the first paragraph according to the result of the common segmentation vocabulary determination.
A second aspect of the present disclosure provides a segmentation system, which includes a storage unit and a processor. The storage unit is configured to store the subtitle information, the segmentation result, the annotation corresponding to the first paragraph, and the annotation corresponding to the second paragraph. The processor is electrically connected to the storage unit and is configured to receive the subtitle information, which contains a plurality of caption sentences. The processor includes a segmentation unit, a common word detection unit, and a paragraph generation unit. The segmentation unit is configured to select caption sentences in a specific order by using a set value and group the selected caption sentences into a first paragraph. The common word detection unit is electrically connected to the segmentation unit and is configured to perform a common segmentation vocabulary determination on a first caption sentence, where the first caption sentence is one of the plurality of caption sentences. The paragraph generation unit is electrically connected to the common word detection unit and is configured to generate a second paragraph or merge the first caption sentence into the first paragraph according to the result of the common segmentation vocabulary determination.
A third aspect of the present application provides a non-transitory computer readable medium containing at least one program of instructions executed by a processor to perform a segmentation method, which includes the following steps: receiving subtitle information, where the subtitle information includes a plurality of caption sentences; selecting caption sentences according to a set value and grouping the selected caption sentences into a first paragraph; performing a common segmentation vocabulary determination on a first caption sentence, where the first caption sentence is one of the caption sentences; and generating a second paragraph or merging the first caption sentence into the first paragraph according to the result of the common segmentation vocabulary determination.
The segmentation method, segmentation system, and non-transitory computer readable medium of the present disclosure mainly solve the problem that marking video segments manually consumes a great deal of labor and time. The method first calculates the keywords corresponding to each caption sentence, performs the common segmentation vocabulary determination on the caption sentences, and then generates a second paragraph or merges the first caption sentence into the first paragraph according to the result of the determination to produce a segmentation result, thereby segmenting similar topics within a learning video and labeling them with keywords.
Drawings
In order to make the aforementioned and other objects, features, advantages and embodiments of the present disclosure more comprehensible, the following description is made with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of a segmentation system according to some embodiments of the present application;
FIG. 2 is a flowchart of a segmentation method according to some embodiments of the present application;
FIG. 3 is a flowchart of step S240 according to some embodiments of the present application;
FIG. 4 is a flowchart of step S241 according to some embodiments of the present application; and
FIG. 5 is a flowchart of step S242 according to some embodiments of the present application.
[Description of reference numerals]
100: segmentation system
110: storage unit
130: processor
DB1: common segmentation vocabulary database
DB2: course database
131: keyword extraction unit
132: segmentation unit
133: common word detection unit
134: paragraph generation unit
135: annotation generation unit
200: segmentation method
S210 to S250, S241 to S242, S2411 to S2413, S2421 to S2423: steps
Detailed Description
Reference will now be made in detail to the present embodiments of the application, examples of which are illustrated in the accompanying drawings. It should be understood, however, that these implementation details are not intended to limit the application; in some embodiments of the present disclosure, such practical details are unnecessary. In addition, for simplicity, some conventional structures and elements are shown in the drawings in a simple schematic manner.
When an element is referred to as being "connected" or "coupled", it can mean "electrically connected" or "electrically coupled". "Connected" or "coupled" may also be used to indicate that two or more elements engage or interact with each other. Moreover, although terms such as "first", "second", etc. may be used herein to describe various elements, these terms are used merely to distinguish one element or operation from another described in similar technical terms. Unless the context clearly dictates otherwise, the terms neither specifically refer to nor imply an order or sequence, nor are they intended to limit the present disclosure.
Please refer to FIG. 1, which is a schematic diagram of a segmentation system 100 according to some embodiments of the present application. As shown in FIG. 1, the segmentation system 100 includes a storage unit 110 and a processor 130. The storage unit 110 is electrically connected to the processor 130 and is used to store the subtitle information, the segmentation result, the common segmentation vocabulary database DB1, the course database DB2, the annotation corresponding to the first paragraph, and the annotation corresponding to the second paragraph.
As described above, the processor 130 includes the keyword extracting unit 131, the segmenting unit 132, the common word detecting unit 133, the paragraph generating unit 134, and the annotation generating unit 135. The segmentation unit 132 is electrically connected to the keyword extraction unit 131 and the common word detection unit 133, the paragraph generation unit 134 is electrically connected to the common word detection unit 133 and the annotation generation unit 135, and the common word detection unit 133 is electrically connected to the annotation generation unit 135.
In various embodiments of the present disclosure, the storage unit 110 can be implemented as a memory, a hard disk, a portable storage device, a memory card, etc. The processor 130 can be implemented as an integrated circuit such as a micro control unit (microcontroller), a microprocessor, a digital signal processor, an application specific integrated circuit (ASIC), a logic circuit, or other similar components or a combination thereof.
Please refer to FIG. 2, which is a flowchart of a segmentation method 200 according to some embodiments of the present application. In an embodiment, the segmentation method 200 shown in FIG. 2 can be applied to the segmentation system 100 of FIG. 1, and the processor 130 segments the subtitle information according to the following steps of the segmentation method 200 to generate a segmentation result and an annotation corresponding to each paragraph. As shown in FIG. 2, the segmentation method 200 first performs step S210 to receive the subtitle information. In one embodiment, the subtitle information includes a plurality of caption sentences. For example, the subtitle information may be the subtitle file of a video: the subtitle file divides the content of the video into a plurality of caption sentences according to the playback time, and the caption sentences are likewise sorted by playback time.
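By way of illustration only, the sketch below parses an SRT-style subtitle file into an ordered list of caption sentences. The SRT format, the CaptionSentence fields, and the function name are assumptions made for this example and are not specified by the disclosure.

```python
import re
from dataclasses import dataclass

@dataclass
class CaptionSentence:
    index: int   # playback order of the caption sentence
    start: str   # start timestamp, e.g. "00:01:02,000"
    end: str     # end timestamp
    text: str    # the caption sentence itself

def parse_srt(path: str) -> list[CaptionSentence]:
    """Split an SRT subtitle file into caption sentences sorted by playback time."""
    with open(path, encoding="utf-8") as f:
        blocks = re.split(r"\n\s*\n", f.read().strip())
    sentences = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = (t.strip() for t in lines[1].split("-->"))
        sentences.append(CaptionSentence(int(lines[0]), start, end, " ".join(lines[2:])))
    return sorted(sentences, key=lambda s: s.index)
```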
Next, the segmentation method 200 performs step S220 to select caption sentences according to a set value and group the selected caption sentences into the current paragraph. In an embodiment, the set value can be any positive integer; taking a set value of 3 as an example, this step selects 3 caption sentences, in playback order, to form the current paragraph. For example, if there are N caption sentences in total, the 1st to 3rd caption sentences can be selected to form the current paragraph.
Next, the segmentation method 200 performs step S230 to perform the common segmentation vocabulary determination on the current caption sentence. In one embodiment, the common segmentation vocabulary is stored in the common segmentation vocabulary database DB1, and the common word detection unit 133 detects whether a common segmentation vocabulary is present. Common segmentation vocabulary can be divided into common beginning vocabulary and common ending vocabulary. For example, common beginning vocabulary can be phrases such as "next" or "let's begin describing", and common ending vocabulary can be phrases such as "the description above ends here" or "that is all for today's section". This step detects whether a common segmentation vocabulary is present and, if so, its type (common beginning vocabulary or common ending vocabulary).
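A minimal sketch of this detection follows; the two phrase lists stand in for the contents of the common segmentation vocabulary database DB1, and the phrases themselves are assumptions made for this example.

```python
# Illustrative stand-ins for the common segmentation vocabulary database DB1.
BEGIN_WORDS = ["next", "let's begin describing"]
END_WORDS = ["the description above ends here", "that is all for today's section"]

def classify_common_word(text: str) -> str | None:
    """Return 'begin', 'end', or None depending on which type of common
    segmentation vocabulary (if any) appears in the caption sentence."""
    lowered = text.lower()
    if any(w in lowered for w in BEGIN_WORDS):
        return "begin"
    if any(w in lowered for w in END_WORDS):
        return "end"
    return None
```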
Next, the segmentation method 200 performs step S240 to generate the next paragraph or merge the current caption sentence into the current paragraph according to the result of the common segmentation vocabulary determination. In one embodiment, whether a new paragraph is generated or the currently processed caption sentence is merged into the current paragraph is decided according to the detection result of the aforementioned common word detection unit 133. For example, if the current paragraph consists of the 1st to 3rd caption sentences, the currently processed caption sentence may be the 4th caption sentence, and according to the determination result the 4th caption sentence is either merged into the current paragraph or used as the start of a new paragraph.
As mentioned above, after the current caption sentence is merged into the current paragraph in step S240, the common segmentation vocabulary determination is performed on the next caption sentence, so the determination of step S230 is performed again. For example, if the 4th caption sentence is merged into the current paragraph, the common segmentation vocabulary determination is performed on the 5th caption sentence. If instead the next paragraph is generated in step S240, caption sentences are again selected in the specific order according to the set value and added to the next paragraph, so the operation of step S220 is performed again. For example, if the 4th caption sentence is assigned to the next paragraph, the 5th, 6th, and 7th caption sentences are selected and added to the next paragraph. The segmentation operation is repeated in this way until all caption sentences have been segmented, and a segmentation result is finally generated.
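Putting steps S220 to S240 together, the sketch below shows one possible top-level flow. The helper names handle_common_word and merge_or_split are hypothetical and are sketched after the corresponding steps below; classify_common_word is the detection sketch above, and the final remainder rule reflects the handling of leftover caption sentences described later in this description.

```python
SET_VALUE = 3  # the set value of step S220 (any positive integer)

def segment(sentences: list[str], threshold: float) -> list[list[str]]:
    """Sketch of the S220-S240 loop: group caption sentences into paragraphs."""
    paragraphs = [sentences[:SET_VALUE]]  # S220: form the current paragraph
    i = SET_VALUE                         # index of the caption sentence under test
    while i < len(sentences):
        if len(sentences) - i < SET_VALUE:
            # Fewer caption sentences remain than the set value: merge them
            # directly into the current paragraph without further calculation.
            paragraphs[-1].extend(sentences[i:])
            break
        kind = classify_common_word(sentences[i])  # S230
        if kind is not None:                       # S241: split on a common word
            i = handle_common_word(paragraphs, sentences, i, kind)
        else:                                      # S242: decide by similarity
            i = merge_or_split(paragraphs, sentences, i, threshold)
    return paragraphs
```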
Step S240 further includes steps S241 to S242; please refer to FIG. 3, which is a flowchart of step S240 according to some embodiments of the present disclosure. As shown in FIG. 3, the segmentation method 200 performs step S241: if the current caption sentence is associated with a common segmentation vocabulary, segmentation processing is performed to generate the next paragraph, and caption sentences are selected in the specific order according to the set value and added to the next paragraph. Step S241 further includes steps S2411 to S2413; please refer to FIG. 4, which is a flowchart of step S241 according to some embodiments of the present disclosure. As shown in FIG. 4, the segmentation method 200 performs step S2411 to determine, according to the determination result of step S230, whether the current caption sentence is associated with a beginning segmentation vocabulary or an ending segmentation vocabulary.
In light of the above, the segmentation method 200 further performs step S2412: if the current caption sentence is associated with a beginning segmentation vocabulary, the current caption sentence is used as the starting sentence of the next paragraph. For example, if the 4th caption sentence contains the word "next" according to the foregoing determination result, the 4th caption sentence is used as the starting sentence of the next paragraph.
In light of the above, the segmentation method 200 further performs step S2413: if the current caption sentence is associated with an ending segmentation vocabulary, the current caption sentence is used as the final sentence of the current paragraph. For example, if the 4th caption sentence is detected to contain the phrase "the description above ends here", the 4th caption sentence is used as the final sentence of the current paragraph. After step S241 is completed, caption sentences are selected in the specific order according to the set value and added to the next paragraph, so the operation of step S220 is performed again; the details are not repeated here.
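Continuing the loop sketch above, the hypothetical helper handle_common_word illustrates steps S2411 to S2413; it assumes the 'begin'/'end' labels of classify_common_word and the SET_VALUE constant from the earlier sketches.

```python
def handle_common_word(paragraphs, sentences, i, kind):
    """S241 (sketch): split at a common beginning or ending vocabulary."""
    if kind == "begin":
        # S2412: the caption sentence becomes the starting sentence
        # of the next paragraph.
        paragraphs.append([sentences[i]])
    else:
        # S2413: the caption sentence becomes the final sentence of the
        # current paragraph, and the next paragraph starts empty.
        paragraphs[-1].append(sentences[i])
        paragraphs.append([])
    # Back to S220: select the next set-value caption sentences, in order,
    # and add them to the next paragraph.
    selected = sentences[i + 1 : i + 1 + SET_VALUE]
    paragraphs[-1].extend(selected)
    return i + 1 + len(selected)
```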
Next, the segmentation method 200 further performs step S242: if the current caption sentence is not associated with a common segmentation vocabulary, a similarity value is calculated between the current caption sentence and the current paragraph, and if the current caption sentence is similar to the current paragraph, the current caption sentence is merged into the current paragraph. Step S242 further includes steps S2421 to S2423; please refer to FIG. 5, which is a flowchart of step S242 according to some embodiments of the present disclosure. As shown in FIG. 5, the segmentation method 200 performs step S2421 to compare whether a difference value between at least one feature corresponding to the current caption sentence and at least one feature corresponding to the current paragraph is greater than a threshold value.
In another embodiment, the method further includes extracting a plurality of keywords from the caption sentences, and the extracted keywords serve as the at least one feature corresponding to the current caption sentence. The keywords corresponding to the caption sentences are calculated using the TF-IDF (Term Frequency-Inverse Document Frequency) statistical method. TF-IDF evaluates the importance of a word to a document in a database: the importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears across the database. In this embodiment, the TF-IDF statistical method calculates the keywords of the current caption sentence. A similarity value is then calculated between the at least one feature (keywords) of the current caption sentence and the at least one feature (keywords) of the current paragraph; the higher the calculated similarity value, the closer the current caption sentence is to the content of the current paragraph.
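By way of illustration, the sketch below computes TF-IDF keywords for one caption sentence and a similarity value between two keyword lists. The smoothed IDF formula and the Jaccard overlap used as the similarity measure are assumptions made for this example; the disclosure names TF-IDF but does not fix a particular similarity measure.

```python
import math
from collections import Counter

def tfidf_keywords(sentence_tokens: list[str],
                   corpus: list[list[str]], top_k: int = 5) -> list[str]:
    """Rank the words of one caption sentence by TF-IDF against a corpus.

    TF rises with the number of times a word appears in the sentence;
    IDF falls with the number of corpus documents containing the word.
    """
    tf = Counter(sentence_tokens)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / len(sentence_tokens)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def similarity(a: list[str], b: list[str]) -> float:
    """Jaccard overlap between two keyword lists (one simple choice)."""
    if not a or not b:
        return 0.0
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```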
In light of the above, the segmentation method 200 further performs step S2422: if the difference value is smaller than the threshold value, the current caption sentence is merged into the current paragraph. In an embodiment, the threshold value filters the similarity value: when the similarity value is not less than the threshold value (that is, when the difference value is smaller than the threshold value), the current caption sentence is similar to the content of the current paragraph, so the current caption sentence can be merged into the current paragraph. For example, if the similarity value between the 4th caption sentence and the current paragraph is not less than the threshold value, the content of the 4th caption sentence is similar to that of the current paragraph, so the 4th caption sentence can be added to the current paragraph.
In light of the above, the segmentation method 200 further performs step S2423: if the difference value is not less than the threshold value, the current caption sentence is used as the starting sentence of the next paragraph, and caption sentences are selected in the specific order according to the set value and added to the next paragraph. That is, when the similarity value is smaller than the threshold value, the current caption sentence differs from the content of the current paragraph, so the current caption sentence is determined to be the starting sentence of the next paragraph. For example, if the similarity value between the 4th caption sentence and the current paragraph is smaller than the threshold value, the content of the 4th caption sentence differs from that of the current paragraph, so the 4th caption sentence is used as the starting sentence of the next paragraph. After step S2423 is completed, caption sentences are selected in the specific order according to the set value and added to the next paragraph, so the operation of step S220 is performed again; the details are not repeated here.
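Continuing the earlier sketches, the hypothetical helper merge_or_split illustrates steps S2421 to S2423, phrased in terms of the similarity value (a similarity value not less than the threshold corresponds to a difference value smaller than the threshold). Using all caption sentences as the TF-IDF corpus is an assumption made for this example.

```python
def merge_or_split(paragraphs, sentences, i, threshold):
    """S242 (sketch): merge by similarity or start the next paragraph."""
    corpus = [s.split() for s in sentences]  # illustrative TF-IDF corpus
    sent_kw = tfidf_keywords(sentences[i].split(), corpus)  # S2421
    para_kw = [w for s in paragraphs[-1]
               for w in tfidf_keywords(s.split(), corpus)]
    if similarity(sent_kw, para_kw) >= threshold:
        # S2422: contents are similar -> merge into the current paragraph.
        paragraphs[-1].append(sentences[i])
        return i + 1
    # S2423: contents differ -> the caption sentence starts the next
    # paragraph, then the next set-value caption sentences are added to it.
    paragraphs.append([sentences[i]])
    selected = sentences[i + 1 : i + 1 + SET_VALUE]
    paragraphs[-1].extend(selected)
    return i + 1 + len(selected)
```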
From the above segmentation operation, after the segmentation calculation of one caption sentence is completed, the segmentation calculation of the next caption sentence is executed, and so on until all caption sentences have been processed. If the number of remaining caption sentences is less than the set value, the segmentation calculation may be skipped for them and the remaining caption sentences merged directly into the current paragraph. For example, if 2 caption sentences remain, which is less than the set value of 3, the remaining 2 caption sentences can be merged into the current paragraph.
After the above segmentation steps are performed, the segmentation method 200 performs step S250 to generate the annotations corresponding to the paragraphs. For example, if the caption sentences have been divided into 3 paragraphs after processing, the annotations of the 3 paragraphs are calculated separately; an annotation may be generated from the keywords corresponding to the caption sentences in the paragraph. Finally, the divided paragraphs and their corresponding annotations are stored in the course database DB2 of the storage unit 110. In addition, if the difference value is smaller than the threshold value, the current caption sentence is similar to the current paragraph, so the keywords of the caption sentence can be taken as at least one feature corresponding to the current paragraph; if the difference value is not less than the threshold value, the current caption sentence is not similar to the current paragraph, so the keywords of the caption sentence can be taken as at least one feature corresponding to the next paragraph.
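To round out step S250, the sketch below derives an annotation for each paragraph from the keywords of its caption sentences; the dict standing in for the course database DB2 and the frequency-based ranking are assumptions made for this example.

```python
from collections import Counter

def annotate(paragraphs: list[list[str]],
             corpus: list[list[str]], top_k: int = 3) -> dict:
    """S250 (sketch): generate an annotation for each divided paragraph."""
    course_db = {}  # illustrative stand-in for the course database DB2
    for idx, paragraph in enumerate(paragraphs):
        # Pool the TF-IDF keywords of the caption sentences in the paragraph
        # and keep the most frequent ones as the paragraph's annotation.
        counts = Counter(w for s in paragraph
                         for w in tfidf_keywords(s.split(), corpus))
        course_db[idx] = {"sentences": paragraph,
                          "annotation": [w for w, _ in counts.most_common(top_k)]}
    return course_db
```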
The embodiments of the present application mainly solve the problem that marking video paragraphs manually consumes a great deal of labor and time. The keywords corresponding to each caption sentence are first calculated, the common segmentation vocabulary determination is performed on the caption sentences, and the next paragraph is generated or the current caption sentence is merged into the current paragraph according to the determination result, producing a segmentation result and thereby segmenting similar topics within a learning video and labeling them with keywords.
Additionally, the above illustration includes exemplary steps in a sequential order, but the steps need not be performed in the order shown. Performing these steps in a different order is within the contemplation of the present disclosure. Steps may be added, substituted, reordered, and/or omitted as appropriate within the spirit and scope of the embodiments of the present disclosure.
Although the present disclosure has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made by one skilled in the art without departing from the spirit and scope of the disclosure, and therefore, the scope of the disclosure should be determined by that of the appended claims.

Claims (17)

1. A segmentation method, comprising:
receiving subtitle information; wherein the subtitle information comprises a plurality of caption sentences;
selecting the plurality of caption sentences according to a set value, and dividing the selected caption sentences into a first paragraph;
performing common segmentation vocabulary judgment on a first caption sentence; wherein the first caption sentence is one of the plurality of caption sentences; and
and generating a second paragraph or merging the first caption sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.
2. The segmentation method according to claim 1, wherein the common segmentation vocabulary determination is performed for a second caption sentence after the first caption sentence is merged into the first paragraph; wherein the second caption sentence follows the first caption sentence according to a specific sequence.
3. The segmentation method of claim 1, wherein when the second paragraph is generated, the caption sentences are selected according to a specific sequence by using the setting value, and the selected caption sentences are added to the second paragraph.
4. The segmentation method according to claim 1, wherein the generating the second paragraph or incorporating the first caption sentence into the first paragraph according to the judgment result of the common segmented vocabulary judgment further comprises:
if the first caption sentence is associated with the common segmentation vocabulary, performing segmentation processing to generate a second paragraph, selecting the plurality of caption sentences according to a specific sequence by using the set value, and adding the selected caption sentences into the second paragraph; and
and if the first caption sentence is not associated with the common segmented vocabulary, performing similarity value calculation on the first caption sentence and the first paragraph, and if the first caption sentence is similar to the first paragraph, merging the first caption sentence into the first paragraph.
5. The segmentation method of claim 4, wherein the segmentation process comprises:
determining whether the first caption sentence is associated with one of a beginning segmentation vocabulary and an ending segmentation vocabulary according to the judgment result;
if the first caption sentence is associated with the beginning segmentation vocabulary, taking the first caption sentence as the starting sentence of the second paragraph; and
and if the first caption sentence is associated with the ending segmented word, taking the first caption sentence as the ending sentence of the first paragraph.
6. The segmentation method of claim 4, wherein the similarity value calculation comprises:
comparing whether a difference value between at least one characteristic corresponding to the first caption sentence and at least one characteristic corresponding to the first paragraph is larger than a threshold value;
if the difference value is smaller than the threshold value, merging the first caption sentence into the first paragraph; and
if the difference value is not less than the threshold value, the first caption sentence is used as the initial sentence of the second paragraph, and the plurality of caption sentences are selected according to the specific sequence by using the set value, so that the selected caption sentences are divided into the second paragraph.
7. The segmentation method of claim 6, wherein a plurality of keywords are extracted from the plurality of caption sentences, the plurality of keywords being at least one feature corresponding to the first caption sentence.
8. The segmentation method of claim 7, wherein the at least one feature corresponding to the first paragraph is generated from the keywords extracted from the caption sentences in the first paragraph.
9. A segmentation system, comprising:
the storage unit is used for storing subtitle information, a segmentation result, a common segmentation vocabulary database, an annotation corresponding to a first paragraph and an annotation corresponding to a second paragraph; and
a processor electrically connected to the memory unit for receiving the caption information; wherein the caption information comprises a plurality of caption sentences, and the processor comprises:
a segmentation unit for selecting the plurality of caption sentences by using a set value and dividing the selected caption sentences into a first paragraph;
a common word detection unit electrically connected to the segmentation unit for performing a common segmentation vocabulary judgment for a first caption sentence; wherein the first caption sentence is one of the plurality of caption sentences; and
and the paragraph generation unit is electrically connected with the common word detection unit and used for generating a second paragraph or merging the first caption sentence into the first paragraph according to a judgment result of the common segmented vocabulary judgment.
10. The segmentation system of claim 9, wherein the common word detection unit is further configured to perform the common segmentation vocabulary determination for a second caption sentence after the first caption sentence is merged into the first paragraph;
wherein the second caption sentence follows the first caption sentence according to a specific sequence.
11. The segmentation system of claim 9, wherein the segmentation unit is further configured to select the plurality of caption sentences according to a specific order by using the setting value when the second paragraph is generated, and add the selected caption sentences to the second paragraph.
12. The segmentation system as claimed in claim 9, wherein the paragraph generation unit is further configured to perform the following steps according to the determination result:
if the first caption sentence is associated with the common segmentation vocabulary, performing segmentation processing to generate a second paragraph, selecting the caption sentences according to a specific sequence by using the set value, and adding the selected caption sentences into the second paragraph; and
and if the first caption sentence is not associated with the common segmented vocabulary, performing similarity value calculation on the first caption sentence and the first paragraph, and if the first caption sentence is similar to the first paragraph, merging the first caption sentence into the first paragraph.
13. The segmentation system of claim 12, wherein the segmentation process comprises:
determining whether the first caption sentence is associated with one of a beginning segmentation vocabulary and an ending segmentation vocabulary according to the judgment result;
if the first caption sentence is associated with the beginning segmentation vocabulary, taking the first caption sentence as the starting sentence of the second paragraph; and
and if the first caption sentence is associated with the ending segmented word, taking the first caption sentence as the ending sentence of the first paragraph.
14. The segmentation system of claim 12, wherein the similarity value calculation comprises:
comparing whether a difference value between at least one characteristic corresponding to the first caption sentence and at least one characteristic corresponding to the first paragraph is larger than a threshold value;
if the difference value is smaller than the threshold value, merging the first caption sentence into the first paragraph; and
if the difference value is not less than the threshold value, the first caption sentence is used as the initial sentence of the second paragraph, and the plurality of caption sentences are selected according to the specific sequence by using the set value, so that the selected caption sentences are divided into the second paragraph.
15. The segmentation system of claim 14, further comprising:
and the keyword extraction unit is electrically connected with the segmentation unit and used for extracting a plurality of keywords from the plurality of caption sentences, wherein the keywords are at least one characteristic corresponding to the first caption sentence.
16. The segmentation system of claim 15, wherein the at least one feature corresponding to the first paragraph is generated from the keywords extracted from the caption sentences in the first paragraph.
17. A non-transitory computer readable medium containing at least one program of instructions for execution by a processor to perform a segmentation method, comprising:
receiving subtitle information; wherein, the caption information comprises a plurality of caption sentences;
selecting the plurality of caption sentences according to a set value, and dividing the selected caption sentences into a first paragraph;
performing common segmentation vocabulary judgment on a first caption sentence; wherein the first caption sentence is one of the plurality of caption sentences; and
and generating a second paragraph or merging the first caption sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.
CN201910105172.8A 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer readable medium Active CN110895654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG10201905236WA SG10201905236WA (en) 2018-09-07 2019-06-10 Segmentation method, segmentation system and non-transitory computer-readable medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862728082P 2018-09-07 2018-09-07
US62/728,082 2018-09-07

Publications (2)

Publication Number Publication Date
CN110895654A (en) 2020-03-20
CN110895654B (en) 2024-07-02

Family

ID=69745778

Family Applications (5)

Application Number Title Priority Date Filing Date
CN201910105172.8A Active CN110895654B (en) 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer readable medium
CN201910104946.5A Active CN110891202B (en) 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer readable medium
CN201910105173.2A Pending CN110889034A (en) 2018-09-07 2019-02-01 Data analysis method and data analysis system
CN201910104937.6A Active CN110888896B (en) 2018-09-07 2019-02-01 Data searching method and data searching system thereof
CN201910266133.6A Pending CN110888994A (en) 2018-09-07 2019-04-03 Multimedia data recommendation system and multimedia data recommendation method

Family Applications After (4)

Application Number Title Priority Date Filing Date
CN201910104946.5A Active CN110891202B (en) 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer readable medium
CN201910105173.2A Pending CN110889034A (en) 2018-09-07 2019-02-01 Data analysis method and data analysis system
CN201910104937.6A Active CN110888896B (en) 2018-09-07 2019-02-01 Data searching method and data searching system thereof
CN201910266133.6A Pending CN110888994A (en) 2018-09-07 2019-04-03 Multimedia data recommendation system and multimedia data recommendation method

Country Status (4)

Country Link
JP (3) JP6829740B2 (en)
CN (5) CN110895654B (en)
SG (5) SG10201905236WA (en)
TW (5) TWI709905B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI756703B (en) * 2020-06-03 2022-03-01 南開科技大學 Digital learning system and method thereof
CN114595854A (en) * 2020-11-19 2022-06-07 英业达科技有限公司 Method for tracking and predicting product quality based on social information
CN117351794B (en) * 2023-10-13 2024-06-04 浙江上国教育科技有限公司 Online course management system based on cloud platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19980014531A (en) * 1996-08-13 1998-05-25 김광수 How to Learn Foreign Dictation Dictation Using Caption Video CD Playback Device
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method
CN105047203A (en) * 2015-05-25 2015-11-11 腾讯科技(深圳)有限公司 Audio processing method, device and terminal
CN106331893A (en) * 2016-08-31 2017-01-11 科大讯飞股份有限公司 Real-time subtitle display method and system

Family Cites Families (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07311539A (en) * 1994-05-17 1995-11-28 Hitachi Ltd Teaching material edition supporting system
JP2002041823A (en) * 2000-07-27 2002-02-08 Nippon Telegr & Teleph Corp <Ntt> Information distributing device, information receiving device and information distributing system
JP3685733B2 (en) * 2001-04-11 2005-08-24 株式会社ジェイ・フィット Multimedia data search apparatus, multimedia data search method, and multimedia data search program
JP2002341735A (en) * 2001-05-16 2002-11-29 Alice Factory:Kk Broadband digital learning system
CN1432932A (en) * 2002-01-16 2003-07-30 陈雯瑄 English examination and score estimation method and system
TW200411462A (en) * 2002-12-20 2004-07-01 Hsiao-Lien Wang A method for matching information exchange on network
WO2004090752A1 (en) * 2003-04-14 2004-10-21 Koninklijke Philips Electronics N.V. Method and apparatus for summarizing a music video using content analysis
JP4471737B2 (en) * 2003-10-06 2010-06-02 日本電信電話株式会社 Grouping condition determining device and method, keyword expansion device and method using the same, content search system, content information providing system and method, and program
JP4426894B2 (en) * 2004-04-15 2010-03-03 株式会社日立製作所 Document search method, document search program, and document search apparatus for executing the same
JP2005321662A (en) * 2004-05-10 2005-11-17 Fuji Xerox Co Ltd Learning support system and method
JP2006003670A (en) * 2004-06-18 2006-01-05 Hitachi Ltd Educational content providing system
US20080176202A1 (en) * 2005-03-31 2008-07-24 Koninklijke Philips Electronics, N.V. Augmenting Lectures Based on Prior Exams
US9058406B2 (en) * 2005-09-14 2015-06-16 Millennial Media, Inc. Management of multiple advertising inventories using a monetization platform
WO2008023470A1 (en) * 2006-08-21 2008-02-28 Kyoto University Sentence search method, sentence search engine, computer program, recording medium, and document storage
TW200825900A (en) * 2006-12-13 2008-06-16 Inst Information Industry System and method for generating wiki by sectional time of handout and recording medium thereof
JP5010292B2 (en) * 2007-01-18 2012-08-29 株式会社東芝 Video attribute information output device, video summarization device, program, and video attribute information output method
JP5158766B2 (en) * 2007-10-23 2013-03-06 シャープ株式会社 Content selection device, television, content selection program, and storage medium
TW200923860A (en) * 2007-11-19 2009-06-01 Univ Nat Taiwan Science Tech Interactive learning system
CN101382937B (en) * 2008-07-01 2011-03-30 深圳先进技术研究院 Multimedia resource processing method based on speech recognition and on-line teaching system thereof
US8140544B2 (en) * 2008-09-03 2012-03-20 International Business Machines Corporation Interactive digital video library
CN101453649B (en) * 2008-12-30 2011-01-05 浙江大学 Key frame extracting method for compression domain video stream
JP5366632B2 (en) * 2009-04-21 2013-12-11 エヌ・ティ・ティ・コミュニケーションズ株式会社 Search support keyword presentation device, method and program
JP5493515B2 (en) * 2009-07-03 2014-05-14 富士通株式会社 Portable terminal device, information search method, and information search program
EP2524362A1 (en) * 2010-01-15 2012-11-21 Apollo Group, Inc. Dynamically recommending learning content
JP2012038239A (en) * 2010-08-11 2012-02-23 Sony Corp Information processing equipment, information processing method and program
US8839110B2 (en) * 2011-02-16 2014-09-16 Apple Inc. Rate conform operation for a media-editing application
CN102222227B (en) * 2011-04-25 2013-07-31 中国华录集团有限公司 Video identification based system for extracting film images
CN102348049B (en) * 2011-09-16 2013-09-18 央视国际网络有限公司 Method and device for detecting position of cut point of video segment
CN102509007A (en) * 2011-11-01 2012-06-20 北京瑞信在线***技术有限公司 Method, system and device for multimedia teaching evaluation and multimedia teaching system
JP5216922B1 (en) * 2012-01-06 2013-06-19 Flens株式会社 Learning support server, learning support system, and learning support program
US9846696B2 (en) * 2012-02-29 2017-12-19 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and methods for indexing multimedia content
US20130263166A1 (en) * 2012-03-27 2013-10-03 Bluefin Labs, Inc. Social Networking System Targeted Message Synchronization
US9058385B2 (en) * 2012-06-26 2015-06-16 Aol Inc. Systems and methods for identifying electronic content using video graphs
TWI513286B (en) * 2012-08-28 2015-12-11 Ind Tech Res Inst Method and system for continuous video replay
WO2014100893A1 (en) * 2012-12-28 2014-07-03 Jérémie Salvatore De Villiers System and method for the automated customization of audio and video media
JP6205767B2 (en) * 2013-03-13 2017-10-04 カシオ計算機株式会社 Learning support device, learning support method, learning support program, learning support system, and server device
TWI549498B (en) * 2013-06-24 2016-09-11 wu-xiong Chen Variable audio and video playback method
CN104572716A (en) * 2013-10-18 2015-04-29 英业达科技有限公司 System and method for playing video files
KR101537370B1 (en) * 2013-11-06 2015-07-16 주식회사 시스트란인터내셔널 System for grasping speech meaning of recording audio data based on keyword spotting, and indexing method and method thereof using the system
US20150206441A1 (en) * 2014-01-18 2015-07-23 Invent.ly LLC Personalized online learning management system and method
CN104123332B (en) * 2014-01-24 2018-11-09 腾讯科技(深圳)有限公司 The display methods and device of search result
US9892194B2 (en) * 2014-04-04 2018-02-13 Fujitsu Limited Topic identification in lecture videos
US20150293928A1 (en) * 2014-04-14 2015-10-15 David Mo Chen Systems and Methods for Generating Personalized Video Playlists
US20160239155A1 (en) * 2015-02-18 2016-08-18 Google Inc. Adaptive media
JP6334431B2 (en) * 2015-02-18 2018-05-30 株式会社日立製作所 Data analysis apparatus, data analysis method, and data analysis program
CN104978961B (en) * 2015-05-25 2019-10-15 广州酷狗计算机科技有限公司 A kind of audio-frequency processing method, device and terminal
TWI571756B (en) * 2015-12-11 2017-02-21 財團法人工業技術研究院 Methods and systems for analyzing reading log and documents corresponding thereof
CN105978800A (en) * 2016-07-04 2016-09-28 广东小天才科技有限公司 Method, system and server for pushing questions to mobile terminal
CN106202453B (en) * 2016-07-13 2020-08-04 网易(杭州)网络有限公司 Multimedia resource recommendation method and device
CN106231399A (en) * 2016-08-01 2016-12-14 乐视控股(北京)有限公司 Methods of video segmentation, equipment and system
CN108122437A (en) * 2016-11-28 2018-06-05 北大方正集团有限公司 Adaptive learning method and device
CN107256262B (en) * 2017-06-13 2020-04-14 西安电子科技大学 Image retrieval method based on object detection
CN107623860A (en) * 2017-08-09 2018-01-23 北京奇艺世纪科技有限公司 Multi-medium data dividing method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19980014531A (en) * 1996-08-13 1998-05-25 김광수 How to Learn Foreign Dictation Dictation Using Caption Video CD Playback Device
CN102937972A (en) * 2012-10-15 2013-02-20 上海外教社信息技术有限公司 Audiovisual subtitle making system and method
CN105047203A (en) * 2015-05-25 2015-11-11 腾讯科技(深圳)有限公司 Audio processing method, device and terminal
CN106331893A (en) * 2016-08-31 2017-01-11 科大讯飞股份有限公司 Real-time subtitle display method and system

Also Published As

Publication number Publication date
CN110891202B (en) 2022-03-25
TWI709905B (en) 2020-11-11
TW202011231A (en) 2020-03-16
TWI725375B (en) 2021-04-21
TW202011232A (en) 2020-03-16
SG10201906347QA (en) 2020-04-29
SG10201905236WA (en) 2020-04-29
SG10201905532QA (en) 2020-04-29
TWI699663B (en) 2020-07-21
TWI700597B (en) 2020-08-01
JP2020042771A (en) 2020-03-19
TW202011749A (en) 2020-03-16
TWI696386B (en) 2020-06-11
CN110895654B (en) 2024-07-02
JP6829740B2 (en) 2021-02-10
JP2020042770A (en) 2020-03-19
CN110891202A (en) 2020-03-17
JP2020042777A (en) 2020-03-19
TW202011222A (en) 2020-03-16
CN110889034A (en) 2020-03-17
SG10201907250TA (en) 2020-04-29
CN110888896A (en) 2020-03-17
CN110888896B (en) 2023-09-05
CN110888994A (en) 2020-03-17
SG10201905523TA (en) 2020-04-29
TW202011221A (en) 2020-03-16

Similar Documents

Publication Title
CN107436922B (en) Text label generation method and device
CN102483743B (en) Detecting writing systems and languages
CN110020424B (en) Contract information extraction method and device and text information extraction method
US20120278705A1 (en) System and Method for Automatically Extracting Metadata from Unstructured Electronic Documents
CN109582833B (en) Abnormal text detection method and device
JP6335898B2 (en) Information classification based on product recognition
CN110895654B (en) Segmentation method, segmentation system and non-transitory computer readable medium
US9098487B2 (en) Categorization based on word distance
US20180081861A1 (en) Smart document building using natural language processing
CN107357824B (en) Information processing method, service platform and computer storage medium
US20110276523A1 (en) Measuring document similarity by inferring evolution of documents through reuse of passage sequences
US20130323690A1 (en) Providing an uninterrupted reading experience
CN106610990A (en) Emotional tendency analysis method and apparatus
CN114116973A (en) Multi-document text duplicate checking method, electronic equipment and storage medium
CN114048740B (en) Sensitive word detection method and device and computer readable storage medium
CN112699671B (en) Language labeling method, device, computer equipment and storage medium
CN111046627B (en) Chinese character display method and system
WO2024139834A1 (en) Search word determining method and apparatus, computer device, and storage medium
CN116029280A (en) Method, device, computing equipment and storage medium for extracting key information of document
CN109977423B (en) Method and device for processing word, electronic equipment and readable storage medium
US20180307669A1 (en) Information processing apparatus
CN111651987B (en) Identity discrimination method and device, computer readable storage medium and electronic equipment
CN114169331A (en) Address resolution method, device, computer equipment and storage medium
CN112686055B (en) Semantic recognition method and device, electronic equipment and storage medium
CN115455179B (en) Sensitive vocabulary detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant