CN108268539A - Video matching system based on text analyzing - Google Patents

Video matching system based on text analyzing

Publication number
CN108268539A
Authority
CN
China
Prior art keywords
video
subtitle
index
keyword
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611266235.0A
Other languages
Chinese (zh)
Inventor
李菁菁
黎哲明
蔡鸿明
姜丽红
步丰林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201611266235.0A
Publication of CN108268539A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video matching system based on text analysis comprises a caption analysis module, an index module and a search module. The caption analysis module extracts the text content of a subtitle file and the times at which that text appears in the video, segments the text using the jieba ("stammer") Chinese word segmenter, and applies the TF-IDF algorithm to the segmented text to obtain the subtitle keywords together with the start and end time at which each keyword appears in the video. The index module uses a hash indexing method to build or update a video index from the subtitle keywords and their start and end times. The search module compares the search keyword entered by the user with the subtitle keywords in the video index and returns the list of videos with the greatest similarity. The invention automates the building of an index from subtitles, ensures the accuracy of search results, and helps the user quickly locate the time interval in the video that corresponds to the search keyword.

Description

Video matching system based on text analyzing
Technical field
The present invention relates to a technology in the field of video retrieval, specifically a video matching system based on text analysis.
Background
Online courses are widely used as an educational medium in the current Internet environment, and more and more people acquire knowledge through online education. Existing instructional videos are short but numerous. Existing video retrieval technology is based on course descriptions or video annotations, but a course description cannot fully reflect the knowledge points that appear in a course video, so the description and the content may not match. Automated video annotation requires extracting key frames from the video and recognizing the content of those key frames, but for instructional videos the recognition works poorly and accuracy is low. Manual annotation depends on the annotator's familiarity with the course and ability to summarize it, and the annotation results likewise cannot cover all the knowledge content in a course video.
Summary of the invention
In view of the above deficiencies of the prior art, the present invention proposes a video matching system based on text analysis that can effectively ensure the accuracy of search results.
The present invention is achieved through the following technical solution:
The present invention comprises a caption analysis module, an index module and a search module. The caption analysis module extracts the text content of the subtitle file and the times at which that text appears in the video, segments the text using jieba ("stammer") word segmentation, and applies the TF-IDF algorithm to the segmented text to obtain the subtitle keywords and the start and end time at which each subtitle keyword appears in the video. The index module uses the hash indexing method to build or update the video index from the subtitle keywords and their start and end times. The search module compares the search keyword entered by the user with the subtitle keywords in the video index and returns the list of videos with the greatest similarity.
Jieba ("stammer") word segmentation is a Chinese word segmentation component that offers three segmentation modes: precise mode, full mode, and search-engine mode. The present invention uses precise mode for text analysis. Jieba performs efficient word-graph scanning based on a prefix dictionary, generating a directed acyclic graph (DAG) of all the ways the Chinese characters in a sentence can form words. It then uses dynamic programming to find the maximum-probability path, which yields the best segmentation based on word frequency. For out-of-vocabulary words, it uses an HMM based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm.
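The DAG construction and maximum-probability path search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the jieba implementation: the dictionary `FREQ` and its frequencies are hypothetical toy values, the function names `build_dag` and `segment` are invented for this sketch, and out-of-vocabulary characters simply fall back to single-character words instead of being handled by an HMM with Viterbi decoding.

```python
import math

# Hypothetical toy dictionary (word -> frequency); a real segmenter loads a large one.
FREQ = {"视频": 50, "检索": 30, "系统": 40,
        "视": 5, "频": 5, "检": 3, "索": 3, "系": 4, "统": 4}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of all dictionary words starting there."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # unknown character: treat it as a one-character word
    return dag


def segment(sentence):
    """Choose the maximum-probability path through the DAG, scanning right to left."""
    n = len(sentence)
    dag = build_dag(sentence)
    best = {n: (0.0, 0)}  # position -> (best log-probability of the suffix, word end index)
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - math.log(TOTAL) + best[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("视频检索系统"))  # ['视频', '检索', '系统']
```

Scanning from the end of the sentence lets each position reuse the best score already computed for the remaining suffix, which is what makes the search a dynamic program rather than an enumeration of every path.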
The TF-IDF (term frequency-inverse document frequency) algorithm computes a keyword's TF-IDF value from the number of times the keyword occurs in a document (the term frequency) and its inverse document frequency, which acts as the keyword's weight. The more important a keyword is to a document, the larger its TF-IDF value. The present invention uses the TF-IDF algorithm to obtain the keywords in a subtitle file together with their TF-IDF values, and determines the start and end time of each keyword in the corresponding video according to its order.
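As a rough illustration of the scoring just described, the following Python sketch computes TF-IDF over already-segmented documents. The source does not state which TF-IDF variant is used, so the formula here (relative term frequency multiplied by log(N / document frequency)) is an assumption, and the names `tf_idf` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=2):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [["hash", "index", "hash"], ["index", "video"], ["index", "subtitle"]]
scores = tf_idf(docs)
print(top_keywords(scores[0], k=1))  # ['hash']
```

In this toy corpus "index" occurs in every document, so its inverse document frequency is log(1) = 0 and it never ranks as a keyword, while "hash" is specific to the first document and scores highest there.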
The hash indexing method uses the subtitle keyword as the index key, performs a hash operation on it, and stores the hash result together with the corresponding row pointer information in a hash table. A hash index lookup locates the entry in a single step, which avoids multiple I/O accesses and improves search efficiency.
The present invention also relates to a matching method based on the above system, comprising the following steps:
Step 1) The caption analysis module reads the subtitle file of a video, analyzes the text content of the subtitles of the current video, and extracts the subtitle keywords;
Step 2) The set of subtitle keywords obtained is sent to the index module; the index module builds a video index, or updates the existing video index, by training on the subtitle keywords, and stores the new index in the database;
Step 3) When the user enters a search keyword in the system, the index file is obtained, the similarity between the search keyword entered by the user and the subtitle keywords in the video index is computed, the set of keywords with the highest similarity is obtained, and the corresponding list of videos and the times at which the search keyword appears in each video are returned to the user.
Technical effect
Compared with the prior art, the present invention automatically builds an index of subtitle keywords for educational videos from their subtitles. It builds a set of word vectors by training word2vec and matches the search keyword against the subtitle keywords by computing the cosine similarity between words, which effectively ensures the accuracy of search results. While extracting the subtitle keywords it also obtains the time intervals in which each subtitle keyword appears in the video, which helps the user quickly locate the time interval in the video that corresponds to the search keyword.
Brief description of the drawings
Fig. 1 is a structural diagram of the system of the present invention.
Detailed description
The experiment for the present invention was deployed on an Alibaba Cloud host with 18 cores and 16 GB of memory. After logging in to the host remotely, the binary version of word2vec was first downloaded and then trained on a Chinese and English Wikipedia corpus totalling 3.7 million articles. Training took three hours and produced an output file containing all the word vectors, 8 GB in size. Meanwhile, each video subtitle file was first split into several small files, and 8 processes were started on the same machine to segment those small files in parallel and write the results to files. This step increased processing speed by 65% compared with single-process handling. Finally, the completed index was stored in the in-memory database Redis, which reduced disk I/O and improved query speed by about 20%.
As shown in Fig. 1, the present embodiment comprises a caption analysis module, an index module and a search module. The caption analysis module extracts the text content of the subtitle file and the times at which that text appears in the video, segments the text using jieba word segmentation, and applies the TF-IDF algorithm to the segmented text to obtain the subtitle keywords and the start and end time at which each subtitle keyword appears in the video. The index module uses the hash indexing method to build or update the video index from the subtitle keywords and their start and end times. The search module compares the search keyword entered by the user with the subtitle keywords in the video index and returns the list of videos with the greatest similarity.
The caption analysis module is the foundation of the index module. A course video comprises a video file and a subtitle file. The caption analysis module receives the subtitle file of the course video, analyzes the text content of the subtitle file, and extracts the subtitle keywords in that text together with the time points at which each subtitle keyword appears in the course video.
The caption analysis module uses a script to obtain the text content of the subtitle file and the times at which that text appears in the video. It then segments the text using jieba word segmentation and applies the TF-IDF algorithm to the segmented text to obtain the subtitle keywords and the start and end time at which each keyword appears in the video.
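The extraction script is not given in the source. A minimal Python sketch of this step follows, assuming the subtitle file uses the common SRT layout (a cue number, a `start --> end` timestamp line, then the caption text); the file format, the function names `parse_srt` and `keyword_intervals`, and the sample text are all assumptions made for illustration.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a dot is accepted in place of the comma).
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for idx, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                caption = " ".join(l.strip() for l in lines[idx + 1:] if l.strip())
                cues.append((start, end, caption))
                break
    return cues


def keyword_intervals(cues, keyword):
    """List the (start, end) interval of every cue whose text contains the keyword."""
    return [(start, end) for start, end, caption in cues if keyword in caption]


sample = (
    "1\n00:00:01,000 --> 00:00:04,500\nHash index basics\n\n"
    "2\n00:01:00,000 --> 00:01:02,250\nBuilding the index\n"
)
cues = parse_srt(sample)
print(keyword_intervals(cues, "Hash"))  # [(1.0, 4.5)]
```

Keeping the start and end time attached to each caption is what later allows a keyword to be mapped back to the time intervals in which it appears in the video.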
Preferably, the system is further provided with a storage module, which stores the video index and the video files.
The index module is mainly used to build the video index. If no video index has yet been built in the current system, it builds one from the subtitle keywords obtained by the caption analysis module; otherwise it updates the existing video index. The new video index is then stored in the database of the storage module to support video queries.
The index module uses the hash index method. It first reads the subtitle keywords of a video file and their sets of time intervals, then builds the reverse mapping from subtitle keyword to video and time interval; that is, it constructs content entries consisting of subtitle keyword, video file and time interval. It then computes a hash value for the subtitle keyword and writes the corresponding entry into the hash bucket. Hash collisions are resolved by combining the hash table with linked lists (chaining). Using a hash index speeds up both building and updating the index. The index is updated in place, i.e. the existing index structure is modified directly. When a new course video is added to the video library and its subtitle keywords have been extracted, the corresponding subtitle keyword, video file and time interval entries are added directly to the existing index structure. An in-place update can use the hash value to determine directly whether a subtitle keyword already exists in the original video index, and thus whether to append to an existing entry or create a new one.
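The bucket structure and in-place update described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the patented implementation: the class name `SubtitleHashIndex`, the bucket count and the file names are hypothetical, Python's built-in `hash` stands in for the unspecified hash function, and a Python list plays the role of each bucket's chain.

```python
class SubtitleHashIndex:
    """Hash table with chained buckets: keyword -> [(video, start, end), ...]."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, keyword):
        # The hash value of the keyword selects the bucket directly.
        return self.buckets[hash(keyword) % len(self.buckets)]

    def add(self, keyword, video, start, end):
        """In-place update: append to an existing keyword entry or create a new one."""
        bucket = self._bucket(keyword)
        for key, postings in bucket:
            if key == keyword:
                postings.append((video, start, end))
                return
        bucket.append((keyword, [(video, start, end)]))

    def lookup(self, keyword):
        """Return every (video, start, end) entry recorded for the keyword."""
        for key, postings in self._bucket(keyword):
            if key == keyword:
                return list(postings)
        return []


index = SubtitleHashIndex()
index.add("hash", "a.mp4", 1.0, 4.5)
index.add("hash", "b.mp4", 10.0, 12.0)
print(index.lookup("hash"))  # [('a.mp4', 1.0, 4.5), ('b.mp4', 10.0, 12.0)]
```

Keywords that collide share a bucket and are told apart by comparing the stored key, so adding a new course video never requires rebuilding the rest of the index.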
The search module generates query results for the search keyword entered by the user. Specifically, it matches the search keyword provided by the user against the subtitle keywords in the video index, computes the similarity between each subtitle keyword in the index and the search keyword, returns the list of videos with the greatest similarity, and reads the relevant course videos from the database to return to the user. The matching itself builds a set of word vectors by training word2vec and computes the cosine similarity between the search keyword and the index keywords to obtain the best match between subtitle keywords and the search keyword. Word2vec constructs a two-layer neural network: the input layer reads in the words within a sliding window and sums their vectors to form the hidden-layer nodes. The output layer is a binary tree built with the Huffman tree algorithm, and each hidden-layer node is connected to the nodes of the binary tree by weighted edges. Given a context, for a word w to be predicted, the model maximizes the probability of that word's binary code, and the parameters are solved by gradient descent. The word vector model built by this network scores highly in linguistic evaluation: the relationship between two word vectors is reflected directly in their difference, for example C(king) - C(queen) = C(man) - C(woman).
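The cosine-similarity matching step can be illustrated with a short Python sketch. It assumes the word vectors have already been trained; the three-dimensional vectors in `VECTORS` are hypothetical stand-ins for word2vec output, and the function names `cosine` and `best_match` are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(query, index_keywords, vectors):
    """Return the indexed keyword whose vector is most similar to the query's vector."""
    if query not in vectors:
        return None
    candidates = [k for k in index_keywords if k in vectors]
    if not candidates:
        return None
    return max(candidates, key=lambda k: cosine(vectors[query], vectors[k]))


# Hypothetical toy vectors; real ones would come from the trained word2vec model.
VECTORS = {
    "sorting": [0.9, 0.1, 0.0],
    "quicksort": [0.8, 0.2, 0.1],
    "database": [0.0, 0.9, 0.4],
}

print(best_match("sorting", ["quicksort", "database"], VECTORS))  # quicksort
```

Because the comparison happens in vector space, a search keyword can retrieve a video even when the exact word never occurs in its subtitles, provided a semantically close subtitle keyword was indexed.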
The storage module comprises a file system and a database. It stores the video index and all the course videos, so that the video index can be obtained for later queries and video query results can be provided during subsequent keyword matching.
When the system is working, the caption analysis module reads the subtitle file of a video, analyzes the text content of the subtitles of the current video, extracts the subtitle keywords, and sends the set of subtitle keywords obtained to the index module. The index module builds a video index, or updates the existing video index, by training on the subtitle keywords, and stores the new index in the database. When the user enters a search keyword in the system, the index file is obtained, the similarity between the search keyword and the subtitle keywords in the video index is computed, the set of keywords with the highest similarity is obtained, and the corresponding list of videos and the times at which the search keyword appears in each video are returned to the user.
Compared with the prior art, the present invention automatically builds an index of subtitle keywords for educational videos from their subtitles. It builds a set of word vectors by training word2vec and matches the search keyword against the subtitle keywords by computing the cosine similarity between words, which effectively ensures the accuracy of search results. While extracting the subtitle keywords it also obtains the time intervals in which each subtitle keyword appears in the video, which helps the user quickly locate the time interval in the video that corresponds to the search keyword. Compared with existing methods, search time is reduced by about 8% and search accuracy is improved by about 6%.
The above specific embodiments may be partially adjusted in different ways by those skilled in the art without departing from the principle and purpose of the present invention. The scope of protection of the present invention is defined by the claims and is not limited by the above specific embodiments; every implementation within that scope is bound by the present invention.

Claims (5)

1. A video matching system based on text analysis, characterized by comprising a caption analysis module, an index module and a search module, wherein: the caption analysis module extracts the text content of the subtitle file and the times at which that text appears in the video, segments the text using jieba ("stammer") word segmentation, and applies the TF-IDF algorithm to the segmented text to obtain the subtitle keywords and the start and end time at which each subtitle keyword appears in the video; the index module uses the hash indexing method to build or update the video index from the subtitle keywords and their start and end times; and the search module compares the search keyword entered by the user with the subtitle keywords in the video index and returns the list of videos with the greatest similarity.
2. The video matching system based on text analysis according to claim 1, characterized in that the jieba word segmentation comprises three segmentation modes, namely precise mode, full mode and search-engine mode, wherein precise mode performs efficient word-graph scanning based on a prefix dictionary, generates a directed acyclic graph of all the ways the Chinese characters in a sentence can form words, and uses dynamic programming to find the maximum-probability path, which yields the best segmentation based on word frequency.
3. The video matching system based on text analysis according to claim 2, characterized in that the TF-IDF algorithm, i.e. the term frequency-inverse document frequency algorithm, obtains a keyword's TF-IDF value from the number of times the keyword occurs in a document and its inverse document frequency, i.e. the weight of the keyword in the document; the TF-IDF algorithm is used to obtain the keywords in the subtitle file and their TF-IDF values, and the start and end time of each keyword in the corresponding video are determined according to its order.
4. The video matching system based on text analysis according to claim 1, characterized in that the hash indexing method computes a hash value for each extracted subtitle keyword and stores the hash result together with the corresponding row pointer information in a hash table, thereby building the video subtitle keyword index.
5. A matching method for the system according to any one of the preceding claims, characterized by comprising the following steps:
Step 1) The caption analysis module reads the subtitle file of a video, analyzes the text content of the subtitles of the current video, and extracts the subtitle keywords;
Step 2) The set of subtitle keywords obtained is sent to the index module; the index module builds a video index, or updates the existing video index, by training on the subtitle keywords, and stores the new index in the database;
Step 3) When the user enters a search keyword in the system, the index file is obtained, the similarity between the search keyword entered by the user and the subtitle keywords in the video index is computed, the set of keywords with the highest similarity is obtained, and the corresponding list of videos and the times at which the search keyword appears in each video are returned to the user.
CN201611266235.0A 2016-12-31 2016-12-31 Video matching system based on text analyzing Pending CN108268539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611266235.0A CN108268539A (en) 2016-12-31 2016-12-31 Video matching system based on text analyzing

Publications (1)

Publication Number Publication Date
CN108268539A true CN108268539A (en) 2018-07-10

Family

ID=62770149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611266235.0A Pending CN108268539A (en) 2016-12-31 2016-12-31 Video matching system based on text analyzing

Country Status (1)

Country Link
CN (1) CN108268539A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101137030A (en) * 2006-09-01 2008-03-05 索尼株式会社 Apparatus, method and program for searching for content using keywords from subtitles
CN101021857A (en) * 2006-10-20 2007-08-22 鲍东山 Video searching system based on content analysis
CN101630524A (en) * 2008-07-18 2010-01-20 广明光电股份有限公司 Method for searching multimedia contents
US20140140681A1 (en) * 2012-11-21 2014-05-22 Hon Hai Precision Industry Co., Ltd. Video content search method, system, and device
CN103686200A (en) * 2013-12-27 2014-03-26 乐视致新电子科技(天津)有限公司 Intelligent television video resource searching method and system

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020155750A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium
WO2020251233A1 (en) * 2019-06-10 2020-12-17 (주)사맛디 Method, apparatus, and program for obtaining abstract characteristics of image data
KR102119246B1 (en) * 2019-06-10 2020-06-04 (주)사맛디 System, method and program for searching image data by using deep-learning algorithm
WO2020251236A1 (en) * 2019-06-10 2020-12-17 (주)사맛디 Image data retrieval method, device, and program using deep learning algorithm
CN110502661A (en) * 2019-07-08 2019-11-26 天脉聚源(杭州)传媒科技有限公司 A kind of video searching method, system and storage medium
CN111008300A (en) * 2019-11-20 2020-04-14 四川互慧软件有限公司 Keyword-based timestamp positioning search method in audio and video
CN112910674B (en) * 2019-12-04 2023-04-18 ***通信集团设计院有限公司 Physical site screening method and device, electronic equipment and storage medium
CN112910674A (en) * 2019-12-04 2021-06-04 ***通信集团设计院有限公司 Physical site screening method and device, electronic equipment and storage medium
CN113051966A (en) * 2019-12-26 2021-06-29 ***通信集团重庆有限公司 Video keyword processing method and device
CN111324768A (en) * 2020-02-12 2020-06-23 新华智云科技有限公司 Video searching system and method
CN111324768B (en) * 2020-02-12 2023-07-28 新华智云科技有限公司 Video searching system and method
CN111523302A (en) * 2020-07-06 2020-08-11 成都晓多科技有限公司 Syntax analysis method and device, storage medium and electronic equipment
CN111949827A (en) * 2020-07-29 2020-11-17 深圳神目信息技术有限公司 Video plagiarism detection method, device, equipment and medium
CN111949827B (en) * 2020-07-29 2023-10-24 深圳神目信息技术有限公司 Video plagiarism detection method, device, equipment and medium
CN112817916A (en) * 2021-02-07 2021-05-18 中国科学院新疆理化技术研究所 Data acquisition method and system based on IPFS
CN112836008B (en) * 2021-02-07 2023-03-21 中国科学院新疆理化技术研究所 Index establishing method based on decentralized storage data
CN112817916B (en) * 2021-02-07 2023-03-31 中国科学院新疆理化技术研究所 Data acquisition method and system based on IPFS
CN112836008A (en) * 2021-02-07 2021-05-25 中国科学院新疆理化技术研究所 Index establishing method based on decentralized storage data
CN113743352A (en) * 2021-09-15 2021-12-03 央视国际网络无锡有限公司 Method and device for comparing similarity of video contents
CN114143613A (en) * 2021-12-03 2022-03-04 北京影谱科技股份有限公司 Video subtitle time alignment method, system and storage medium
CN114143613B (en) * 2021-12-03 2023-07-21 北京影谱科技股份有限公司 Video subtitle time alignment method, system and storage medium
CN114780789A (en) * 2022-06-22 2022-07-22 山东建筑大学 Assembly type component construction monitoring video positioning method based on natural language query
CN117033673A (en) * 2023-05-16 2023-11-10 广州比地数据科技有限公司 Multimedia content extraction system based on artificial intelligence
CN117033673B (en) * 2023-05-16 2024-04-05 广州比地数据科技有限公司 Multimedia content extraction system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180710