CN110490031B - Universal digital identification method, storage medium, electronic device and system - Google Patents


Info

Publication number
CN110490031B
CN110490031B (application CN201810464001.XA)
Authority
CN
China
Prior art keywords
digital
pictures
neural network
picture
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810464001.XA
Other languages
Chinese (zh)
Other versions
CN110490031A (en
Inventor
钟志伟
陈少杰
张文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Dingcheng Industrial Co ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201810464001.XA
Publication of CN110490031A
Application granted
Publication of CN110490031B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention discloses a universal digit recognition method, storage medium, electronic device and system, relating to the field of Internet live-broadcast management. The method comprises: capturing pictures of the digit recognition area required during a live broadcast according to a preset picture-capture rule; arranging the captured pictures along a time axis and then adjusting them to a preset resolution to obtain the pictures to be recognized, or adjusting the captured pictures to the preset resolution and then arranging them along the time axis; inputting the pictures to be recognized into a convolutional neural network in time order, the convolutional neural network extracting first digit feature information from each picture in time order; inputting the first digit feature information into a recurrent neural network in time order, the recurrent neural network combining the time order of the pictures with the first digit feature information to obtain second digit feature information; and recognizing the second digit feature information and outputting the recognition result. The method and device can accurately recognize the numbers in screenshot pictures of Internet live video.

Description

Universal digital identification method, storage medium, electronic device and system
Technical Field
The invention relates to the field of Internet live-broadcast management, and in particular to a method for universal digit recognition using a deep neural network.
Background
In Internet live broadcasting, activities such as competitions, red envelopes and prediction games are often set up to improve user participation and experience and to encourage interaction between the anchor and the users. These predictions are usually tied to some indicator number reached during the broadcast. Once the number reaches the threshold, the anchor needs to lock the prediction to prevent unfair bets; however, the anchor may become too absorbed in the broadcast and forget to lock it, allowing some users to win with information they should not be able to bet on. For example, in a live broadcast of PlayerUnknown's Battlegrounds or League of Legends, the anchor may open a prediction on an in-game count; when the count has already exceeded the wagered threshold while the anchor, concentrating on the game, has not closed the prediction, users can still place bets, so the prediction loses fairness and the live-broadcast atmosphere suffers. If the system could autonomously recognize the numbers in pictures from the live video, the corresponding operations could be completed intelligently, such as opening a prediction or locking it once the preset number is reached.
Digit recognition is traditionally performed with OCR techniques, which first segment the picture of the number and then recognize each digit after segmentation. However, OCR segmentation requires high accuracy: if digits touch or intersect, are distorted, or have low resolution, segmentation tends to fail, leading to recognition errors.
Therefore, a method for accurately recognizing the numbers in such pictures is urgently needed in Internet live broadcasting.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a method for universal digit recognition using a deep neural network, which can more accurately recognize the numbers in screenshot pictures of Internet live video.
To achieve the above purpose, the technical scheme adopted by the invention is as follows:
A universal digit recognition method, applied to capturing numbers from live-broadcast pictures, comprising the following steps:
capturing pictures of the digit recognition area required during live broadcasting according to a preset picture-capture rule;
arranging the captured pictures along a time axis and then adjusting them to a preset resolution to obtain the pictures to be recognized, or adjusting the captured pictures to the preset resolution and then arranging them along the time axis to obtain the pictures to be recognized;
inputting the pictures to be recognized into a convolutional neural network in time order, the convolutional neural network extracting first digit feature information from each picture in time order;
inputting the first digit feature information into a recurrent neural network in time order, the recurrent neural network combining the time order of the pictures with the first digit feature information to obtain second digit feature information;
and recognizing the second digit feature information and outputting a recognition result.
On the basis of the above technical scheme, the picture-capture rule is:
identifying the live-broadcast interface that plays the video stream;
performing feature matching on the live-broadcast interface to obtain the region of the interface that matches a preset feature pattern;
and capturing pictures of that region.
On the basis of the above technical scheme, the picture-capture rule is:
presetting a correspondence between live-room themes and live-picture capture regions;
and capturing pictures of the corresponding capture region of the live picture according to the live-room theme.
On the basis of the above technical solution, the convolutional neural network extracting the first digit feature information in time order specifically comprises:
the convolutional neural network extracting features from the pictures to be recognized using a multilayer residual network to obtain the first digit feature information.
On the basis of the above technical solution, the recurrent neural network obtaining the second digit feature information by combining the time order of the pictures with the first digit feature information specifically comprises:
when recognizing a piece of first digit feature information, the recurrent neural network using an LSTM network to combine it with the first digit feature information of the preceding and following pictures and output the second digit feature information.
On the basis of the above technical solution, recognizing the second digit feature information and outputting the recognition result specifically comprises:
matching the second digit feature information against digit labels using CTC, and obtaining the recognition result from the digit labels.
On the basis of the above technical solution, the recurrent neural network has a first rule, the first rule being the change rule of the first digit feature information over time in the preset scene, for example that the numeric value of the recognition result stays equal or increases along the time sequence.
On the basis of the above technical solution, recognizing the second digit feature information and outputting the recognition result follows a second rule, the second rule being: the recognition result is a natural number smaller than a preset value.
An electronic device, comprising a memory and a processor, the memory storing a computer program that runs on the processor; when executing the computer program, the processor implements the method of the above technical scheme.
A system for universal digit recognition, comprising:
a capture module, used for capturing pictures of the digit recognition area required during live broadcasting according to a preset picture-capture rule;
a picture-editing module, used for arranging the captured pictures along a time axis and adjusting them to a preset resolution to obtain the pictures to be recognized, or for adjusting the captured pictures to the preset resolution and then arranging them along the time axis to obtain the pictures to be recognized;
an extraction module, which obtains the pictures to be recognized and extracts the first digit feature information of each picture in time order through a convolutional neural network;
an adjustment module, provided with a recurrent neural network that combines the time order of the pictures with the first digit feature information to obtain the second digit feature information;
and a result-output module, used for recognizing the second digit feature information and outputting the recognition result.
Compared with the prior art, the invention has the following advantages:
the pictures are first captured and scaled as a whole, ensuring their order and uniformity and making them easier for the network to recognize; first digit feature information is then extracted picture by picture through the convolutional neural network, which computes every feature representing the picture rather than relying on simple segmentation, so the first digit feature information can carry more of the picture's information; finally, the recurrent neural network takes the changes between preceding and following pictures into account, considering the pictures along the time dimension, which makes digit recognition still more accurate.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings corresponding to the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method of universal number recognition according to the present invention;
FIG. 2 is a schematic diagram of a residual error unit of the universal digital recognition system according to the present invention;
fig. 3 is a schematic structural diagram of a system for universal number recognition according to the present invention.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a universal digit recognition method, applied to capturing numbers from live-broadcast pictures and implemented with a deep neural network; the method is applicable to recognizing numbers in video screenshots taken during a live broadcast, and comprises:
S1: capturing the pictures requiring digit recognition during the live broadcast according to the preset picture-capture rule.
Specifically, the picture-capture rule is as follows:
First, the live-broadcast interface that plays the video stream is identified. During a live broadcast there may be regions beside the video-stream picture, such as advertisement columns and user-recommendation columns, which would interfere with the subsequent setting of the capture region. It is therefore necessary to first determine which region is the interface playing the video stream. The position of the played video stream may be obtained from the playback parameters in the page code, or may be preset by the page or the client, which is not described in detail here.
Second, feature matching is performed on the live-broadcast interface to obtain the region that matches the preset pattern. Besides the numbers to be recognized, the playback interface generally contains characteristic text or graphics beside the numbers that explain and highlight them. These patterns can therefore be preset and recognized, and a capture region delimited once they are found. The delimited region may be the whole area surrounded by the patterns or the area near the text, configured differently for different live content.
Third, pictures of the region are captured. Once the capture region is set, pictures can be captured from the live broadcast.
For example, when capturing from a live broadcast of PlayerUnknown's Battlegrounds: first, the game picture, i.e. the interface playing the video stream, is identified within the live interface. If the current number of surviving players needs to be counted, matching can be performed on the features of the "survival" label in the picture. Since the relative position of the number and this characteristic label is fixed in the game, once the area containing the label is found, the area to be captured can be uniquely determined, i.e. the number region beside the "survival" label is captured.
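The patent gives no code for this step; the following is a minimal pure-Python sketch of the idea. The helpers `find_template` and `capture_region` are hypothetical names, and exact pixel matching stands in for the tolerant feature matching a real system would use:

```python
def find_template(image, template):
    """Return (row, col) of the first exact match of `template` inside
    `image`, or None.  Both are 2-D lists of pixel values (e.g. a
    binarised screenshot of the 'survival' label).  A real matcher
    would tolerate noise (e.g. normalised cross-correlation); exact
    matching just illustrates the search."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            if all(image[r + i][c + j] == template[i][j]
                   for i in range(h) for j in range(w)):
                return (r, c)
    return None

def capture_region(anchor_pos, offset, size):
    """Since the digits sit at a fixed offset from the matched label,
    the crop box follows directly: (top, left, height, width)."""
    return (anchor_pos[0] + offset[0], anchor_pos[1] + offset[1], *size)
```

Once the label is located, `capture_region` turns its position plus the game-specific fixed offset into the box that is screenshotted each frame.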
Optionally, the picture-capture rule may instead be a preset correspondence between live-room themes and live-picture capture regions; the picture of the corresponding capture region is then obtained from the live picture according to the live-room theme.
Specifically, the theme of a live room may be its tag, its name, or a choice made by the anchor. Because the digit region to be recognized is fixed in some live interfaces (the upper-left corner for some, the upper-right corner for others, and so on), the positions to be captured can be preset for different live content; when the anchor goes live, the digit region to be captured is looked up according to the tag, the room name, the anchor's own selection, etc., and pictures of it are captured.
For example, when an anchor uses karaoke software for a singing broadcast and the current song playback time needs to be captured, the capture region corresponding to that software is looked up first, and pictures are captured once the region is determined.
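A sketch of the preset theme-to-region table described above; the theme names, coordinates and identifiers (`THEME_CAPTURE_REGIONS`, `region_for_room`) are invented for illustration, since the patent only states that such a correspondence is preset:

```python
# Hypothetical mapping from live-room theme to a fixed crop box
# (left, top, width, height) in screen pixels.
THEME_CAPTURE_REGIONS = {
    "pubg":    (1700, 40, 120, 40),   # survivor count, top-right
    "karaoke": (60, 980, 200, 40),    # song playback time, bottom-left
}

def region_for_room(theme):
    """Look up the preset capture region for a live-room theme,
    as selected via the room tag, name or the anchor's own choice."""
    try:
        return THEME_CAPTURE_REGIONS[theme]
    except KeyError:
        raise ValueError(f"no capture region configured for {theme!r}")
```

In a deployment the table would be filled per supported game or application, and the lookup key derived from the room metadata.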
S2: and arranging the captured pictures according to a time axis, and adjusting the captured pictures to a preset resolution ratio to be used as the pictures to be identified.
Specifically, identification numbers needed to be identified in a live broadcast picture are usually located in a fixed area, only the area needs to be selected for picture capture, captured pictures are arranged according to a time axis to read a time stamp of each picture during capture, and the pictures are automatically arranged according to the time stamps. The specific method of adjusting a picture to a preset resolution is to crop and reduce a larger picture, expand its background for a too small picture, and then scale the picture to a specified size. More specific tools for zooming the image are known to those skilled in the art, and are not described herein.
For example: although the positions of the pictures needed to be identified for the live broadcast of different live broadcast contents can be different, the digital display positions of the live broadcast contents do not dynamically float, and therefore, a specific digital capture area can be selected according to the live broadcast contents. In the process of live network broadcasting, the fixed area for acquiring the number can be large or small and is influenced by the resolution selected by the live broadcast and the live broadcast range, and when the picture identification is carried out on the number in the live broadcast, the range of the number to be identified is generally 0-99, so that the input resolution of the convolutional neural network is preferably set to be 38 × 22, the larger picture is cut and reduced, the background of the picture can be expanded for the undersized picture, and the picture is then reduced to the specified size. After the input picture is converted into the uniform resolution, the neural network can not obtain different results due to different resolutions of the same picture, and the method can be more accurate.
As an optional implementation, the captured pictures may instead be adjusted to the preset resolution first and then arranged along the time axis to obtain the pictures to be recognized. That is, either the pictures are sorted by timestamp and then their resolution is adjusted, or the resolution is adjusted first and the pictures are then arranged along the time axis; both orders yield pictures to be recognized that meet the requirements.
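The ordering-and-normalising step can be sketched as follows. `plan_resize` and `prepare_frames` are hypothetical names, and only the geometric decision is modelled here, not the actual pixel resampling an image library would perform:

```python
def plan_resize(w, h, target=(38, 22)):
    """Decide how to bring one captured frame to the preset resolution:
    frames at least as large as the target are cropped/shrunk, frames
    smaller than the target get their background padded out before
    scaling to the target size."""
    tw, th = target
    if w >= tw and h >= th:
        return "crop_and_shrink"
    return "pad_then_scale"

def prepare_frames(frames, target=(38, 22)):
    """Sort captured frames by timestamp and attach a resize plan.
    `frames` is a list of (timestamp, width, height) tuples as read
    from the capture step."""
    return [(ts, plan_resize(w, h, target))
            for ts, w, h in sorted(frames)]
```

Doing the timestamp sort before or after the resolution step makes no difference to the result, which matches the two alternative orderings in the text.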
S3: and inputting the pictures to be identified into the convolutional neural network according to the time sequence, and extracting the first digital characteristic information of each picture by the convolutional neural network according to the time sequence. Specifically, the first digital feature information here is a feature vector with a vector.
Taking live broadcasting as an example, in the live broadcasting process, the numbers in the live broadcasting picture are generally continuously performed, so the obtained pictures are also sequenced according to the live broadcasting time sequence, the convolutional neural network extracts the first digital feature information according to the time sequence to be recognized, a series of first digital feature information can be extracted, and the information corresponds to the pictures to be recognized one by one, namely the first digital feature information is also arranged according to the time sequence.
In addition, the convolutional neural network is trained, and can extract first digital feature information from the digital features in the picture. Compared with an OCR technology which relies on the segmentation of the picture too much, the trained convolutional neural network extracts the first digital characteristic information of the whole picture more accurately, and the pattern and the range of the digital picture are wider.
As a preferred embodiment, the convolutional neural network uses a 13-layer residual network to extract features from the pictures to be recognized and obtain the first digit feature information, the 13-layer residual network consisting of 6 residual units and 1 convolutional layer.
As shown in fig. 2, each residual unit contains 6 sub-units: a first convolution subunit, a first Batch Norm subunit, a first ReLU subunit, a second convolution subunit, a second Batch Norm subunit and a second ReLU subunit. One convolution subunit, one Batch Norm subunit and one ReLU subunit together form one layer, i.e. each residual unit is a two-layer structure; the 6 residual units contribute 12 layers, which together with 1 convolutional layer form the 13-layer residual network. Each layer of the residual network can recognize different features in the picture and finally yields the first digit feature information. The first digit feature information is a feature vector extracted from the picture by the convolutional neural network, each column of which corresponds to a rectangular area of the original picture. Preferably, for the live-picture number range 0-99, the convolutional neural network yields 9 × 1 feature vectors.
The residual network contained in the convolutional neural network is not limited to 13 layers. Those skilled in the art can reasonably deduce that the more residual layers the method uses, the more feature types can be extracted and the more accurate the recognized first digit feature information becomes, but also the more resources and time the computation consumes; the 13-layer residual network is therefore used as a compromise, so that live pictures can be recognized accurately and quickly while consuming few resources. The computation performed on the picture by the 6 sub-units of a residual unit (the first convolution, Batch Norm and ReLU subunits and the second convolution, Batch Norm and ReLU subunits) is known to those skilled in the art and is not repeated here.
It should be further noted that the convolution subunit performs the convolution computation on the data; the Batch Norm subunit is a batch-normalization subunit that normalizes the output of each layer of the neural network; and the ReLU subunit applies the rectified linear unit, an activation function commonly used in artificial neural networks that makes gradient descent and back-propagation more efficient, i.e. it helps avoid the problems of exploding and vanishing gradients.
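As a structural sanity check of the composition just described (6 residual units of 2 layers each, plus 1 standalone convolutional layer, giving 13 layers), the following sketch counts layers only; it is not a trainable network, and the names (`RESIDUAL_UNIT`, `build_backbone`) are invented for illustration:

```python
# One residual unit = two layers, each layer being a
# convolution -> Batch Norm -> ReLU triple, per the description above.
RESIDUAL_UNIT = [
    ("conv", "batch_norm", "relu"),   # first layer of the unit
    ("conv", "batch_norm", "relu"),   # second layer of the unit
]

def build_backbone(num_units=6):
    """Assemble the layer list of the 13-layer residual backbone:
    num_units residual units (2 layers each) plus one final conv layer."""
    layers = []
    for _ in range(num_units):
        layers.extend(RESIDUAL_UNIT)   # 2 layers per residual unit
    layers.append(("conv",))           # final standalone conv layer
    return layers
```

The skip connections inside each residual unit are omitted here, since only the 6 × 2 + 1 = 13 layer count is being checked.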
S4: and inputting the first digital characteristic information into a recurrent neural network according to the time sequence, and combining the time sequence of the picture and the first digital characteristic information by the recurrent neural network to obtain second digital characteristic information.
Specifically, after the convolutional neural network extracts the first digital feature information in a time sequence, the recurrent neural network can search for possible relations among the numbers in the sequence of the first digital feature information, that is, the recurrent neural network can consider the first digital feature information before and after a first digital feature information when identifying the first digital feature information, that is, the first digital feature information extracted before or after the first digital feature information. Compared with the convolutional neural network which only identifies the picture at a single time point, the method which is added with the convolutional neural network can expand the identification of the picture in time length. After the cyclic neural network is set, the first digital characteristic information obtained by the convolutional neural network can be further identified, the digital change in live broadcast is logically connected, and the most accurate result is finally obtained.
And identifying the second digital characteristic information and outputting an identification result. Since the picture passes through the convolutional neural network and the cyclic neural network, the obtained second digital feature information may include a plurality of possible results, for example, when the second digital feature information sent by the cyclic neural network is 4422 and the identification number is 0 to 99, the results may be 44, 22 and 42, while the two numbers 4 and 2 are indeed identified through the convolutional neural network and the cyclic neural network, the two more numbers are repeated, and therefore the actual identification number is 42. Through the identification of the second digital characteristic information, the second digital characteristic information can be further generalized and logically judged, so that a more logical result is obtained, namely more accurate.
It should be noted that, the above-mentioned inputting the convolutional neural network and the cyclic neural network in sequence may be that a plurality of pictures are first input into the convolutional neural network and first digital feature information is obtained according to the picture sequence thereof, and then the plurality of first digital feature information is sequentially input into the cyclic neural network, or that 1 picture is input into the convolutional neural network according to the picture sequence to obtain one piece of first digital feature information, then the first digital feature information is input into the cyclic neural network, and at the same time, the subsequent pictures are continuously input into the convolutional neural network in sequence. As long as the pictures are input into the recurrent neural network in the order of the pictures so that it does not go wrong regarding the context.
As an alternative embodiment, the recurrent neural network uses an LSTM (Long Short-Term Memory) network to output the second digital feature information in combination with the first digital feature information of the preceding and following pictures of the first digital feature information picture when identifying the first digital feature information, and the LSTM network is a time recursive neural network suitable for processing and predicting important events with relatively Long intervals and delays in time series. The method can effectively solve the problems of small gradient and gradient explosion of the recurrent neural network, and meanwhile, the image sequence capable of being related in the calculation of the LSTM network is wider, so that the method is more accurate.
As a preferred embodiment, the recurrent neural network has a first rule, the first rule being the change rule of the first digit feature information over time in the preset scene.
In specific live scenes, such as counting virtual gifts as they are presented in succession, or counting kills in a game, the first rule is specifically: the numeric value of the recognition result stays equal or increases along the time sequence.
Specifically, in a live room the behaviour of the numbers is often determined by the live content. In some competitive games a certain count must be reached, or the higher the better, and the number only grows with time; in League of Legends, for example, the kill count obtained during one game keeps increasing. The recurrent network takes this rule into account, so the relation between time and pictures can be linked better. If the number recognized in the previous picture was 04 and the current picture could read 00 or 08, then under the first rule, which says the number grows along the time sequence, the current picture can only be 08. The change rule of the first digit feature information is thus that the number corresponding to the current picture's first digit feature information can only be greater than or equal to that of the previous picture.
The first rule gives the recurrent neural network an additional condition for recognizing the pictures, making the picture-recognition logic clearer and applicable to a wider range of conditions.
It is understood that the first rule can be chosen differently for different application scenarios; for example, when recognizing a countdown, the first rule may be that the number only decreases.
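A minimal sketch of how the first rule might be applied as a post-processing filter on candidate readings. `apply_first_rule` is a hypothetical helper; the patent does not specify how the rule is enforced inside the network, so this models it as a filter over the values the network considers plausible:

```python
def apply_first_rule(prev_value, candidates, increasing=True):
    """Resolve ambiguous readings using the scene's change rule: when the
    tracked number can only stay equal or grow over time (e.g. a kill
    count), discard candidates below the previous value; for a countdown
    (increasing=False) the test is reversed.  Returns the surviving
    candidate closest to the previous value, or None if the rule
    eliminates every candidate."""
    if increasing:
        ok = [c for c in candidates if c >= prev_value]
    else:
        ok = [c for c in candidates if c <= prev_value]
    return min(ok, key=lambda c: abs(c - prev_value)) if ok else None
```

With the example from the text, a previous reading of 04 and current candidates {00, 08} leaves only 08.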
S5: and identifying the second digital characteristic information and outputting an identification result.
As an alternative embodiment, a CTC (connection Temporal Classification) is used when identifying the second digital feature information, and the CTC can resolve the alignment relationship between the input and the output.
For example, in a live broadcast the numbers to be recognized lie in the range 0 to 99. CTC inserts blanks around the digits so that every feature vector corresponds to a label; for instance, the label sequence __44__22_, corresponding to the nine 1-wide feature vectors identified at one time, collapses to the predicted result 42.
As a preferred embodiment, recognizing the second digital feature information and outputting the recognition result is subject to a second rule: the recognition result is a natural number smaller than a preset value.
Specifically, in a live broadcast scenario where the numbers to be identified lie in the range 0 to 99, the second rule is: the recognition result is preset to be a natural number within that range.
The number to be identified in a live broadcast room generally has a maximum value. For example, in a singing live broadcast room, the performance can be compared against the original recording to obtain a similarity percentage. Because human error and deviation make a live performance imperfect, the percentage is below 100%, i.e. the number to be recognized is necessarily a 1-digit or 2-digit number, and limiting the number of digits allows the second digital feature information to be identified more reliably. If the possible readings are 224, 244, and 24, and the digit count is limited to 2, only 24 remains, making recognition more convenient and accurate.
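The second rule can be applied as a simple post-filter; the sketch below assumes candidate readings are plain integers and the function name is illustrative:

```python
# Post-filter implementing the second rule: discard any reading that
# falls outside the scenario's preset range of natural numbers.

def apply_second_rule(candidates, max_value=99):
    """Keep only readings within [0, max_value]."""
    return [c for c in candidates if 0 <= c <= max_value]

# The example from the text: 224 and 244 exceed the 2-digit limit,
# leaving 24 as the only valid reading.
print(apply_second_rule([224, 244, 24]))  # -> [24]
```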
It should be noted that, besides the first rule and the second rule, any other constraint designed around the live content can be added to this method of universal number recognition with a deep neural network. For instance, in some elimination games the count is gradually reduced, so the numerical rule is that values decrease over time.
An embodiment of the invention is described below in full by way of example:
firstly, a group of pictures cut from a live broadcast in time order is obtained, and the group of pictures is scaled to a preset resolution. This ensures the neural network does not produce different results for the same picture merely because of different sizes and resolutions, so recognition is more accurate.
The scaled group of pictures is input into a convolutional neural network in time order. The convolutional neural network uses a 13-layer residual network to extract features from the pictures to be recognized in time order and obtain the first digital feature information. The first digital feature information is a feature vector of height 1 and width 9; each column of features corresponds to a rectangular region of the original picture, and each piece of first digital feature information corresponds to one picture of the group, yielding a group of first digital feature information. Extracting multiple features of the image with a convolutional neural network differs from OCR techniques that only recognize the shapes of segmented characters: it captures the detailed features of the image more finely and expresses them as numerical values.
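To illustrate only the shape of this representation, the toy sketch below reduces a grayscale picture to a 1x9 vector whose columns each summarize one vertical strip of the image. The real embodiment learns these features with a 13-layer residual network; the strip-averaging here is purely an illustrative stand-in, and all names are assumptions:

```python
# Toy stand-in for the CNN feature extractor: one 1x9 vector per
# picture, each column summarizing a rectangular region of the image.

def column_features(image, columns=9):
    """Reduce an H x W grayscale image (list of rows) to a
    1 x `columns` vector by averaging each vertical strip."""
    width = len(image[0])
    step = width // columns
    vector = []
    for c in range(columns):
        strip = [row[x] for row in image for x in range(c * step, (c + 1) * step)]
        vector.append(sum(strip) / len(strip))
    return vector

img = [[float(x % 2) for x in range(18)] for _ in range(4)]  # 4 x 18 image
print(len(column_features(img)))  # -> 9
```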
The obtained first digital feature information is input into a recurrent neural network, which uses an LSTM (Long Short-Term Memory) network to predict by combining the preceding and following time order within the first digital feature information, obtaining the second digital feature information. Through the recurrent neural network, how the first digital feature information changes over the time sequence is taken into account, and the information can be acquired more accurately.
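The way a recurrent network lets the current frame's prediction depend on its predecessors can be illustrated with a toy recurrence (this is not the actual trained LSTM; the weights and update form are simplified assumptions):

```python
# Toy recurrent update: the new hidden state mixes the current frame's
# feature vector with the hidden state carried over from the previous
# frame, so information flows along the time sequence.
import math

def rnn_step(x, h, w_x=0.5, w_h=0.5):
    """One recurrent step with tanh squashing (as in an LSTM's
    candidate gate); `x` is the current 1x9 feature vector."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]

h = [0.0] * 9                         # initial hidden state, one per column
for frame in ([0.1] * 9, [0.9] * 9):  # two consecutive 1x9 feature vectors
    h = rnn_step(frame, h)
print(len(h))  # -> 9
```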
CTC (Connectionist Temporal Classification) then matches digit labels to the second digital feature information. Specifically, a non-digit position in the second digital feature information corresponds to a blank and a digit position corresponds to a digit label, so that the feature vector of each column in the second digital feature information corresponds to one label (a blank or a digit).
According to this embodiment, digital picture features can be extracted with only a small ResNet residual network, so resource consumption is low and recognition is efficient without manual intervention.
In addition, corresponding to the above universal number recognition method, the present invention further provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the universal number recognition method described in each of the above embodiments are implemented. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk, or an optical disk.
In addition, corresponding to the above universal number recognition method, the present invention further provides an electronic device in which a computer program is stored; the steps of the universal number recognition method according to each of the above embodiments are implemented when the computer program is executed by a processor. It should be noted that the electronic device includes a memory and a processor, the memory stores a computer program that runs on the processor, and the processor implements the universal number recognition method of the foregoing embodiments when executing the computer program.
As shown in fig. 3, an embodiment of the present invention further provides a system for universal digital identification, which includes a capture module 1, a picture editing module 2, an extraction module 3, an adjustment module 4, and a result output module 5:
the capturing module 1 is used for capturing pictures in a digital identification area required in live broadcasting according to a preset picture capturing rule;
the picture editing module 2 is used for arranging the captured pictures along a time axis and then adjusting them to a preset resolution to obtain the pictures to be recognized, or for adjusting the captured pictures to the preset resolution and then arranging them along the time axis to obtain the pictures to be recognized;
the extraction module 3 is used for acquiring the pictures to be identified and extracting the first digital characteristic information of each picture according to the time sequence through a convolutional neural network;
the adjusting module 4 is provided with a recurrent neural network, and the recurrent neural network combines the time sequence of the pictures and the first digital characteristic information to obtain second digital characteristic information;
and the result output module 5 is used for identifying the second digital characteristic information and outputting an identification result.
Optionally, the capture rule is as follows: identify the live broadcast interface playing the video stream; perform feature matching on the live broadcast interface to obtain the region of the interface that matches a preset feature pattern; capture pictures of that region.
Optionally, the capture rule is as follows: preset a correspondence between live broadcast room themes and capture regions of the live picture; capture pictures of the corresponding region in the live picture according to the live broadcast theme.
Optionally, the extracting, by the convolutional neural network according to the time sequence, the first digital feature information specifically includes:
the convolutional neural network extracts features from the pictures to be recognized by using a multilayer residual network and obtains the first digital feature information.
Optionally, the obtaining, by the recurrent neural network, the second digital feature information by combining the time sequence of the picture and the first digital feature information specifically includes:
when identifying a piece of first digital feature information, the recurrent neural network uses an LSTM (Long Short-Term Memory) network to output the second digital feature information by combining the first digital feature information of the pictures before and after the current picture.
Optionally, identifying the second digital feature information and outputting the identification result specifically includes:
matching digit labels to the second digital feature information by using CTC (Connectionist Temporal Classification), and obtaining the recognition result according to the digit labels.
Optionally, the recurrent neural network has a first rule, where the first rule is: the change rule of the first digital feature information as time elapses under a preset scenario.
Optionally, the first rule is: the numerical values of the recognition results remain equal or increase over the time sequence, constraining the corresponding change pattern of the first digital feature information.
Optionally, recognizing the second digital feature information and outputting the recognition result is subject to a second rule, where the second rule is: the recognition result is a natural number smaller than a preset value.
Various modifications and specific examples in the foregoing method embodiments are also applicable to the system of the present embodiment, and the detailed description of the method is clear to those skilled in the art, so that the detailed description is omitted here for the sake of brevity.
In summary, the universal number recognition method, storage medium, electronic device, and system provided by the embodiments of the invention scale the captured pictures to a uniform resolution and input them in sequence to a convolutional neural network and a recurrent neural network to output the recognized value. Compared with traditional OCR techniques, more features are used and the temporal relationship between preceding and following numbers is better taken into account, making recognition more accurate.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A universal number recognition method, applied to capturing numbers from live broadcast pictures, characterized by comprising the following steps:
capturing pictures of a digital identification area required in live broadcasting according to a preset picture capturing rule;
arranging the captured pictures along a time axis and then adjusting them to a preset resolution to obtain the pictures to be recognized, or adjusting the captured pictures to the preset resolution and then arranging them along the time axis to obtain the pictures to be recognized;
inputting the pictures to be identified into a convolutional neural network according to a time sequence, and extracting first digital characteristic information of each picture by the convolutional neural network according to the time sequence;
inputting the first digital characteristic information into a recurrent neural network according to a time sequence, and obtaining second digital characteristic information by the recurrent neural network in combination with the time sequence of the picture and the first digital characteristic information;
identifying the second digital characteristic information and outputting an identification result;
the method for extracting the first digital feature information by the convolutional neural network according to the time sequence specifically comprises the following steps:
the convolutional neural network extracts features from the pictures to be recognized by using a multilayer residual network and obtains the first digital feature information;
the recurrent neural network has a first rule, the first rule being: the change rule of the first digital feature information as time elapses under a preset scenario, wherein the numerical values of the recognition results remain equal or increase over the time sequence, constraining the corresponding change of the first digital feature information;
recognizing the second digital feature information and outputting the recognition result is subject to a second rule, the second rule being: the recognition result is a natural number smaller than a preset value.
2. A method for universal number recognition according to claim 1, wherein the grab rule is:
identifying a live broadcast interface for playing the video stream;
performing feature matching on the live broadcast interface to acquire a region which accords with a preset feature pattern in the live broadcast interface;
the area is subjected to a grab picture.
3. A method for universal number recognition according to claim 1, wherein the grab rule is:
presetting a corresponding relation between a live broadcast room theme and a live broadcast picture capture area;
and acquiring pictures of the corresponding capture area in the live broadcast picture according to the live broadcast theme.
4. The method for universal number recognition as claimed in claim 1, wherein the recurrent neural network combining the time sequence of the pictures and the first digital feature information to obtain the second digital feature information specifically comprises:
when identifying a piece of first digital feature information, the recurrent neural network uses the LSTM network to output the second digital feature information by combining the first digital feature information of the pictures before and after the current picture.
5. The method for universal number recognition according to claim 1, wherein recognizing the second digital feature information and outputting the recognition result specifically comprises:
matching digit labels to the second digital feature information by using CTC, and obtaining the recognition result according to the digit labels.
6. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that runs on the processor, characterized in that: the processor, when executing the computer program, implements the method of any of claims 1 to 5.
7. A system for universal number identification, comprising:
the capturing module (1) is used for capturing pictures in a digital identification area required in live broadcasting according to a preset picture capturing rule;
the picture editing module (2) is used for arranging the captured pictures along a time axis and then adjusting them to a preset resolution to obtain the pictures to be recognized, or for adjusting the captured pictures to the preset resolution and then arranging them along the time axis to obtain the pictures to be recognized;
the extraction module (3) is used for acquiring the pictures to be identified, and extracting first digital characteristic information of each picture according to a time sequence through a convolutional neural network;
the adjusting module (4) is provided with a recurrent neural network, and the recurrent neural network is combined with the time sequence of the pictures and the first digital characteristic information to obtain second digital characteristic information; the result output module (5) is used for identifying the second digital characteristic information and outputting an identification result;
the method for extracting the first digital characteristic information by the convolutional neural network according to the time sequence specifically comprises the following steps:
the convolutional neural network extracts features from the pictures to be recognized by using a multilayer residual network and obtains the first digital feature information;
the recurrent neural network has a first rule, the first rule being: the change rule of the first digital feature information as time elapses under a preset scenario, wherein the numerical values of the recognition results remain equal or increase over the time sequence, constraining the corresponding change of the first digital feature information;
recognizing the second digital feature information and outputting the recognition result is subject to a second rule, the second rule being: the recognition result is a natural number smaller than a preset value.
CN201810464001.XA 2018-05-15 2018-05-15 Universal digital identification method, storage medium, electronic device and system Active CN110490031B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810464001.XA CN110490031B (en) 2018-05-15 2018-05-15 Universal digital identification method, storage medium, electronic device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810464001.XA CN110490031B (en) 2018-05-15 2018-05-15 Universal digital identification method, storage medium, electronic device and system

Publications (2)

Publication Number Publication Date
CN110490031A CN110490031A (en) 2019-11-22
CN110490031B true CN110490031B (en) 2022-08-16

Family

ID=68545393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810464001.XA Active CN110490031B (en) 2018-05-15 2018-05-15 Universal digital identification method, storage medium, electronic device and system

Country Status (1)

Country Link
CN (1) CN110490031B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023245846A1 (en) * 2022-06-21 2023-12-28 喻荣先 Mobile terminal real content analysis system, built-in anti-fraud and fraud judgment system and method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102595206A (en) * 2012-02-24 2012-07-18 央视国际网络有限公司 Data synchronization method and device based on sport event video

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7164794B2 (en) * 2002-08-22 2007-01-16 Winbond Electronics Corp. Unconstrained handwriting recognition
KR100719608B1 (en) * 2005-07-21 2007-05-17 주식회사 씨텍 Method and apparatus for recognizing serial number of paper money
CN103914682A (en) * 2013-01-09 2014-07-09 深圳市中联创新自控***有限公司 Vehicle license plate recognition method and system
CN105654135A (en) * 2015-12-30 2016-06-08 成都数联铭品科技有限公司 Image character sequence recognition system based on recurrent neural network
CN106682220A (en) * 2017-01-04 2017-05-17 华南理工大学 Online traditional Chinese medicine text named entity identifying method based on deep learning
CN107609598A (en) * 2017-09-27 2018-01-19 武汉斗鱼网络科技有限公司 Image authentication model training method, device and readable storage medium storing program for executing
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102595206A (en) * 2012-02-24 2012-07-18 央视国际网络有限公司 Data synchronization method and device based on sport event video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description;Jeff Donahue et al.;《IEEE Transactions on Pattern Analysis and Machine Intelligence》;20160901;678-680 *
周成伟.基于卷积神经网络的自然场景中数字的识别.《中国优秀硕士学位论文全文数据库信息科技辑》.2018, *

Also Published As

Publication number Publication date
CN110490031A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
Lan et al. Ffnet: Video fast-forwarding via reinforcement learning
CN104994426B (en) Program video identification method and system
CN110602554A (en) Cover image determining method, device and equipment
CN108198177A (en) Image acquiring method, device, terminal and storage medium
CN107977391B (en) Method, device and system for identifying picture book and electronic equipment
CN111416991A (en) Special effect processing method and apparatus, and storage medium
CN114286171B (en) Video processing method, device, equipment and storage medium
CN109740530A (en) Extracting method, device, equipment and the computer readable storage medium of video-frequency band
CN111586466A (en) Video data processing method and device and storage medium
CN110472499B (en) Pedestrian re-identification method and device
CN107977392B (en) Method, device and system for identifying picture book and electronic equipment
CN107704471A (en) A kind of information processing method and device and file call method and device
CN110490031B (en) Universal digital identification method, storage medium, electronic device and system
CN113157962A (en) Image retrieval method, electronic device, and storage medium
KR20180089977A (en) System and method for video segmentation based on events
CN111354013A (en) Target detection method and device, equipment and storage medium
CN110879944A (en) Anchor recommendation method, storage medium, equipment and system based on face similarity
CN113497947B (en) Video recommendation information output method, device and system
CN115687696A (en) Streaming media video playing method and related device for client
CN115379290A (en) Video processing method, device, equipment and storage medium
CN114866788A (en) Video processing method and device
Nam et al. Content adaptive video summarization using spatio-temporal features
CN113297416A (en) Video data storage method and device, electronic equipment and readable storage medium
CN115836319A (en) Image processing method and device
CN113221690A (en) Video classification method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240116

Address after: 8th Floor, Ziguang Building, No. 98 Feiyan Street, Chengguan District, Lanzhou City, Gansu Province, 730000

Patentee after: Zhongke Dingcheng Industrial Co.,Ltd.

Address before: 430000 East Lake Development Zone, Wuhan City, Hubei Province, No. 1 Software Park East Road 4.1 Phase B1 Building 11 Building

Patentee before: WUHAN DOUYU NETWORK TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right