WO2021159992A1 - Picture text processing method, apparatus, electronic device and storage medium - Google Patents

Picture text processing method, apparatus, electronic device and storage medium

Info

Publication number
WO2021159992A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
picture
target
punctuation mark
display interface
Prior art date
Application number
PCT/CN2021/074801
Other languages
English (en)
French (fr)
Inventor
孟婉婷
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to EP21753248.0A (published as EP4102347A4)
Publication of WO2021159992A1
Priority to US17/816,794 (published as US20220366711A1)

Classifications

    • G06V30/1456 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields, based on user interactions
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G06F3/04845 Interaction techniques based on graphical user interfaces [GUI] for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06F3/0485 Scrolling or panning
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 Interaction techniques using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G06V30/1448 Selective acquisition, locating or processing of specific regions based on markings or identifiers characterising the document or the area

Definitions

  • This application relates to picture recognition technology, in particular to a picture text processing method, device, electronic equipment and storage medium.
  • The embodiments of the present application provide a picture text processing method, apparatus, electronic device, and storage medium, which can simplify the user operation flow and reduce hierarchical complexity.
  • An image text processing method including:
  • the location information is used to indicate the user's operating position on the picture
  • a text display interface is superimposed on the picture, and the target text is displayed on the text display interface.
  • a picture text processing device including:
  • the obtaining module is used to obtain user operation instructions carrying position information; the position information is used to indicate the user's operation position on the picture;
  • a recognition module configured to recognize the target text corresponding to the location information from the picture according to the user operation instruction
  • the display module is used to superimpose and display a text display interface on the picture, and display the target text on the text display interface.
  • An electronic device includes a memory and a processor, with a computer program stored in the memory.
  • When the computer program is executed by the processor, the processor is caused to perform the steps of the picture text processing method described in any one of the method embodiments.
  • A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the steps of the picture text processing method described in any one of the method embodiments are implemented.
  • Fig. 1 is a diagram of an application environment of a picture text processing method in an embodiment;
  • Fig. 2 is a flowchart of a picture text processing method in an embodiment;
  • Figs. 2a and 2b are schematic diagrams of picture text display in an embodiment;
  • Fig. 3 is a flowchart of a picture text processing method in an embodiment;
  • Fig. 4 is a flowchart of a picture text processing method in an embodiment;
  • Fig. 5 is a flowchart of another picture text processing method provided by an embodiment;
  • Fig. 6 is a flowchart of a picture text processing method in an embodiment;
  • Fig. 7 is a flowchart of a picture text processing method in an embodiment;
  • Figs. 7a, 7b, 8, 9a, 9b and 9c are schematic diagrams of picture text display in an embodiment;
  • Fig. 10 is a structural block diagram of a picture text processing apparatus in an embodiment;
  • Fig. 11 is a structural block diagram of a picture text processing apparatus in an embodiment;
  • Fig. 12 is a structural block diagram of an electronic device in an embodiment.
  • The terms "first", "second", etc. used in this application may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
  • For example, without departing from the scope of this application, the first client may be referred to as the second client, and similarly, the second client may be referred to as the first client. Both the first client and the second client are clients, but they are not the same client.
  • Fig. 1 is a schematic diagram of an application environment of a picture text processing method in an embodiment.
  • the application environment includes users and terminals.
  • The terminal displays the picture to the user.
  • the user can perform operations such as long-pressing, double-clicking, and sliding on the picture.
  • When the terminal receives one of the above operations from the user, it recognizes the text corresponding to the operation position on the picture and displays it on a text display interface superimposed on the upper layer of the picture.
  • the terminal may be a mobile phone, a computer, an iPad, a game console, etc., which is not limited in the embodiment of the present application.
  • The picture text processing method provided by the embodiments of the present application can be used to solve the prior-art problems of complex hierarchical display and complex user operations when extracting text from a picture.
  • Fig. 2 is a flowchart of a method for processing picture text in an embodiment.
  • The picture text processing method in this embodiment is described by taking its running on the terminal in Fig. 1 as an example.
  • the image text processing method includes the following operations:
  • The user can input the user operation instruction in a variety of ways, for example, by long-pressing a certain position of the picture, double-clicking a certain position of the picture, or sliding on the picture.
  • Correspondingly, the operation position can be the position the user long-presses, the position where the user double-clicks the picture, or the position where the user slides on the picture, which is not limited in the embodiments of the present application.
  • the user operation instruction is used to instruct the terminal to recognize the text corresponding to the user's operation position on the picture.
  • In this embodiment, when the user browses a picture through the display interface of the terminal, if the picture contains text that the user needs to operate on, the user can trigger the user operation instruction by long-pressing, double-clicking, or sliding, instructing the terminal to recognize the text corresponding to the operation position.
  • the target text may be a sentence of text in the picture, a paragraph of text, or even all the text, which is not limited in the embodiment of the present application.
  • After obtaining the user operation instruction, the terminal starts to recognize the target text corresponding to the location information from the picture.
  • the terminal can recognize all the text on the picture, and then determine the target text corresponding to the operation position from all the text.
  • Alternatively, the terminal may first crop a small picture of a certain range from the picture according to the location information, then recognize the text on the cropped small picture, and then determine the text corresponding to the user's operation position from the text on the small picture.
  • The target text corresponding to the position information may be determined by extending forward and backward from the operation position indicated by the position information, taking a sentence of text, a paragraph of text, etc. that spans the operation position as the target text.
  • Alternatively, a target area may be formed with the operation position as its center, extending a certain size up and down and using the picture width as its horizontal size, and a complete sentence or paragraph within the target area is used as the target text.
  • Alternatively, the sentence between the two punctuation marks before and after the operation position corresponding to the position information may be used as the target text, etc., which is not limited in the embodiments of the present application.
  • S203 Superimpose and display the text display interface on the picture, and display the target text on the text display interface.
  • the text display interface is superimposed on the picture, and the target text is displayed on the text display interface.
  • the text display interface can be a display interface generated in advance. After the terminal recognizes the target text, the text display interface can be directly called to display the target text. Alternatively, the terminal can also generate a text display interface in real time after recognizing the target text, and superimpose it on the picture to display the target text, which is not limited in the embodiment of the present application.
  • the size of the text display interface may be preset or obtained according to the size of the target text, which is not limited in the embodiment of the present application.
  • the text displayed on the text display interface can be edited.
  • the user can perform operations such as copying, sharing, and editing the text displayed on the text display interface.
  • a picture is displayed on the terminal display interface.
  • When the user needs to operate on some text in the picture, the user can press and hold the corresponding position on the picture with a finger.
  • a user operation instruction will be triggered.
  • the operation instruction will record the position information of the user long press.
  • The terminal recognizes the corresponding target text according to the position information, superimposes the text display interface on the picture, and displays the recognized target text on the text display interface.
  • The terminal obtains a user operation instruction carrying location information, recognizes, according to the user operation instruction, the target text corresponding to the location information from the picture, superimposes the text display interface on the picture, and displays the target text on the text display interface.
  • the user can trigger the user operation instruction at the corresponding position on the picture.
  • the terminal recognizes the target text corresponding to the operation position.
  • The terminal can directly superimpose the text display interface on the picture and display the target text on it.
  • There is no need to jump to a next-level display interface to show the text, which makes the hierarchical display simpler, and the user can directly operate on the target text displayed on the text display interface without jumping to a next-level display interface to operate on it, which simplifies the user operation flow.
  • The terminal recognizes and displays on the text display interface only the target text corresponding to the operation position; there is no need to display all the text in the picture, which reduces the load of displaying text on the terminal.
  • Moreover, the user can directly operate on the required text, which reduces user operation time.
  • the terminal can recognize the target text in a variety of different ways. The following describes different methods of recognizing the target text.
  • Fig. 3 is a flowchart of a method for processing image text according to an embodiment.
  • The embodiment of the present application relates to the specific implementation process in which the terminal recognizes all the text on the picture and then determines the target text from all the text according to the position information.
  • the method includes the following operations:
  • S301 Recognize all the text on the picture according to the user's operation instruction.
  • After obtaining the user operation instruction, the terminal recognizes all the text on the picture.
  • The terminal may use optical character recognition (OCR) technology to recognize the text on the picture, and may also use a neural network algorithm to recognize the text on the picture, which is not limited in this embodiment of the application.
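  • As an illustration of this recognition step, the sketch below obtains all text on a picture together with per-word positions. It is a minimal example assuming the open-source Tesseract engine via the pytesseract wrapper; the patent does not prescribe a particular recognizer, and the function name recognize_all_text is our own.

```python
# A hedged sketch of operation S301: OCR the whole picture, keeping each
# word's bounding box so a later step can match the user's tap position.
from PIL import Image
import pytesseract

def recognize_all_text(picture_path: str) -> list[dict]:
    """Recognize all text on the picture, with one bounding box per word."""
    image = Image.open(picture_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words = []
    for i, text in enumerate(data["text"]):
        if text.strip():  # drop empty detections
            words.append({
                "text": text,
                "box": (data["left"][i], data["top"][i],
                        data["width"][i], data["height"][i]),
            })
    return words
```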
  • S302 Determine the target text from all the texts according to the location information.
  • the terminal needs to determine the target text from all texts based on the location information, that is, to determine the target text from all texts based on the user's operating position.
  • Taking one sentence as a unit, the sentence extended semantically from the operation position can be determined as the target text; or, taking one paragraph as a unit, the paragraph extended semantically from the operation position can be determined as the target text, which is not limited in the embodiments of the present application.
  • The terminal first recognizes all the text on the picture according to the user operation instruction, and then determines the target text from all the text according to the position information, which can accurately recognize the target text in combination with semantic information, avoid problems such as incomplete semantics and broken sentences, and improve the accuracy of text recognition.
  • operation S302 "determine target text from all texts based on location information" may include the following operations:
  • the terminal may extend before and after the operation position according to the semantic direction, and determine the first punctuation mark before the operation position and the second punctuation mark after the operation position.
  • As shown in Fig. 2a, starting from the position long-pressed by the user's finger and extending forward and backward according to the semantics, the period at the end of the first line of the text is determined as the first punctuation mark, and the first comma in the second line of the text is determined as the second punctuation mark.
  • Optionally, the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position.
  • punctuation marks can be determined according to semantic information, and the punctuation marks before and after a complete sentence are used as specific punctuation marks to determine a sentence as the target text.
  • the specific punctuation mark may be a period, a question mark, an exclamation mark and other punctuation marks, which are not limited in the embodiment of the present application.
  • S402 Determine the text between the first punctuation mark and the second punctuation mark as the target text.
  • The terminal determines the text between two adjacent punctuation marks as the target text; for example, "GGGGGHHHHHHHHHHHHhKKKKK," in Fig. 2a is determined as the target text.
  • Alternatively, the text between two adjacent specific punctuation marks is determined as the target text; as shown in Fig. 2b, "GGGGGHHHHHHHHHHHHhKKKKK,XXXXXXXXXX,XXXXXXXXXXX?" is determined as the target text.
  • The terminal determines, from all the text, the first punctuation mark before the operation position indicated by the position information and the second punctuation mark after the operation position, and determines the text between the first punctuation mark and the second punctuation mark as the target text; the target text can thus be identified quickly and accurately through punctuation.
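  • The sketch below illustrates this punctuation-based extraction. It assumes the recognized text has been flattened into one string and the operation position has been mapped to a character index; both steps, and the exact mark sets, are our assumptions rather than details fixed by the patent.

```python
# A hedged sketch of operations S401-S402: walk backward to the first
# punctuation mark before the tapped character, and forward to the first
# (specific) punctuation mark after it, then return the text in between.
SPECIFIC_MARKS = "。？！.?!"           # sentence-ending "specific punctuation marks"
ALL_MARKS = SPECIFIC_MARKS + "，,；;：:"

def extract_target_text(text: str, index: int, marks: str = SPECIFIC_MARKS) -> str:
    start = index
    while start > 0 and text[start - 1] not in marks:
        start -= 1                     # first mark forward of the operation position
    end = index
    while end < len(text) and text[end] not in marks:
        end += 1                       # first mark backward of the operation position
    return text[start:end + 1] if end < len(text) else text[start:end]

# extract_target_text("AAA。BBBCCC，DDD？", 6) -> "BBBCCC，DDD？"
# extract_target_text("AAA。BBBCCC，DDD？", 6, ALL_MARKS) -> "BBBCCC，"
```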
  • Fig. 5 is a flowchart of another image text processing method provided by an embodiment.
  • the embodiment of the present application relates to a specific implementation process in which a terminal determines a target area on a picture according to an operating position, recognizes the text of the target area, and determines the target text from the text of the target area.
  • the method includes the following operations:
  • S501 Determine a target area on the picture according to the operating position indicated by the position information.
  • the terminal may determine a target area on the picture according to the operation position indicated by the position information.
  • For example, a rectangular frame is formed with the operation position as its center; the height of the rectangular frame is a preset length and its width is equal to the width of the picture, and the rectangular frame is used as the target area.
  • After determining the target area on the picture, the terminal may directly recognize the text in the target area on the picture; or, after determining the target area, the terminal may crop the target area from the picture and then recognize the text in the cropped target area. The text outside the target area on the picture is not recognized.
  • the terminal may use OCR technology to recognize the text on the picture, and may also use a neural network algorithm to recognize the text on the picture, which is not limited in this embodiment of the application.
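  • A minimal sketch of this target-area variant follows, again assuming PIL and pytesseract; the preset_height parameter stands in for the patent's "preset length". The horizontal tap coordinate is accepted but does not affect the crop, since the rectangle spans the full picture width.

```python
# A hedged sketch of operations S501-S502: crop a full-width strip centred
# on the tap and OCR only that strip, leaving the rest of the picture alone.
from PIL import Image
import pytesseract

def recognize_target_area(picture_path: str, tap_x: int, tap_y: int,
                          preset_height: int = 200) -> str:
    image = Image.open(picture_path)
    top = max(0, tap_y - preset_height // 2)
    bottom = min(image.height, tap_y + preset_height // 2)
    target_area = image.crop((0, top, image.width, bottom))  # width = picture width
    return pytesseract.image_to_string(target_area)  # only this strip is recognized
```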
  • S503 Determine the target text from the text in the target area according to the location information.
  • the terminal needs to determine the target text from the text in the target area according to the position information, that is, to determine the target text from the text in the target area according to the user's operating position.
  • Taking one sentence as a unit, the sentence extended semantically from the operation position can be determined as the target text; or, taking one paragraph as a unit, the paragraph extended semantically from the operation position can be determined as the target text, which is not limited in the embodiments of the present application.
  • operation S503 "determine the target text from the text in the target area according to the location information" may include the following operations:
  • S601: From the text in the target area, determine the first punctuation mark forward of the operation position indicated by the position information and the second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent.
  • S602: Determine the text between the first punctuation mark and the second punctuation mark as the target text.
  • Optionally, the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position.
  • The terminal determines the target area on the picture according to the operation position indicated by the position information, recognizes the text in the target area, and determines the target text from the text in the target area according to the position information. The terminal only needs to recognize the text in the target area and does not need to recognize all the text on the picture, which reduces the terminal load caused by text recognition.
  • the terminal can also insert a drag handle into the text of the picture for the user to select the desired text.
  • the foregoing image text processing method further includes the following operations:
  • S701 Determine the beginning and end positions of the target text on the picture, and insert drag handles at the beginning and end positions respectively.
  • the terminal can also insert a drag handle at the beginning and end of the target text on the picture, and the user can drag the drag handle to select the desired text.
  • As shown in Fig. 7a, two cursor-shaped drag handles are inserted at the beginning and end of the target text. The user can drag the handle at the start position or the handle at the end position on the display interface of the terminal to select the required text.
  • The user operates a drag handle to trigger a drag operation instruction.
  • As shown in Fig. 7b, when the user drags the handle at the end position of the target text to the end of the third line of text and finishes the drag operation, a drag operation instruction is generated.
  • the terminal may obtain the text between the two drag handles as the new target text according to the drag instruction, and display the new target text in the text display interface.
  • Operation S703 may include: determining the positions of the two drag handles according to the drag operation instruction; recognizing the text information between the positions of the two drag handles from the picture as the updated target text; and displaying the updated target text in the text display interface.
  • The terminal obtains the positions of the two drag handles according to the drag operation instruction, and recognizes the text information between the positions of the two drag handles from the picture as the updated target text, as shown in Fig. 7b.
  • The text between the two drag handles is "GGGGGHHHHHHHHHHHHhKKKKK,XXXXXXXXXX,XXXXXXXXXXX?XXXXXXXXXXXXX,XXXXXXXXXXXXX,", so the terminal takes this text as the updated target text and displays it in the text display area.
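  • The sketch below shows one way operation S703 could work, assuming the words list produced by recognize_all_text() above (in reading order) and assuming each handle has already been snapped to a word index; the pixel-to-index mapping is elided.

```python
# A hedged sketch of operation S703: the updated target text is simply the
# run of recognized words lying between the two drag-handle positions.
def update_target_text(words: list[dict], handle_a: int, handle_b: int) -> str:
    lo, hi = sorted((handle_a, handle_b))   # the user may cross the handles
    return "".join(w["text"] for w in words[lo:hi + 1])
```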
  • the size of the text display interface is proportional to the size of the target text.
  • The size of the text display interface is proportional to the size of the target text; that is, the terminal can adjust the size of the text display interface according to the size of the target text, or adjust the size of the target text according to the size of the text display interface, which makes the proportions of the text display interface more attractive and harmonious.
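  • As one concrete reading of this proportionality, the sketch below derives the interface size from the amount of target text; the character cell, wrap width, and padding values are illustrative assumptions rather than values from the patent.

```python
# A hedged sketch: size the text display interface in proportion to the text.
def interface_size(target_text: str, max_width_px: int = 320,
                   char_w: int = 18, line_h: int = 24, pad: int = 12):
    chars_per_line = max(1, max_width_px // char_w)
    lines = -(-len(target_text) // chars_per_line) or 1   # ceiling division
    width = min(max_width_px, max(1, len(target_text)) * char_w) + 2 * pad
    height = lines * line_h + 2 * pad
    return width, height
```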
  • The terminal determines the beginning and end positions of the target text on the picture, inserts drag handles at the beginning and end positions respectively, obtains the user's drag operation instruction on the drag handles, and updates the text displayed in the text display interface according to the drag operation instruction.
  • When the user needs to update the target text, the user can select the desired text by dragging the handles, so that the terminal can accurately identify the text information required by the user.
  • The user operation is simple and convenient, which greatly meets user needs.
  • Moreover, the terminal avoids level jumps, keeping hierarchical operations simple.
  • In some embodiments, some controls can also be placed on the text display interface to implement settings for the target text and the text display interface.
  • an operation control is provided on the text display interface, and the image text processing method further includes: when it is detected that the operation control is triggered, performing a target operation corresponding to the operation control on the target text.
  • operation controls can be set on the text display interface to implement different operations on the target text.
  • a copy control and a share control are provided on the text display interface
  • the target operation corresponding to the copy control is a copy operation
  • the target operation corresponding to the share control is a sharing operation.
  • For example, when the terminal detects that the user clicks the copy control, the target text in the text display interface is copied;
  • when the terminal detects that the user clicks the share control, the target text in the text display interface is shared to the application or page specified by the user.
  • Other operation controls can also be set according to requirements, and the embodiment of the present application is not limited to this.
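  • The dispatch behind such operation controls could look like the sketch below; the control ids and handler bodies are placeholders mirroring the copy and share controls of Fig. 8, not an API defined by the patent.

```python
# A hedged sketch of operation controls: each control id maps to the target
# operation to be performed on the target text when that control is triggered.
def copy_operation(target_text: str) -> None:
    print(f"copied: {target_text}")      # stand-in for a real clipboard call

def share_operation(target_text: str) -> None:
    print(f"shared: {target_text}")      # stand-in for a system share sheet

OPERATION_CONTROLS = {"copy": copy_operation, "share": share_operation}

def on_operation_control_triggered(control_id: str, target_text: str) -> None:
    OPERATION_CONTROLS[control_id](target_text)
```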
  • the text display interface is provided with functional controls
  • the image text processing method described above further includes: setting the attributes of the target text and/or the attributes of the text display interface when it is detected that the functional controls are triggered.
  • the attributes of the target text include at least one of the font size, font format, and font color of the target text; the attributes of the text display interface include at least one of the background pattern, background color, shape, size, and position of the text display interface.
  • In this embodiment, as shown in Fig. 9a, a function control "Settings" can be placed on the text display interface. When the user clicks this function control, as shown in Fig. 9b, a setting interface pops up, which can include setting options such as font size, font format, font color, and the background pattern, background color, shape, size, and position of the text display interface; the user can set the attributes of the target text and of the text display interface in this setting interface. Alternatively, as shown in Fig. 9c, function controls such as font size, font format, font color, background pattern, background color, shape, size, and position can be placed directly on the text display interface, and the user simply operates the function control corresponding to whatever needs to be set.
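  • A minimal sketch of these attribute settings follows; the attribute names mirror the ones listed above, while the dataclasses, defaults, and handler are our own illustration (Python 3.10+ for the union syntax).

```python
# A hedged sketch of the attributes behind the function controls.
from dataclasses import dataclass

@dataclass
class TargetTextAttributes:
    font_size: int = 16
    font_format: str = "regular"         # e.g. bold / italic
    font_color: str = "#000000"

@dataclass
class TextDisplayInterfaceAttributes:
    background_pattern: str | None = None
    background_color: str = "#FFFFFF"
    shape: str = "rounded-rect"
    size: tuple[int, int] = (320, 120)
    position: tuple[int, int] = (0, 0)

def on_function_control_triggered(text_attrs: TargetTextAttributes,
                                  ui_attrs: TextDisplayInterfaceAttributes,
                                  **changes) -> None:
    """Apply whichever attribute the user edited in the setting interface."""
    for key, value in changes.items():
        target = text_attrs if hasattr(text_attrs, key) else ui_attrs
        setattr(target, key, value)
```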
  • An operation control is provided on the text display interface, and when it is detected that the operation control is triggered, the target operation corresponding to the operation control is performed on the target text; and/or, a function control is provided on the text display interface, and when it is detected that the function control is triggered, the attributes of the target text and/or the attributes of the text display interface are set. This makes it convenient for users to set the attributes of the target text or of the text display interface, meeting the needs of different users.
  • the user can also directly drag the text display interface.
  • The above picture text processing method can further include: receiving a movement operation instruction input by the user, the movement operation instruction including a movement track; and moving the text display interface according to the movement track.
  • the user can directly drag the text display interface, and the terminal records the user's movement operation trajectory, and moves the text display interface according to the movement operation trajectory to meet the needs of the user.
  • The user can move the text display interface to any position of the display interface; for example, the user can drag the text display interface up or down, or drag it to a position on the picture where there is no text, etc., which is not limited in the embodiments of the present application.
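  • A minimal sketch of this move operation is given below; the track is taken to be a list of touch points, and all names are illustrative rather than taken from the patent.

```python
# A hedged sketch: the interface follows the movement track, ending at the
# track's last point, clamped so the panel stays within the picture bounds.
def move_text_display_interface(position: tuple[int, int],
                                track: list[tuple[int, int]],
                                picture_size: tuple[int, int],
                                panel_size: tuple[int, int]) -> tuple[int, int]:
    x, y = track[-1] if track else position
    max_x = picture_size[0] - panel_size[0]
    max_y = picture_size[1] - panel_size[1]
    return (min(max(0, x), max_x), min(max(0, y), max_y))
```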
  • Fig. 10 is a structural block diagram of a picture text processing apparatus according to an embodiment. As shown in Figure 10, the device includes:
  • the obtaining module 21 is used to obtain user operation instructions carrying position information; the position information is used to indicate the user's operation position on the picture;
  • the recognition module 22 is used to recognize the target text corresponding to the location information from the picture according to the user operation instruction;
  • the display module 23 is used to display a text display interface superimposed on the picture, and display the target text on the text display interface.
  • the recognition module 22 is configured to recognize all the text on the picture according to user operation instructions; and determine the target text from all the texts according to the position information.
  • The recognition module 22 is configured to determine, from all the text, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent; and to determine the text between the first punctuation mark and the second punctuation mark as the target text.
  • the recognition module 22 is configured to determine the target area on the picture according to the operating position indicated by the position information; recognize the text in the target area; and determine the target text from the text in the target area according to the position information.
  • The recognition module 22 is configured to determine, from the text in the target area, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent; and to determine the text between the first punctuation mark and the second punctuation mark as the target text.
  • In an embodiment, the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position.
  • the apparatus further includes:
  • the inserting module 24 is used to determine the beginning and end positions of the target text on the picture, and insert drag handles at the beginning and end positions respectively;
  • the obtaining module 21 is also used to obtain the drag operation instruction of the user on the drag handle;
  • the display module 23 is also used to update the text displayed in the text display interface according to the drag operation instruction.
  • the display module 23 is further configured to determine the positions of the two drag handles according to the drag operation instruction; identify the text information between the positions of the two drag handles from the picture as the updated target text; Display the updated target text in the text display interface.
  • the apparatus further includes:
  • the detection module 25 is configured to perform a target operation corresponding to the operation control on the target text when it is detected that the operation control is triggered.
  • When the operation control is a copy control, the target operation is a copy operation;
  • when the operation control is a share control, the target operation is a sharing operation.
  • the detection module 25 is also used to set the attributes of the target text and/or the text display interface when it detects that the function control is triggered.
  • the attributes of the target text include at least one of the font size, font format, and font color of the target text; the attributes of the text display interface include the background pattern, background color, shape, size, and position of the text display interface. at least one.
  • the size of the text display interface is proportional to the size of the target text.
  • The display module 23 is further configured to receive a movement operation instruction input by the user, the movement operation instruction including a movement track, and to move the text display interface according to the movement track.
  • The division of the modules in the picture text processing device is only for illustration. In other embodiments, the picture text processing device can be divided into different modules as needed to complete all or part of the functions of the picture text processing device.
  • Each module in the above-mentioned picture and text processing device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the corresponding operations of the above-mentioned modules.
  • Fig. 12 is a schematic diagram of the internal structure of an electronic device in an embodiment.
  • the electronic device includes a processor and a memory connected through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire electronic device.
  • the memory may include a non-volatile storage medium and internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement a picture text processing method provided in the following embodiments.
  • The internal memory provides a cached running environment for the operating system and computer program in the non-volatile storage medium.
  • The electronic device can be any terminal device such as a mobile phone, a computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, or a wearable device.
  • each module in the image and text processing apparatus provided in the embodiment of the present application may be in the form of a computer program.
  • the computer program can be run on a terminal or a server.
  • the program module constituted by the computer program can be stored in the memory of the electronic device.
  • the computer program is executed by the processor, the operation of the method described in the embodiment of the present application is realized.
  • The embodiments of the present application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the operations of the picture text processing method.
  • A computer program product containing instructions, which, when run on a computer, causes the computer to execute the picture text processing method.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

In the picture text processing method, apparatus, electronic device, and storage medium provided by this application, a terminal obtains a user operation instruction carrying position information, recognizes, according to the user operation instruction, the target text corresponding to the position information from a picture, superimposes a text display interface on the picture, and displays the target text on the text display interface. The terminal can superimpose the text display interface directly on the picture and display the target text on it, without jumping to a next-level display interface to show the text, which makes hierarchical display simpler, simplifies the user operation flow, and reduces user operation time.

Description

Picture text processing method, apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 11, 2020, with application number 2020100864146 and the invention title "Picture text processing method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to picture recognition technology, and in particular to a picture text processing method, apparatus, electronic device, and storage medium.
Background
Text has always played an important role in people's lives. The rich and precise information contained in text is very important for vision-based applications. At present, more and more pictures contain text, and in many scenarios the text in a picture needs to be recognized.
For example, in some applications, when a user needs to extract the text in a picture, the user can tap a "Recognize text" button; the terminal recognizes the text in the picture and jumps from the current page displaying the picture to a next-level page, where the text is displayed, and the user can edit, copy, and otherwise operate on the text in the next-level page.
Summary
The embodiments of this application provide a picture text processing method, apparatus, electronic device, and storage medium, which can simplify the user operation flow and reduce hierarchical complexity.
A picture text processing method includes:
obtaining a user operation instruction carrying position information, the position information being used to indicate a user's operation position on a picture;
recognizing, according to the user operation instruction, target text corresponding to the position information from the picture; and
superimposing a text display interface on the picture, and displaying the target text on the text display interface.
A picture text processing apparatus includes:
an obtaining module, configured to obtain a user operation instruction carrying position information, the position information being used to indicate a user's operation position on a picture;
a recognition module, configured to recognize, according to the user operation instruction, the target text corresponding to the position information from the picture; and
a display module, configured to superimpose a text display interface on the picture and display the target text on the text display interface.
An electronic device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the picture text processing method described in any one of the method embodiments.
A computer-readable storage medium has a computer program stored thereon that, when executed by a processor, implements the steps of the picture text processing method described in any one of the method embodiments.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a diagram of an application environment of a picture text processing method in an embodiment;
Fig. 2 is a flowchart of a picture text processing method in an embodiment;
Figs. 2a and 2b are schematic diagrams of picture text display in an embodiment;
Fig. 3 is a flowchart of a picture text processing method in an embodiment;
Fig. 4 is a flowchart of a picture text processing method in an embodiment;
Fig. 5 is a flowchart of another picture text processing method provided by an embodiment;
Fig. 6 is a flowchart of a picture text processing method in an embodiment;
Fig. 7 is a flowchart of a picture text processing method in an embodiment;
Figs. 7a, 7b, 8, 9a, 9b and 9c are schematic diagrams of picture text display in an embodiment;
Fig. 10 is a structural block diagram of a picture text processing apparatus in an embodiment;
Fig. 11 is a structural block diagram of a picture text processing apparatus in an embodiment;
Fig. 12 is a structural block diagram of an electronic device in an embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
It can be understood that the terms "first", "second", etc. used in this application may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, without departing from the scope of this application, a first client may be referred to as a second client, and similarly, a second client may be referred to as a first client. Both the first client and the second client are clients, but they are not the same client.
Fig. 1 is a schematic diagram of an application environment of a picture text processing method in an embodiment. As shown in Fig. 1, the application environment includes a user and a terminal. The terminal displays a picture to the user, and the user can long-press, double-click, slide, or otherwise operate on the picture. When the terminal receives such an operation from the user, it recognizes the text corresponding to the operation position on the picture and displays it on a text display interface superimposed on the upper layer of the picture. The terminal may be a mobile phone, a computer, an iPad, a game console, etc., which is not limited in the embodiments of this application.
The picture text processing method provided by the embodiments of this application can be used to solve the prior-art problems of complex hierarchical display and complex user operations when extracting text from a picture.
Fig. 2 is a flowchart of a picture text processing method in an embodiment. The picture text processing method in this embodiment is described by taking its running on the terminal in Fig. 1 as an example. As shown in Fig. 2, the picture text processing method includes the following operations:
S201: Obtain a user operation instruction carrying position information; the position information is used to indicate the user's operation position on the picture.
The user can input the user operation instruction in a variety of ways, for example, by long-pressing a certain position of the picture, double-clicking a certain position of the picture, or sliding on the picture. Correspondingly, the operation position can be the position the user long-presses, the position where the user double-clicks the picture, or the position where the user slides on the picture, which is not limited in the embodiments of this application. The user operation instruction is used to instruct the terminal to recognize the text corresponding to the user's operation position on the picture.
In this embodiment, when the user browses a picture through the display interface of the terminal, if the picture contains text that the user needs to operate on, the user can trigger the user operation instruction by long-pressing, double-clicking, or sliding, instructing the terminal to recognize the text corresponding to the operation position.
S202: Recognize, according to the user operation instruction, the target text corresponding to the position information from the picture.
The target text may be a sentence of text in the picture, a paragraph of text, or even all the text, which is not limited in the embodiments of this application.
In this embodiment, after obtaining the user operation instruction, the terminal starts to recognize the target text corresponding to the position information from the picture. The terminal can recognize all the text on the picture and then determine the target text corresponding to the operation position from all the text. Alternatively, the terminal can first crop a small picture of a certain range from the picture according to the position information, recognize the text on the cropped small picture, and then determine the text corresponding to the user's operation position from the text on the small picture.
In this embodiment, the target text corresponding to the position information may be obtained by extending forward and backward from the operation position indicated by the position information, determining a sentence of text, a paragraph of text, etc. extending around the operation position as the target text; or a target area may be formed with the operation position as its center, extending a certain size up and down and using the picture width as its horizontal size, and a complete sentence or paragraph within the target area is used as the target text; or the sentence between the two punctuation marks before and after the operation position corresponding to the position information may be used as the target text, etc., which is not limited in the embodiments of this application.
S203: Superimpose a text display interface on the picture, and display the target text on the text display interface.
In this embodiment, after recognizing the target text, the terminal superimposes a text display interface on the picture and displays the target text on the text display interface. The text display interface may be a display interface generated in advance; after the terminal recognizes the target text, this text display interface can be directly called to display the target text. Alternatively, after recognizing the target text, the terminal may generate a text display interface in real time and superimpose it on the picture to display the target text, which is not limited in the embodiments of this application. The size of the text display interface may be preset, or may be obtained according to the size of the target text, which is not limited in the embodiments of this application.
Moreover, the text displayed on the text display interface is editable; for example, the user can copy, share, and edit the text displayed on the text display interface.
As shown in Fig. 2a, a picture is displayed on the terminal display interface. When the user needs to operate on some text in the picture, the user can long-press the corresponding position on the picture with a finger. As shown in Fig. 2b, when the user long-presses the corresponding position on the picture, a user operation instruction is triggered, which records the position information of the user's long press; the terminal recognizes the corresponding target text according to the position information, superimposes a text display interface on the picture, and displays the recognized target text on the text display interface.
In the picture text processing method provided by the embodiments of this application, the terminal obtains a user operation instruction carrying position information, recognizes, according to the user operation instruction, the target text corresponding to the position information from the picture, superimposes a text display interface on the picture, and displays the target text on the text display interface. When the user needs to operate on the text in a picture, the user can trigger the user operation instruction at the corresponding position on the picture, and the terminal recognizes the target text corresponding to the operation position. The terminal can superimpose the text display interface directly on the picture and display the target text on it, without jumping to a next-level display interface to show the text, which makes hierarchical display simpler. Moreover, the user can directly operate on the target text displayed on the text display interface without jumping to a next-level display interface to operate on it, which simplifies the user operation flow. In addition, whatever text the user needs, the user operates on the corresponding text position in the picture, and the terminal recognizes and displays on the text display interface only the target text corresponding to that operation position; the terminal does not need to display all the text in the picture, which reduces the load of displaying text on the terminal. Furthermore, the user can directly operate on the required text without having to search for it among all the text as in the prior art, which reduces user operation time.
In the embodiment shown in Fig. 2, the terminal can recognize the target text in a variety of different ways. Different methods of recognizing the target text are described below.
Fig. 3 is a flowchart of a picture text processing method provided by an embodiment. This embodiment relates to the specific implementation process in which the terminal recognizes all the text on the picture and then determines the target text from all the text according to the position information. As shown in Fig. 3, the method includes the following operations:
S301: Recognize all the text on the picture according to the user operation instruction.
In this embodiment, after obtaining the user operation instruction, the terminal recognizes all the text on the picture. The terminal may use optical character recognition (OCR) technology to recognize the text on the picture, or may use a neural network algorithm to recognize the text on the picture, which is not limited in the embodiments of this application.
S302: Determine the target text from all the text according to the position information.
In this embodiment, the terminal needs to determine the target text from all the text according to the position information, that is, to determine the target text from all the text according to the user's operation position. Taking one sentence as a unit, the sentence extended semantically from the operation position can be determined as the target text; or, taking one paragraph as a unit, the paragraph extended semantically from the operation position can be determined as the target text, which is not limited in the embodiments of this application.
In the picture text processing method provided by the embodiments of this application, the terminal first recognizes all the text on the picture according to the user operation instruction and then determines the target text from all the text according to the position information. The target text can thus be recognized accurately in combination with semantic information, avoiding problems such as incomplete semantics and broken sentences, and improving the accuracy of text recognition.
In an embodiment, as shown in Fig. 4, operation S302 of determining the target text from all the text according to the position information may include the following operations:
S401: From all the text, determine a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent.
In this embodiment, the terminal can extend forward and backward from the operation position according to the semantic direction, determining the first punctuation mark before the operation position and the second punctuation mark after the operation position. As shown in Fig. 2a, starting from the position long-pressed by the user's finger and extending forward and backward according to the semantics, the period "。" at the end of the first line of the text is determined as the first punctuation mark, and the first comma "，" in the second line of the text is determined as the second punctuation mark.
Optionally, the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position. In this embodiment, punctuation marks can be determined according to semantic information, and the punctuation marks before and after a complete sentence are used as the specific punctuation marks, so that one sentence is determined as the target text. For example, the specific punctuation mark may be a period, a question mark, an exclamation mark, or another punctuation mark, which is not limited in the embodiments of this application. As shown in Fig. 2b, starting from the position long-pressed by the user's finger and extending forward and backward according to the semantics, the period "。" at the end of the first line of the text is determined as the first punctuation mark, and the first question mark "？" in the third line of the text is determined as the second punctuation mark.
S402: Determine the text between the first punctuation mark and the second punctuation mark as the target text.
In this embodiment, the terminal determines the text between two adjacent punctuation marks as the target text; for example, "GGGGGHHHHHHHHHHHHhKKKKK," in Fig. 2a is determined as the target text. Alternatively, the text between two adjacent specific punctuation marks is determined as the target text; as shown in Fig. 2b, "GGGGGHHHHHHHHHHHHhKKKKK,XXXXXXXXXX,XXXXXXXXXXX?" is determined as the target text.
In the picture text processing method provided by the embodiments of this application, the terminal determines, from all the text, the first punctuation mark before the operation position indicated by the position information and the second punctuation mark after the operation position, and determines the text between the first punctuation mark and the second punctuation mark as the target text; the target text can thus be identified quickly and accurately through punctuation.
Fig. 5 is a flowchart of another picture text processing method provided by an embodiment. This embodiment relates to the specific implementation process in which the terminal determines a target area on the picture according to the operation position, recognizes the text in the target area, and determines the target text from the text in the target area. As shown in Fig. 5, the method includes the following operations:
S501: Determine a target area on the picture according to the operation position indicated by the position information.
In this embodiment, the terminal can determine a target area on the picture according to the operation position indicated by the position information; for example, a rectangular frame is formed with the operation position as its center, the height of the rectangular frame being a preset length and its width being equal to the width of the picture, and the rectangular frame is used as the target area.
S502: Recognize the text in the target area.
In this embodiment, after determining the target area on the picture, the terminal may directly recognize the text within the target area on the picture; or, after determining the target area, the terminal may crop the target area from the picture and then recognize the text in the cropped target area. The text outside the target area on the picture is not recognized. Optionally, the terminal may use OCR technology to recognize the text on the picture, or may use a neural network algorithm to recognize the text on the picture, which is not limited in the embodiments of this application.
S503: Determine the target text from the text in the target area according to the position information.
In this embodiment, the terminal needs to determine the target text from the text in the target area according to the position information, that is, to determine the target text from the text in the target area according to the user's operation position. Taking one sentence as a unit, the sentence extended semantically from the operation position can be determined as the target text; or, taking one paragraph as a unit, the paragraph extended semantically from the operation position can be determined as the target text, which is not limited in the embodiments of this application.
In an embodiment, as shown in Fig. 6, operation S503 of determining the target text from the text in the target area according to the position information may include the following operations:
S601: From the text in the target area, determine a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent.
S602: Determine the text between the first punctuation mark and the second punctuation mark as the target text.
Optionally, the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position.
In the embodiments of this application, for the implementation principles and beneficial effects of operations S601 and S602, reference may be made to those of operations S401 and S402 in Fig. 4, which are not repeated here.
In the picture text processing method provided by the embodiments of this application, the terminal determines the target area on the picture according to the operation position indicated by the position information, recognizes the text in the target area, and determines the target text from the text in the target area according to the position information. The terminal only needs to recognize the text in the target area and does not need to recognize all the text on the picture, which reduces the terminal load caused by text recognition.
In an embodiment, the terminal can also insert drag handles into the text of the picture for the user to select the required text. As shown in Fig. 7, the above picture text processing method further includes the following operations:
S701: Determine the start and end positions of the target text on the picture, and insert drag handles at the start and end positions respectively.
In this embodiment, after determining the target text, the terminal can also insert drag handles at the start and end positions of the target text on the picture, and the user can drag the drag handles to select the required text. As shown in Fig. 7a, two cursor-shaped drag handles are inserted at the start and end positions of the target text; the user can drag the handle at the start position or the handle at the end position on the display interface of the terminal to select the required text.
S702: Obtain the user's drag operation instruction on the drag handles.
In this embodiment, the user can trigger a drag operation instruction by operating a drag handle. As shown in Fig. 7b, the user drags the drag handle at the end position of the target text to the end of the third line of text, and after the user finishes the drag operation, a drag operation instruction is generated.
S703: Update the text displayed in the text display interface according to the drag operation instruction.
In this embodiment, the terminal can obtain the text between the two drag handles as the new target text according to the drag instruction and display the new target text in the text display interface.
Optionally, operation S703 may include: determining the positions of the two drag handles according to the drag operation instruction; recognizing the text information between the positions of the two drag handles from the picture as the updated target text; and displaying the updated target text in the text display interface.
In this embodiment, the terminal obtains the positions of the two drag handles according to the drag operation instruction and recognizes the text information between the positions of the two drag handles from the picture as the updated target text. As shown in Fig. 7b, the text between the two drag handles is "GGGGGHHHHHHHHHHHHhKKKKK,XXXXXXXXXX,XXXXXXXXXXX?XXXXXXXXXXXXX,XXXXXXXXXXXXX,", so the terminal takes this text as the updated target text and displays it in the text display area.
Optionally, in this embodiment, the size of the text display interface is proportional to the size of the target text.
In this embodiment, the size of the text display interface is proportional to the size of the target text; that is, the terminal can adjust the size of the text display interface according to the size of the target text, or adjust the size of the target text according to the size of the text display interface, which makes the proportions of the text display interface more attractive and harmonious.
In the picture text processing method provided by the embodiments of this application, the terminal determines the start and end positions of the target text on the picture, inserts drag handles at the start and end positions respectively, obtains the user's drag operation instruction on the drag handles, and updates the text displayed in the text display interface according to the drag operation instruction. When the user needs to update the target text, the user can select the required text by dragging the handles, so that the terminal can accurately recognize the text information required by the user; the user operation is simple and convenient, which greatly meets user needs. Moreover, level jumps on the terminal are avoided, keeping hierarchical operations simple.
In some embodiments, controls can also be placed on the text display interface to implement settings for the target text and the text display interface. Optionally, an operation control is provided on the text display interface, and the picture text processing method further includes: when it is detected that the operation control is triggered, performing the target operation corresponding to the operation control on the target text.
In this embodiment, operation controls can be placed on the text display interface to implement different operations on the target text. As shown in Fig. 8, a copy control and a share control are provided on the text display interface; the target operation corresponding to the copy control is a copy operation, and the target operation corresponding to the share control is a share operation. For example, when the terminal detects that the user clicks the copy control, the target text in the text display interface is copied; when the terminal detects that the user clicks the share control, the target text in the text display interface is shared to the application or page specified by the user. Other operation controls can also be provided as required, which is not limited in the embodiments of this application.
In an embodiment, a function control is provided on the text display interface, and the above picture text processing method further includes: when it is detected that the function control is triggered, setting the attributes of the target text and/or the attributes of the text display interface. The attributes of the target text include at least one of the font size, font format, and font color of the target text; the attributes of the text display interface include at least one of the background pattern, background color, shape, size, and position of the text display interface.
In this embodiment, as shown in Fig. 9a, a function control "Settings" can be placed on the text display interface. When the user clicks this function control, as shown in Fig. 9b, a setting interface pops up, which can include setting options such as the font size, font format, and font color, and the background pattern, background color, shape, size, and position of the text display interface; the user can set the attributes of the target text and of the text display interface in this setting interface. Alternatively, as shown in Fig. 9c, multiple function controls such as font size, font format, font color, background pattern, background color, shape, size, and position can be placed directly on the text display interface, and the user simply operates the function control corresponding to whatever needs to be set.
In the picture text processing method provided by the embodiments of this application, an operation control is provided on the text display interface, and when it is detected that the operation control is triggered, the target operation corresponding to the operation control is performed on the target text; and/or, a function control is provided on the text display interface, and when it is detected that the function control is triggered, the attributes of the target text and/or of the text display interface are set. This makes it convenient for users to set the attributes of the target text or of the text display interface, meeting the needs of different users.
In some scenarios, to meet user needs, the user can also directly drag the text display interface. Optionally, the above picture text processing method may further include: receiving a movement operation instruction input by the user, the movement operation instruction including a movement track; and moving the text display interface according to the movement track.
In this embodiment, the user can directly drag the text display interface; the terminal records the track of the user's movement operation and moves the text display interface according to the movement track to meet user needs. For example, the user can move the text display interface to any position of the display interface: the text display interface can be dragged up or down, or dragged to a position on the picture where there is no text, etc., which is not limited in the embodiments of this application.
It should be understood that although the operations in the flowcharts of Figs. 2-7 are shown in sequence as indicated by the arrows, these operations are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these operations is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the operations in Figs. 2-7 may include multiple sub-operations or stages, which are not necessarily completed at the same time but may be executed at different times; the execution order of these sub-operations or stages is also not necessarily sequential, and they may be executed in turn or alternately with other operations or with at least part of the sub-operations or stages of other operations.
Fig. 10 is a structural block diagram of a picture text processing apparatus of an embodiment. As shown in Fig. 10, the apparatus includes:
an obtaining module 21, configured to obtain a user operation instruction carrying position information, the position information being used to indicate a user's operation position on a picture;
a recognition module 22, configured to recognize, according to the user operation instruction, the target text corresponding to the position information from the picture; and
a display module 23, configured to superimpose a text display interface on the picture and display the target text on the text display interface.
In an embodiment, the recognition module 22 is configured to recognize all the text on the picture according to the user operation instruction, and to determine the target text from all the text according to the position information.
In an embodiment, the recognition module 22 is configured to determine, from all the text, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent, and to determine the text between the first punctuation mark and the second punctuation mark as the target text.
In an embodiment, the recognition module 22 is configured to determine a target area on the picture according to the operation position indicated by the position information, recognize the text in the target area, and determine the target text from the text in the target area according to the position information.
In an embodiment, the recognition module 22 is configured to determine, from the text in the target area, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent, and to determine the text between the first punctuation mark and the second punctuation mark as the target text.
In an embodiment, the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position.
In an embodiment, as shown in Fig. 11, the apparatus further includes:
an insertion module 24, configured to determine the start and end positions of the target text on the picture and insert drag handles at the start and end positions respectively;
the obtaining module 21 is further configured to obtain the user's drag operation instruction on the drag handles;
the display module 23 is further configured to update the text displayed in the text display interface according to the drag operation instruction.
In an embodiment, the display module 23 is further configured to determine the positions of the two drag handles according to the drag operation instruction, recognize the text information between the positions of the two drag handles from the picture as the updated target text, and display the updated target text in the text display interface.
In an embodiment, as shown in Fig. 11, the apparatus further includes:
a detection module 25, configured to perform, when it is detected that an operation control is triggered, the target operation corresponding to the operation control on the target text.
In an embodiment, when the operation control is a copy control, the target operation is a copy operation;
when the operation control is a share control, the target operation is a share operation.
In an embodiment, the detection module 25 is further configured to set the attributes of the target text and/or the attributes of the text display interface when it is detected that a function control is triggered.
In an embodiment, the attributes of the target text include at least one of the font size, font format, and font color of the target text; the attributes of the text display interface include at least one of the background pattern, background color, shape, size, and position of the text display interface.
In an embodiment, the size of the text display interface is proportional to the size of the target text.
In an embodiment, the display module 23 is further configured to receive a movement operation instruction input by the user, the movement operation instruction including a movement track, and to move the text display interface according to the movement track.
For the implementation principles and beneficial effects of the picture text processing apparatus provided by the embodiments of this application, reference may be made to those of the method embodiments, which are not repeated here.
The division of the modules in the above picture text processing apparatus is only for illustration; in other embodiments, the picture text processing apparatus may be divided into different modules as needed to complete all or part of the functions of the above picture text processing apparatus.
For specific limitations on the picture text processing apparatus, reference may be made to the limitations on the picture text processing method above, which are not repeated here. Each module in the above picture text processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor in a computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
Fig. 12 is a schematic diagram of the internal structure of an electronic device in an embodiment. As shown in Fig. 12, the electronic device includes a processor and a memory connected through a system bus. The processor is used to provide computing and control capabilities to support the operation of the entire electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the picture text processing method provided in the embodiments. The internal memory provides a cached running environment for the operating system and computer program in the non-volatile storage medium. The electronic device may be any terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, or a wearable device.
Each module in the picture text processing apparatus provided by the embodiments of this application may be implemented in the form of a computer program. The computer program can run on a terminal or a server. The program modules constituted by the computer program can be stored in the memory of the electronic device. When the computer program is executed by the processor, the operations of the methods described in the embodiments of this application are implemented.
The embodiments of this application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the operations of the picture text processing method.
A computer program product containing instructions, which, when run on a computer, causes the computer to execute the picture text processing method.
Any reference to memory, storage, a database, or other media used in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments only express several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the scope of this patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (20)

  1. A picture text processing method, comprising:
    obtaining a user operation instruction carrying position information, wherein the position information is used to indicate a user's operation position on a picture;
    recognizing, according to the user operation instruction, target text corresponding to the position information from the picture; and
    superimposing a text display interface on the picture, and displaying the target text on the text display interface.
  2. The method according to claim 1, wherein the recognizing, according to the user operation instruction, the target text corresponding to the position information from the picture comprises:
    recognizing all the text on the picture according to the user operation instruction; and
    determining the target text from all the text according to the position information.
  3. The method according to claim 2, wherein the determining the target text from all the text according to the position information comprises:
    determining, from all the text, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent; and
    determining the text between the first punctuation mark and the second punctuation mark as the target text.
  4. The method according to claim 1, wherein the recognizing, according to the operation instruction, the text corresponding to the position information from the picture comprises:
    determining a target area on the picture according to the operation position indicated by the position information;
    recognizing the text in the target area; and
    determining the target text from the text in the target area according to the position information.
  5. The method according to claim 4, wherein the determining the target text from the text in the target area according to the position information comprises:
    determining, from the text in the target area, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent; and
    determining the text between the first punctuation mark and the second punctuation mark as the target text.
  6. The method according to claim 3 or 5, wherein the first punctuation mark is the first specific punctuation mark forward of the operation position, and the second punctuation mark is the first punctuation mark backward of the operation position.
  7. The method according to any one of claims 1-5, further comprising:
    determining start and end positions of the target text on the picture, and inserting drag handles at the start and end positions respectively;
    obtaining the user's drag operation instruction on the drag handles; and
    updating the text displayed in the text display interface according to the drag operation instruction.
  8. The method according to claim 7, wherein the updating the text displayed in the text display interface according to the drag operation instruction comprises:
    determining the positions of the two drag handles according to the drag operation instruction;
    recognizing the text information between the positions of the two drag handles from the picture as updated target text; and
    displaying the updated target text in the text display interface.
  9. The method according to any one of claims 1-5, wherein an operation control is provided on the text display interface, and the method further comprises:
    when it is detected that the operation control is triggered, performing a target operation corresponding to the operation control on the target text.
  10. The method according to claim 9, wherein when the operation control is a copy control, the target operation is a copy operation; and
    when the operation control is a share control, the target operation is a share operation.
  11. The method according to any one of claims 1-5, wherein a function control is provided on the text display interface, and the method further comprises:
    when it is detected that the function control is triggered, setting attributes of the target text and/or attributes of the text display interface.
  12. The method according to claim 11, wherein the attributes of the target text include at least one of a font size, a font format, and a font color of the target text; and
    the attributes of the text display interface include at least one of a background pattern, a background color, a shape, a size, and a position of the text display interface.
  13. The method according to any one of claims 1-5, wherein the size of the text display interface is proportional to the size of the target text.
  14. The method according to any one of claims 1-5, further comprising:
    receiving a movement operation instruction input by the user, the movement operation instruction including a movement track; and
    moving the text display interface according to the movement track.
  15. A picture text processing apparatus, comprising:
    an obtaining module, configured to obtain a user operation instruction carrying position information, wherein the position information is used to indicate a user's operation position on a picture;
    a recognition module, configured to recognize, according to the user operation instruction, target text corresponding to the position information from the picture; and
    a display module, configured to superimpose a text display interface on the picture and display the target text on the text display interface.
  16. The picture text processing apparatus according to claim 15, wherein
    the recognition module is configured to recognize all the text on the picture according to the user operation instruction, and to determine the target text from all the text according to the position information.
  17. The picture text processing apparatus according to claim 16, wherein
    the recognition module is configured to determine, from all the text, a first punctuation mark forward of the operation position indicated by the position information and a second punctuation mark backward of the operation position, the first punctuation mark and the second punctuation mark being adjacent; and to determine the text between the first punctuation mark and the second punctuation mark as the target text.
  18. The picture text processing apparatus according to claim 15, wherein
    the recognition module is configured to determine a target area on the picture according to the operation position indicated by the position information; recognize the text in the target area; and determine, according to the position information, the target text from the text in the target area.
  19. An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the operations of the picture text processing method according to any one of claims 1 to 14.
  20. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the operations of the picture text processing method according to any one of claims 1 to 14 are implemented.
PCT/CN2021/074801 2020-02-11 2021-02-02 Picture text processing method and apparatus, electronic device and storage medium WO2021159992A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21753248.0A EP4102347A4 (en) 2020-02-11 2021-02-02 IMAGE TEXT PROCESSING METHOD AND DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIA
US17/816,794 US20220366711A1 (en) 2020-02-11 2022-08-02 Method for processing text in image, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010086414.6A 2020-02-11 2020-02-11 Picture text processing method and apparatus, electronic device and storage medium
CN202010086414.6 2020-02-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/816,794 Continuation US20220366711A1 (en) 2020-02-11 2022-08-02 Method for processing text in image, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021159992A1 (zh)

Family

ID=71181476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074801 WO2021159992A1 (zh) Picture text processing method and apparatus, electronic device and storage medium

Country Status (4)

Country Link
US (1) US20220366711A1 (zh)
EP (1) EP4102347A4 (zh)
CN (1) CN111338540B (zh)
WO (1) WO2021159992A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111338540B (zh) * 2020-02-11 2022-02-18 Oppo广东移动通信有限公司 Picture text processing method and apparatus, electronic device and storage medium
CN112199004A (zh) * 2020-10-10 2021-01-08 Vidaa美国公司 Display method of a user interface and display device
CN112613270B (zh) * 2020-12-22 2024-05-28 百色学院 Method, system, device and storage medium for recommending styles for target text
CN112684970B (zh) * 2020-12-31 2022-11-29 腾讯科技(深圳)有限公司 Adaptive display method and apparatus for a virtual scene, electronic device and storage medium
CN113157194B (zh) * 2021-03-15 2023-08-08 合肥讯飞读写科技有限公司 Text display method, electronic device and storage apparatus
CN113138933A (zh) * 2021-05-13 2021-07-20 网易(杭州)网络有限公司 Data table testing method, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729897A (zh) * 2017-11-03 2018-02-23 成都野望数码科技有限公司 Text operation method, apparatus and terminal
CN109002759A (zh) * 2018-06-07 2018-12-14 Oppo广东移动通信有限公司 Text recognition method and apparatus, mobile terminal and storage medium
CN110427139A (zh) * 2018-11-23 2019-11-08 网易(杭州)网络有限公司 Text processing method and apparatus, computer storage medium, and electronic device
CN110674814A (zh) * 2019-09-25 2020-01-10 深圳传音控股股份有限公司 Picture recognition and translation method, terminal and medium
CN111338540A (zh) * 2020-02-11 2020-06-26 Oppo广东移动通信有限公司 Picture text processing method and apparatus, electronic device and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6704034B1 (en) * 2000-09-28 2004-03-09 International Business Machines Corporation Method and apparatus for providing accessibility through a context sensitive magnifying glass
US20120131520A1 (en) * 2009-05-14 2012-05-24 Tang ding-yuan Gesture-based Text Identification and Selection in Images
US20120185787A1 (en) * 2011-01-13 2012-07-19 Microsoft Corporation User interface interaction behavior based on insertion point
US8661339B2 (en) * 2011-05-31 2014-02-25 Apple Inc. Devices, methods, and graphical user interfaces for document manipulation
KR20140030361A (ko) * 2012-08-27 2014-03-12 삼성전자주식회사 Character recognition apparatus and method for a portable terminal
KR102068604B1 (ko) * 2012-08-28 2020-01-22 삼성전자 주식회사 Character recognition apparatus and method for a portable terminal
KR102145515B1 (ko) * 2013-04-24 2020-08-18 삼성전자주식회사 Screen control method and electronic device thereof
US10078444B2 (en) * 2013-06-25 2018-09-18 Lg Electronics Inc. Mobile terminal and method for controlling mobile terminal
US10423706B2 (en) * 2014-10-31 2019-09-24 Xiaomi Inc. Method and device for selecting information
US10453353B2 (en) * 2014-12-09 2019-10-22 Full Tilt Ahead, LLC Reading comprehension apparatus
KR20170085419A (ko) * 2016-01-14 2017-07-24 삼성전자주식회사 Method of operating based on touch input and electronic device thereof
CN110659633A (zh) * 2019-08-15 2020-01-07 坎德拉(深圳)科技创新有限公司 Method, apparatus and storage medium for recognizing image text information

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729897A (zh) * 2017-11-03 2018-02-23 成都野望数码科技有限公司 Text operation method, apparatus and terminal
CN109002759A (zh) * 2018-06-07 2018-12-14 Oppo广东移动通信有限公司 Text recognition method and apparatus, mobile terminal and storage medium
CN110427139A (zh) * 2018-11-23 2019-11-08 网易(杭州)网络有限公司 Text processing method and apparatus, computer storage medium, and electronic device
CN110674814A (zh) * 2019-09-25 2020-01-10 深圳传音控股股份有限公司 Picture recognition and translation method, terminal and medium
CN111338540A (zh) * 2020-02-11 2020-06-26 Oppo广东移动通信有限公司 Picture text processing method and apparatus, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4102347A4

Also Published As

Publication number Publication date
US20220366711A1 (en) 2022-11-17
CN111338540B (zh) 2022-02-18
EP4102347A4 (en) 2023-08-02
EP4102347A1 (en) 2022-12-14
CN111338540A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
WO2021159992A1 (zh) Picture text processing method and apparatus, electronic device and storage medium
US11631050B2 (en) Syncing physical and electronic document
US20180027206A1 (en) Device and method for inputting note information into image of photographed object
US20190036855A1 (en) Method, system and apparatus for adding network comment information
US7966352B2 (en) Context harvesting from selected content
JP6013583B2 (ja) 有効インターフェース要素の強調のための方式
EP3183640B1 (en) Device and method of providing handwritten content in the same
JP5248696B1 (ja) 電子機器、手書き文書作成方法、及び手書き文書作成プログラム
US20160210040A1 (en) Actionable content displayed on a touch screen
US20150277686A1 (en) Systems and Methods for the Real-Time Modification of Videos and Images Within a Social Network Format
WO2019000681A1 (zh) Information typesetting method, apparatus, device and computer storage medium
CN107122113B (zh) Method and apparatus for generating a picture
US20160321238A1 (en) Electronic device, method and storage medium
WO2020114280A1 (zh) Method for processing note pages of a notebook, computer device and storage medium
CN112965681A (zh) Image processing method, apparatus, device, and storage medium
KR20160044487A (ko) E-리더 애플리케이션에서 고정 포맷 문서를 네비게이팅하는 기법
CN112269523B (zh) Object editing processing method and apparatus, and electronic device
US20170017370A1 (en) Device and method for processing data
CN116610243A (zh) Display control method and apparatus, electronic device and storage medium
CN115437736A (zh) Note taking method and apparatus
WO2024149183A1 (zh) Document display method and apparatus, and electronic device
CN114143454B (zh) Shooting method and apparatus, electronic device and readable storage medium
CN117130508A (zh) Note taking method and apparatus, storage medium and electronic device
CN117193609A (zh) Image processing method and apparatus, computer device, storage medium and program product
CN117149019A (zh) File editing method, terminal, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21753248

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021753248

Country of ref document: EP

Effective date: 20220909