
WO2020082562A1 - Character recognition method, apparatus, device and storage medium - Google Patents

Character recognition method, apparatus, device and storage medium

Info

Publication number
WO2020082562A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
character
dictionary
target
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/122832
Other languages
English (en)
Chinese (zh)
Inventor
周罡
王彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Publication of WO2020082562A1
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/248: Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/196: Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
    • G06V30/1983: Syntactic or structural pattern recognition, e.g. symbolic string recognition
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Definitions

  • The present application relates to the field of text recognition technology, and in particular to a character recognition method, apparatus, device, and storage medium.
  • Optical Character Recognition (OCR) uses electronic equipment, such as a scanner or a digital camera, to examine the characters printed on paper, determine their shapes by detecting dark and bright patterns, and then translate those shapes into computer text with a character recognition method.
  • That is, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format for further editing and processing by word processing software.
  • The recognition speed, however, is often low.
  • The main purpose of this application is to propose a character recognition method, apparatus, device, and storage medium that improve the efficiency of text recognition.
  • The character recognition method includes the following steps:
  • acquiring the text to be recognized;
  • calling a word segmentation tool pre-stored in a first preset area, and dividing the text to be recognized into a plurality of reference characters of a preset length by the word segmentation tool;
  • obtaining the reference characters divided by the word segmentation tool, searching for the corresponding preset dictionary in a second preset area according to the target length of the reference characters, and determining whether the reference characters are stored in the preset dictionary;
  • when a reference character is not stored in the preset dictionary, filtering the reference character that is not stored by a fuzzy matching algorithm to obtain a target character, and displaying the target character.
  • The present application also proposes a character recognition apparatus, which includes:
  • an acquisition module for acquiring the text to be recognized;
  • a calling module for calling a word segmentation tool pre-stored in the first preset area, and dividing the text to be recognized into a plurality of reference characters of a preset length by the word segmentation tool;
  • a searching module for obtaining the reference characters divided by the word segmentation tool, searching for the corresponding preset dictionary in the second preset area according to the target length of the reference characters, and determining whether the reference characters are stored in the preset dictionary;
  • a filtering module for filtering, when a reference character is not stored in the preset dictionary, the reference character that is not stored by the fuzzy matching algorithm to obtain a target character, and displaying the target character.
  • The present application also proposes a device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions being configured to implement the steps of the character recognition method described above.
  • The present application also proposes a storage medium having computer-readable instructions stored on it which, when executed by a processor, implement the steps of the character recognition method described above.
  • In the character recognition method proposed in this application, the text to be recognized is acquired and the word segmentation tool is called, so that the tool divides the text into a plurality of characters of a preset length. The corresponding preset dictionary is found according to that length, and it is determined whether each character is stored in the dictionary. A character that is not stored in the preset dictionary indicates a recognition abnormality. In that case, the missing character is screened by a fuzzy matching algorithm to obtain a target character, so that the fuzzy matching algorithm realizes text recognition and improves its efficiency.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a first embodiment of the character recognition method of the present application.
  • FIG. 3 is a schematic flowchart of a second embodiment of the character recognition method of the present application.
  • FIG. 4 is a schematic flowchart of a third embodiment of the character recognition method of the present application.
  • FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the character recognition apparatus of the present application.
  • FIG. 1 shows the device structure of the hardware operating environment involved in an embodiment of the present application.
  • The device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • The communication bus 1002 implements the connection and communication between these components.
  • The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • The memory 1005 may be a high-speed RAM or a stable, non-volatile memory such as disk storage.
  • The memory 1005 may optionally be a storage device independent of the processor 1001.
  • The structure shown in FIG. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • The memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
  • The network interface 1004 is mainly used to connect to an external network and exchange data with other network devices.
  • The user interface 1003 is mainly used to connect to user devices and exchange data with them.
  • The device of this application uses the processor 1001 to invoke the computer-readable instructions stored in the memory 1005 and execute the character recognition method provided by the embodiments of the present application.
  • FIG. 2 is a schematic flowchart of a first embodiment of the character recognition method of the present application.
  • The character recognition method includes the following steps:
  • Step S10: obtain the text to be recognized.
  • A historical recognition text is first obtained through OCR and used as the text to be recognized.
  • The document to be recognized is input into the computer through an input device.
  • The input device can be a scanner or any other device that achieves the same function.
  • The inclination angle of the document is measured, layout analysis is performed on the document, and the selected text fields are analyzed.
  • Typesetting is confirmed, text lines are divided according to horizontal or vertical layout, the text image of each line is separated, and punctuation marks are distinguished. This preprocesses the images, and each resulting text image is handed to the recognition module for recognition.
  • Layout analysis is the overall analysis of the text image: it sorts out all the text blocks in the document and distinguishes the text paragraphs, their typesetting order, and the areas occupied by images and tables.
  • The domain boundaries of each text block (the start and end coordinates of the domain in the image), together with the attributes within the domain (horizontal or vertical layout and the connection relationships between text blocks), are provided as a data structure to the recognition module for automatic recognition. Text areas are recognized directly, table areas undergo dedicated table analysis and recognition, and image areas are compressed or simply stored.
  • Line segmentation is the process of cutting a large image into lines and then separating individual characters from the image lines.
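The line segmentation step just described can be illustrated with a projection-based split. This is only a minimal sketch under stated assumptions: the page is a binarized image given as rows of 0/1 pixels, and the helper names `ink_runs` and `segment` are hypothetical, not taken from the application.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start, end) with end exclusive


def ink_runs(flags: List[bool]) -> List[Run]:
    """Return the maximal runs of consecutive True values in flags."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def segment(image: List[List[int]]) -> List[Tuple[Run, List[Run]]]:
    """Cut the image into text lines, then each line into character columns."""
    lines = []
    for top, bottom in ink_runs([any(row) for row in image]):
        width = len(image[0])
        columns = [any(image[y][x] for y in range(top, bottom)) for x in range(width)]
        lines.append(((top, bottom), ink_runs(columns)))
    return lines


# Hypothetical 5x5 page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
result = segment(page)
```

Blank rows separate lines and blank columns separate characters; real documents with touching characters or skew would need more than this simple projection.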
  • The text images sorted out from the scanned text are converted by the computer into the standard codes of the text.
  • Features of the text such as feature points, projection information, and the regional distribution of points are analyzed to provide the top-10 results for each character recognized in the text, and the top-1 result is selected from them as the basic text.
  • For example, if the recognition result is "I am a Zhongyuan person", this basic text is used as the text to be recognized, which completes the initial recognition of the document.
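As a rough illustration of keeping the top-10 results per character and taking the top-1 as the basic text, the sketch below assumes each recognized position comes with a non-empty list of (character, score) pairs. The function names and the scores are hypothetical; the application does not specify how candidates are scored.

```python
from typing import List, Tuple

Scored = Tuple[str, float]


def top_k(scored: List[Scored], k: int = 10) -> List[Scored]:
    """Keep the k highest-scoring candidates for one character position."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def basic_text(per_position: List[List[Scored]]) -> str:
    """Join the top-1 candidate of every position (each list must be non-empty)."""
    return "".join(top_k(scored, 1)[0][0] for scored in per_position)


# Hypothetical recognizer output for three character positions.
scores = [
    [("e", 0.2), ("c", 0.7), ("o", 0.1)],
    [("a", 0.6), ("o", 0.4)],
    [("t", 0.45), ("l", 0.55)],
]
text = basic_text(scores)
```

Here the third position picks "l" over "t", which is the kind of top-1 error the later dictionary check and fuzzy matching are meant to catch.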
  • Step S20: call the word segmentation tool pre-stored in the first preset area, and use it to divide the text to be recognized into a plurality of reference characters of a preset length.
  • A word segmentation tool is provided and used to analyze the text to be recognized.
  • The word segmentation tool may be jieba, SnowNLP, THULAC, NLPIR, or another word segmentation tool; this embodiment places no restriction on the choice.
  • The word segmentation tool divides the text to be recognized into phrases of a preset word length. For example, it may divide "I am a Zhongyuan person" into "I", "am" and "Zhongyuan person", or into "I am", "Zhongyuan" and "person".
  • The preset length may be the number of characters in a phrase. For example, "I am" is a phrase of length 2 and "person" is a phrase of length 1, so that different segmentation rules can be applied and the segmentation precision improved.
  • The phrases whose length is at least 2 are listed, that is, "I am" and "Zhongyuan", so that these phrases can be analyzed; phrases that meet other rules may also be listed.
  • This embodiment does not limit this.
  • In this embodiment, the text to be recognized is divided into phrases with a length of 2, thereby improving the efficiency of text recognition.
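A short sketch of selecting segments by length follows. It assumes the segments have already been produced by a segmentation tool such as jieba, and it uses Latin letters as stand-ins for single characters, so `"AB"` plays the role of a two-character phrase. The function names are hypothetical.

```python
from typing import Dict, List


def group_by_length(segments: List[str]) -> Dict[int, List[str]]:
    """Group segments by their number of characters."""
    groups: Dict[int, List[str]] = {}
    for segment in segments:
        groups.setdefault(len(segment), []).append(segment)
    return groups


def select_by_length(segments: List[str], min_length: int = 2) -> List[str]:
    """Keep only the segments with at least min_length characters."""
    return [segment for segment in segments if len(segment) >= min_length]


# Stand-in for a segmented sentence: two phrases of length 2, one of length 1.
segments = ["AB", "CD", "E"]
groups = group_by_length(segments)
selected = select_by_length(segments, min_length=2)
```

Grouping by length up front makes the later per-length dictionary lookup a simple key access.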
  • Step S30: obtain the reference characters divided by the word segmentation tool, search for the corresponding preset dictionary in the second preset area according to the target length of each reference character, and determine whether the reference character is stored in the preset dictionary.
  • The reference characters are the phrases produced by the word segmentation tool; for example, the text is divided into phrases such as "I am", "Zhongyuan" and "person". The first preset area and the second preset area distinguish the storage address of the word segmentation tool from the storage address of the preset dictionary.
  • The preset dictionaries are classified according to a preset field, for example a dictionary for words of length 2, a dictionary for words of length 3, and so on.
  • A dictionary of word length 2 holds entries such as "China", and a dictionary of word length 3 holds entries such as "Chinese person". Classifying commonly used phrases by word length in this way makes them easier to manage.
  • The preset dictionary can be used to check whether a target phrase produced by word segmentation is a common phrase.
  • For example, the phrases of length 2 after word segmentation include "I am" and "Zhongyuan".
  • "I am" and "Zhongyuan" are looked up in the dictionary of length 2; a phrase that is not found indicates a recognition abnormality. For example, if "Zhongyuan" is not found but "I am" is, the recognition of "I am" is normal and the recognition of "Zhongyuan" is abnormal.
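The length-keyed lookup described above can be sketched as follows. The dictionary contents are placeholders (Latin letters stand in for characters), and `find_abnormal` is a hypothetical name; the application does not prescribe a data structure.

```python
from typing import Dict, List, Set


def find_abnormal(segments: List[str], dictionaries: Dict[int, Set[str]]) -> List[str]:
    """Return the segments missing from the dictionary that matches their length."""
    abnormal = []
    for segment in segments:
        # Pick the dictionary for this length; an unknown length has no entries.
        known = dictionaries.get(len(segment), set())
        if segment not in known:
            abnormal.append(segment)
    return abnormal


# Placeholder dictionaries keyed by word length.
dictionaries = {2: {"AB", "XY"}, 3: {"ABC"}}
abnormal = find_abnormal(["AB", "CZ"], dictionaries)
```

In the text's example, "I am" would be found and "Zhongyuan" would be returned as abnormal.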
  • Step S40: when a reference character is not stored in the preset dictionary, filter the reference character that is not stored by a fuzzy matching algorithm to obtain a target character, and display the target character.
  • In this embodiment, the characters that do not exist in the dictionary are screened by a fuzzy matching algorithm, here the BK-tree (Burkhard-Keller tree) algorithm proposed by Burkhard and Keller.
  • The fuzzy matching relies on the edit distance between two strings, that is, the minimum number of editing operations required to convert one string into the other. The smaller the edit distance, the more similar the two strings; when the edit distance is 0, the two strings are equal. Character recognition is realized on this basis.
  • In this embodiment, the text to be recognized is acquired and the word segmentation tool is invoked, so that the tool divides the text into a plurality of characters of a preset length. The corresponding preset dictionary is found according to that length, and it is determined whether each character is stored in the dictionary. A character that is not stored indicates a recognition abnormality; in that case the missing character is screened by the fuzzy matching algorithm to obtain a target character, so that the fuzzy matching algorithm realizes text recognition and improves its efficiency.
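The edit distance defined above is commonly computed with the Levenshtein dynamic program (insertion, deletion, substitution). The sketch below is one standard way to do it, not necessarily the implementation the application uses; `edit_distance` is a name chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


same = edit_distance("abc", "abc")
classic = edit_distance("kitten", "sitting")
```

Only two rows of the table are kept, so memory grows with the length of `b` rather than with the product of both lengths.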
  • In a second embodiment (FIG. 3), before step S20, the method further includes:
  • Step S201: receive a tool writing instruction, extract the word segmentation tool and the word segmentation writing address information from the instruction, and write the word segmentation tool into the first preset area according to the word segmentation writing address information and save it.
  • The word segmentation tool is first written into the preset area; after the text to be recognized is obtained, the word segmentation tool in the preset area is called to process the text to be recognized.
  • The word segmentation tool may be a small program or another form of word segmentation tool, which is not limited in this embodiment.
  • The tool writing instruction may be a write operation performed through the writing platform interface or through the data serial port, which is not limited in this embodiment.
  • Further, step S20 includes:
  • Step S202: call the word segmentation tool pre-stored in the first preset area, use it to compare the text to be recognized with keywords of each preset length, and extract the keywords in the text to be recognized according to the comparison result.
  • The word segmentation tool may be provided with various keywords; comparing the text to be recognized with each keyword identifies the keywords it contains. For example, when the text to be recognized "Wuhan scenery is good" is segmented, it is compared with each keyword to obtain the keywords "Wuhan", "scenery" and "good", which completes the processing of the text to be recognized.
  • In this embodiment, the word segmentation tool is written in advance according to the write instruction and is then used to segment the text to be recognized, thereby achieving more detailed text recognition.
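One common way to compare a text against a keyword list, as step S202 describes, is forward maximum matching. The sketch below uses that technique as an assumption; the application does not say which matching strategy the tool applies. Latin letters again stand in for characters, and `extract_keywords` is a hypothetical name.

```python
from typing import List, Set


def extract_keywords(text: str, keywords: Set[str]) -> List[str]:
    """Scan left to right, taking the longest keyword that starts at each position."""
    max_len = max(map(len, keywords), default=0)
    found, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in keywords:
                found.append(piece)
                i += size
                break
        else:
            # No keyword starts here: skip one character.
            i += 1
    return found


# Placeholder text and keyword list; "x" is a character covered by no keyword.
keywords = {"AB", "CD", "E"}
hits = extract_keywords("ABxCDE", keywords)
```

Characters that belong to no keyword are skipped, which mirrors how "is" is left out of the extracted keywords in the Wuhan example.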
  • Further, a third embodiment of the character recognition method of the present application is proposed based on the first embodiment or the second embodiment.
  • The description here is based on the first embodiment. Before step S30, the method further includes:
  • Step S301: receive a dictionary writing instruction, extract the preset dictionary and the dictionary writing address information from the instruction, and write the preset dictionary into the second preset area according to the dictionary writing address information.
  • The preset dictionary needs to be written first: the write instruction is received, the preset dictionary is extracted from it, and the preset dictionary is saved in a preset area. Because the word segmentation tool was saved earlier, the storage address of the word segmentation tool and the storage address of the preset dictionary can be placed in different areas and labeled with different identifiers, that is, distinguished as the first preset area and the second preset area, to achieve effective data management.
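The separation into a first and a second preset area can be pictured as two labeled stores that accept write instructions. This is a loose sketch: the instruction layout (`area`, `address`, `payload`) and the `PresetStorage` class are invented for illustration and do not come from the application.

```python
from typing import Any, Dict


class PresetStorage:
    """Two labeled areas: 'first' for segmentation tools, 'second' for dictionaries."""

    def __init__(self) -> None:
        self.areas: Dict[str, Dict[str, Any]] = {"first": {}, "second": {}}

    def write(self, instruction: Dict[str, Any]) -> None:
        """Store the payload at the address named in the write instruction."""
        self.areas[instruction["area"]][instruction["address"]] = instruction["payload"]

    def read(self, area: str, address: str) -> Any:
        """Fetch whatever was written at the given area and address."""
        return self.areas[area][address]


storage = PresetStorage()
# Hypothetical tool: str.split stands in for a real word segmentation tool.
storage.write({"area": "first", "address": "tool", "payload": str.split})
storage.write({"area": "second", "address": "len2", "payload": {"AB", "XY"}})
tool = storage.read("first", "tool")
```

Keeping the tool and the dictionaries under different area labels means a lookup never has to guess which kind of object an address holds.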
  • Further, step S30 includes:
  • Step S302: obtain the reference character divided by the word segmentation tool, and search for the corresponding storage address in a preset address relationship mapping table according to the target length of the reference character.
  • The storage address is the storage address of a preset dictionary.
  • Multiple dictionaries are stored in the database, such as a dictionary of length 2 and a dictionary of length 3, and other types of dictionaries are also stored.
  • Each dictionary can be stored at a different storage address, and the correspondence between storage addresses and dictionary word lengths is used to establish the preset address relationship mapping table. Given the length of a character, the address of the corresponding dictionary can then be found in the table. For example, when the reference character has length 2, the address of the dictionary of length 2 is looked up in the preset address relationship mapping table, which keeps the addresses effectively managed.
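The address relationship mapping table amounts to a two-step lookup: word length to storage address, then storage address to dictionary. The sketch below models both with plain Python dicts; the address strings and the function name `dictionary_for` are placeholders, not values from the application.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping table: word length -> storage address.
ADDRESS_TABLE: Dict[int, str] = {2: "dict/len2", 3: "dict/len3"}

# Hypothetical second preset area: storage address -> dictionary contents.
SECOND_AREA: Dict[str, Set[str]] = {
    "dict/len2": {"AB", "XY"},
    "dict/len3": {"ABC"},
}


def dictionary_for(
    segment: str,
    table: Dict[int, str],
    area: Dict[str, Set[str]],
) -> Optional[Set[str]]:
    """Resolve the dictionary for a segment via its length; None if unmapped."""
    address = table.get(len(segment))
    if address is None:
        return None
    return area.get(address)


found = dictionary_for("CZ", ADDRESS_TABLE, SECOND_AREA)
missing = dictionary_for("Q", ADDRESS_TABLE, SECOND_AREA)
```

The indirection lets a dictionary be relocated by editing one table entry without touching the lookup code.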
  • Step S303: search for the corresponding preset dictionary in the preset area according to the storage address, extract the characteristic information of the reference character, compare it with the characteristic information of the characters in the dictionary found, and judge from the comparison result whether the reference character is stored in the dictionary.
  • The characteristic information may be the regional distribution of the points of the reference character, the geometric distribution of each point, or other forms of characteristic information; this embodiment places no restriction on it.
  • Further, step S40 includes:
  • Step S401: when the reference character is not stored in the preset dictionary, use the fuzzy matching algorithm to find, in the preset dictionary, a target character whose edit distance is less than the target length corresponding to the reference character, and display the target character.
  • In this embodiment, the BK-tree algorithm is used to find words whose edit distance is not greater than the length of the word. For example, if "Zhongyuan" is not in the dictionary, the word found in the BK-tree within that edit distance may be "China". Here the edit distance from string A to string B is the minimum number of steps required to change A into B.
  • For example, FAME to GATE takes two steps (two replacements), and GAME to ACM takes three steps (deleting G and E and adding C). The filtered word "China" is displayed as the target character, so that the fuzzy matching algorithm realizes text recognition and improves its accuracy.
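A BK-tree search of the kind referred to above can be sketched as follows. This is a generic textbook construction, not the application's own code: the class name `BKTree`, the tuple-based node layout and the sample word list are all illustrative. Step S401 reads "less than the target length", so under that reading one would call `search(word, len(word) - 1)`.

```python
from typing import Callable, Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


Node = Tuple[str, Dict[int, "Node"]]  # (word, children keyed by distance)


class BKTree:
    def __init__(self, distance: Callable[[str, str], int]) -> None:
        self.distance = distance
        self.root: Optional[Node] = None

    def add(self, word: str) -> None:
        """Insert a word, descending along the edge equal to its distance."""
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already present
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def search(self, word: str, max_distance: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_distance, closest first."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_distance:
                results.append((d, candidate))
            # Triangle inequality: only edges in [d - max, d + max] can match.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree(edit_distance)
for entry in ["GATE", "GAME", "ACM", "FAME", "CAMEL"]:
    tree.add(entry)
close = tree.search("FAME", 1)
```

The pruning on edge labels is what lets the search skip most of the dictionary instead of computing the edit distance against every entry.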
  • Further, the method also includes establishing an initial recognition list for each initially recognized character in the text to be recognized, and step S401 then includes:
  • Step S402: when the reference character does not exist in the preset dictionary, use the fuzzy matching algorithm to find, in the preset dictionary, target characters whose edit distance is less than the target length corresponding to the reference character.
  • The text images sorted out from the scanned text are converted by the computer into the standard codes of the text.
  • The strokes, feature points, and projection information of the text are analyzed to provide the top-10 results for each character recognized in the text, and the top-10 results of each character are established as the initial recognition list corresponding to that character.
  • Step S403: judge the number of target characters; when there are several, judge whether the target characters exist in the initial recognition list, and display the target characters that correspond to characters existing in the initial recognition list.
  • The solution provided by this embodiment adds fuzzy matching to the text recognition process: it finds similar characters according to the edit distance and uses the selected characters as the target characters, thereby improving the accuracy of text recognition.
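The disambiguation in step S403 can be sketched as a filter over the fuzzy-matching candidates. The position-by-position check used here is an assumption about how "exists in the initial recognition list" is meant; the function name and the sample data are hypothetical, with Latin letters standing in for characters.

```python
from typing import List


def filter_by_initial_list(candidates: List[str], initial_lists: List[List[str]]) -> List[str]:
    """When several candidates remain, keep those whose every character
    appears in the top-10 list recorded for the same position."""
    if len(candidates) <= 1:
        return list(candidates)
    kept = []
    for word in candidates:
        if len(word) == len(initial_lists) and all(
            char in options for char, options in zip(word, initial_lists)
        ):
            kept.append(word)
    return kept


# Hypothetical top-10 lists (shortened) for a two-character reference phrase.
initial_lists = [["A", "X"], ["Z", "C"]]
chosen = filter_by_initial_list(["AB", "AC"], initial_lists)
```

A candidate survives only if the recognizer itself considered each of its characters plausible, which ties the dictionary correction back to the image evidence.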
  • The present application further provides a character recognition apparatus.
  • FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the character recognition apparatus of the present application.
  • The character recognition apparatus includes:
  • The obtaining module 10 is configured to obtain the text to be recognized.
  • the calling module 20 is configured to call a word segmentation tool pre-stored in the first preset area, and use the word segmentation tool to divide the text to be recognized into a plurality of reference characters of a preset length.
  • the searching module 30 is used to obtain the reference characters divided by the word segmentation tool, search for the corresponding preset dictionary in the second preset area according to the target length of each reference character, and determine whether the reference character is stored in the preset dictionary.
  • the filtering module 40 is configured to filter the non-existing reference characters through the fuzzy matching algorithm to obtain the target characters and display the target characters when the reference characters are not stored in the preset dictionary.
  • The present application also proposes a device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions being configured to implement the steps of the character recognition method described above.
  • an embodiment of the present application further provides a storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • the storage medium of the present application stores computer readable instructions, and the computer readable instructions are executed by the processor to perform the steps of the character recognition method as described above.
  • the method implemented when the computer-readable instruction is executed can refer to various embodiments of the invoicing method of this application, and details are not described herein again.
  • the methods in the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course, can also be implemented by hardware, but in many cases the former is better Implementation.
  • the technical solution of the present application can be embodied in the form of a software product in essence or a part that contributes to the existing technology, and the computer software product is stored in a computer-readable storage medium (such as ROM / RAM, magnetic disk, and optical disk), including several instructions to enable an intelligent terminal device (which can be a mobile phone, computer, terminal device, air conditioner, or network terminal device, etc.) to execute the method.
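The recognition flow summarized at the top of this list (segment the text into fixed-length characters, look each one up in a preset dictionary, and fuzzy-match the ones that are missing) can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the dictionary contents, the function names `segment` and `recognize`, and the `cutoff` threshold are all hypothetical, and Python's standard `difflib.get_close_matches` stands in for the fuzzy matching algorithm, which this passage does not specify.

```python
from difflib import get_close_matches

# Hypothetical preset dictionaries, keyed by the length of the segmented characters.
PRESET_DICTIONARIES = {
    2: {"in", "to", "of"},
    4: {"text", "word", "data"},
}


def segment(text, length):
    """Split text into consecutive chunks of `length`; the last chunk may be shorter."""
    return [text[i:i + length] for i in range(0, len(text), length)]


def recognize(text, length, cutoff=0.6):
    """Segment `text`, keep chunks found in the dictionary for that length,
    and replace a missing chunk with its closest dictionary entry when one
    reaches the similarity cutoff (an assumed threshold)."""
    dictionary = PRESET_DICTIONARIES.get(length, set())
    result = []
    for chunk in segment(text, length):
        if chunk in dictionary:
            result.append(chunk)
            continue
        # Chunk is absent from the dictionary: treat it as a recognition
        # anomaly and look for the most similar stored entry.
        candidates = get_close_matches(chunk, sorted(dictionary), n=1, cutoff=cutoff)
        result.append(candidates[0] if candidates else chunk)
    return result


if __name__ == "__main__":
    print(recognize("textw0rddata", 4))
```

In this sketch a chunk with no sufficiently similar dictionary entry is passed through unchanged; a real system might instead flag it for review.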

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

Disclosed are a character recognition method, apparatus, device, and storage medium based on big data processing. The method comprises: acquiring a text to be recognized (S10); invoking a pre-stored word segmentation tool from a first preset area, so that the word segmentation tool divides the text to be recognized into a plurality of reference characters of a preset length (S20); according to a target length of the reference characters, looking up a corresponding preset dictionary in a second preset area and determining whether the reference characters are present in the preset dictionary (S30); and, when the reference characters are not present in the preset dictionary, screening out a target character from the absent reference characters by means of a fuzzy matching algorithm (S40).
PCT/CN2018/122832 2018-10-25 2018-12-21 Character recognition method, apparatus, device, and storage medium Ceased WO2020082562A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811254944.6 2018-10-25
CN201811254944.6A CN109657738B (zh) 2018-10-25 2018-10-25 字符识别方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020082562A1 (fr) 2020-04-30

Family

ID=66110077

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122832 Ceased WO2020082562A1 (fr) 2018-10-25 2018-12-21 Character recognition method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN109657738B (fr)
WO (1) WO2020082562A1 (fr)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
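CN111582169A (zh) * 2020-05-08 2020-08-25 腾讯科技(深圳)有限公司 Image recognition data error correction method and apparatus, computer device, and storage medium
CN111897958A (zh) * 2020-07-16 2020-11-06 邓桦 Ancient poetry classification method based on natural language processing
CN112347765A (zh) * 2020-10-10 2021-02-09 清华大学 Entity annotation method, module, and apparatus based on dictionary matching
CN112667831A (zh) * 2020-12-25 2021-04-16 上海硬通网络科技有限公司 Material storage method and apparatus, and electronic device
CN113408270A (zh) * 2021-06-10 2021-09-17 广州三七极创网络科技有限公司 Variant text recognition method and apparatus, and electronic device
CN113420564A (zh) * 2021-06-21 2021-09-21 国网山东省电力公司物资公司 Semantic structuring method and system for power equipment nameplates based on hybrid matching
CN113625884A (zh) * 2020-05-07 2021-11-09 顺丰科技有限公司 Input word recommendation method, apparatus, server, and storage medium
CN113761913A (zh) * 2021-08-23 2021-12-07 南京优飞保科信息技术有限公司 Method and system for processing dialogue script text
CN113988068A (zh) * 2021-12-29 2022-01-28 深圳前海硬之城信息技术有限公司 Word segmentation method, apparatus, device, and storage medium for BOM text
CN114386407A (zh) * 2021-12-23 2022-04-22 北京金堤科技有限公司 Text word segmentation method and apparatus
CN114510935A (zh) * 2020-11-17 2022-05-17 顺丰科技有限公司 Dual-address text recognition method and apparatus, computer device, and storage medium
CN115935028A (zh) * 2022-12-07 2023-04-07 植恩生物技术股份有限公司 User identification method and system for a pharmaceutical e-commerce platform
CN119295020A (zh) * 2024-12-11 2025-01-10 天津博诺智创机器人技术有限公司 Artificial-intelligence-based industrial Internet data management method and system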

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
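JP2022543870A (ja) * 2019-08-07 2022-10-14 ジナット テクノロジーズ インコーポレイテッド Data entry function for an information tracking system
CN110633660B (zh) * 2019-08-30 2022-05-31 盈盛智创科技(广州)有限公司 Document recognition method, device, and storage medium
CN110738202A (zh) * 2019-09-06 2020-01-31 平安科技(深圳)有限公司 Character recognition method and apparatus, and computer-readable storage medium
CN111241365B (zh) * 2019-12-23 2023-06-30 望海康信(北京)科技股份公司 Table image parsing method and system
CN111860657B (zh) * 2020-07-23 2024-12-24 中国建设银行股份有限公司 Image classification method, apparatus, electronic device, and storage medium
CN112560791B (zh) * 2020-12-28 2022-08-09 苏州科达科技股份有限公司 Recognition model training method, recognition method, apparatus, and electronic device
CN112949446B (zh) * 2021-02-25 2023-04-18 山东英信计算机技术有限公司 Object recognition method, apparatus, device, and medium
CN113743102B (zh) * 2021-08-18 2023-09-01 百度在线网络技术(北京)有限公司 Method and apparatus for recognizing characters, and electronic device
CN116521926A (zh) * 2023-05-06 2023-08-01 北京思明启创科技有限公司 Character library generation method, apparatus, electronic device, and storage medium
CN116580402B (zh) * 2023-05-26 2024-06-25 读书郎教育科技有限公司 Text recognition method and apparatus for a dictionary pen
CN117037192A (zh) * 2023-08-10 2023-11-10 广东电网有限责任公司 Document audit method, apparatus, device, and medium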

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
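CN104991889A (zh) * 2015-06-26 2015-10-21 江苏科技大学 Automatic proofreading method for non-multi-character-word errors based on fuzzy word segmentation
CN105068994A (zh) * 2015-08-13 2015-11-18 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for drug information
CN107622044A (zh) * 2016-07-13 2018-01-23 阿里巴巴集团控股有限公司 Word segmentation method, apparatus, and device for character strings
CN108304484A (zh) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 Keyword matching method and apparatus, electronic device, and readable storage medium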

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
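CN100476800C (zh) * 2007-06-22 2009-04-08 腾讯科技(深圳)有限公司 Method and system for index word segmentation
JP5716328B2 (ja) * 2010-09-14 2015-05-13 株式会社リコー Information processing apparatus, information processing method, and information processing program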

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
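CN113625884A (zh) * 2020-05-07 2021-11-09 顺丰科技有限公司 Input word recommendation method, apparatus, server, and storage medium
CN111582169B (zh) * 2020-05-08 2023-10-10 腾讯科技(深圳)有限公司 Image recognition data error correction method and apparatus, computer device, and storage medium
CN111582169A (zh) * 2020-05-08 2020-08-25 腾讯科技(深圳)有限公司 Image recognition data error correction method and apparatus, computer device, and storage medium
CN111897958A (zh) * 2020-07-16 2020-11-06 邓桦 Ancient poetry classification method based on natural language processing
CN111897958B (zh) * 2020-07-16 2024-03-12 邓桦 Ancient poetry classification method based on natural language processing
CN112347765A (zh) * 2020-10-10 2021-02-09 清华大学 Entity annotation method, module, and apparatus based on dictionary matching
CN112347765B (zh) * 2020-10-10 2022-06-07 清华大学 Entity annotation method, module, and apparatus based on dictionary matching
CN114510935A (zh) * 2020-11-17 2022-05-17 顺丰科技有限公司 Dual-address text recognition method and apparatus, computer device, and storage medium
CN112667831A (zh) * 2020-12-25 2021-04-16 上海硬通网络科技有限公司 Material storage method and apparatus, and electronic device
CN113408270A (zh) * 2021-06-10 2021-09-17 广州三七极创网络科技有限公司 Variant text recognition method and apparatus, and electronic device
CN113420564B (zh) * 2021-06-21 2022-11-22 国网山东省电力公司物资公司 Semantic structuring method and system for power equipment nameplates based on hybrid matching
CN113420564A (zh) * 2021-06-21 2021-09-21 国网山东省电力公司物资公司 Semantic structuring method and system for power equipment nameplates based on hybrid matching
CN113761913A (zh) * 2021-08-23 2021-12-07 南京优飞保科信息技术有限公司 Method and system for processing dialogue script text
CN113761913B (zh) * 2021-08-23 2024-02-23 南京优飞保科信息技术有限公司 Method and system for processing dialogue script text
CN114386407A (zh) * 2021-12-23 2022-04-22 北京金堤科技有限公司 Text word segmentation method and apparatus
CN113988068A (zh) * 2021-12-29 2022-01-28 深圳前海硬之城信息技术有限公司 Word segmentation method, apparatus, device, and storage medium for BOM text
CN115935028A (zh) * 2022-12-07 2023-04-07 植恩生物技术股份有限公司 User identification method and system for a pharmaceutical e-commerce platform
CN119295020A (zh) * 2024-12-11 2025-01-10 天津博诺智创机器人技术有限公司 Artificial-intelligence-based industrial Internet data management method and system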

Also Published As

Publication number Publication date
CN109657738B (zh) 2024-04-30
CN109657738A (zh) 2019-04-19

Similar Documents

Publication Publication Date Title
WO2020082562A1 (fr) Character recognition method, apparatus, device, and storage medium
WO2020015067A1 (fr) Data acquisition method, apparatus, device, and storage medium
WO2021051558A1 (fr) Question answering method and apparatus based on a knowledge graph, and storage medium
WO2020253113A1 (fr) Invoice recording method, device, and apparatus, and computer storage medium
WO2020233089A1 (fr) Test set creation method and apparatus, terminal, and computer-readable storage medium
WO2020073495A1 (fr) Artificial-intelligence-based re-examination method, apparatus, and device, and storage medium
WO2020119116A1 (fr) Medical insurance verification method, apparatus, and device based on data analysis, and storage medium
WO2021215620A1 (fr) Device and method for automatically generating a domain-specific image caption using a semantic ontology
WO2011021907A2 (fr) Metadata adding system, image search method and device, and associated gesture adding method
WO2020087704A1 (fr) Credit information management method, apparatus, and device, and storage medium
WO2020186777A1 (fr) Image retrieval method, apparatus, and device, and computer-readable storage medium
WO2019037197A1 (fr) Topic classifier training method and device, and computer-readable storage medium
WO2021003956A1 (fr) Product information management method, apparatus, and device, and storage medium
WO2021012489A1 (fr) Telephone platform log query method, terminal device, storage medium, and apparatus
WO2020082766A1 (fr) Association method and apparatus for an input method, device, and readable storage medium
WO2010137814A2 (fr) Method for providing a patent map by viewpoint, and associated system
WO2010123168A1 (fr) Database management method and system
WO2019024485A1 (fr) Data sharing method and device, and computer-readable storage medium
CN118410196B (zh) Method, system, and apparatus for recognizing title blocks in drawings
WO2016099019A1 (fr) System and method for classifying patent documents
WO2020085558A1 (fr) High-speed analysis image processing apparatus and associated control method
WO2021051557A1 (fr) Keyword determination method and apparatus based on semantic recognition, and storage medium
WO2018086371A1 (fr) Laptop, smart terminal, and method for creating a content index for a laptop
WO2014148784A1 (fr) Language model database for language recognition, and language recognition device, method, and system
WO2024019226A1 (fr) Method for detecting malicious URLs

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18937751; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18937751; Country of ref document: EP; Kind code of ref document: A1)