WO2020082562A1 - Character recognition method, apparatus, device and storage medium - Google Patents
- Publication number
- WO2020082562A1 (PCT/CN2018/122832)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- preset
- character
- dictionary
- target
- word segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/248—Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/196—Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
- G06V30/1983—Syntactic or structural pattern recognition, e.g. symbolic string recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- The present application relates to the field of text recognition technology, and in particular to a character recognition method, apparatus, device, and storage medium.
- Optical Character Recognition (OCR) mainly uses electronic equipment, such as a scanner or a digital camera, to examine the characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses a character recognition method to translate the shapes into computer text.
- That is, the text in a paper document is converted by optical means into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format for further editing and processing by word-processing software.
- The recognition speed is often low.
- The main purpose of this application is to propose a character recognition method, apparatus, device, and storage medium that improve the efficiency of text recognition.
- the character recognition method includes the following steps:
- the reference character that is not stored is filtered by a fuzzy matching algorithm to obtain a target character, and the target character is displayed.
- the present application also proposes a character recognition device, the character recognition device includes:
- an acquisition module, used to acquire the text to be recognized;
- a calling module, used to call a word segmentation tool pre-stored in the first preset area and to divide the text to be recognized into a plurality of reference characters of a preset length by means of the word segmentation tool;
- a searching module, used to obtain the reference characters divided by the word segmentation tool, search for the corresponding preset dictionary in the second preset area according to the target length of the reference character, and determine whether the reference character is stored in the preset dictionary;
- a filtering module, configured to, when the reference character is not stored in the preset dictionary, filter the reference character that is not stored by means of the fuzzy matching algorithm to obtain a target character, and to display the target character.
- The present application also proposes a device including: a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions being configured to implement the steps of the character recognition method as described above.
- The present application also proposes a storage medium having computer-readable instructions stored on it which, when executed by a processor, implement the steps of the character recognition method described above.
- The text to be recognized is acquired and the word segmentation tool is called, so that the tool divides the text into a plurality of characters of a preset length. The corresponding preset dictionary is found according to the length of those characters, and it is determined whether each character is stored in that dictionary. A character that is not stored in the preset dictionary indicates a recognition abnormality; in that case, the fuzzy matching algorithm screens the characters that are not stored to obtain target characters. Text recognition is thus realized through the fuzzy matching algorithm, and its efficiency is improved.
- FIG. 1 is a schematic diagram of a device structure of a hardware operating environment involved in an embodiment of the present application
- FIG. 2 is a schematic flowchart of a first embodiment of a character recognition method of the present application
- FIG. 3 is a schematic flowchart of a second embodiment of a character recognition method of this application.
- FIG. 4 is a schematic flowchart of a third embodiment of a character recognition method of this application.
- FIG. 5 is a schematic diagram of function modules of the first embodiment of the character recognition device of the present application.
- FIG. 1 is a schematic diagram of a device structure of a hardware operating environment involved in an embodiment of the present application.
- the device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
- the communication bus 1002 is used to implement connection communication between these components.
- the user interface 1003 may include a display and an input unit such as keys, and may optionally also include a standard wired interface and a wireless interface.
- the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
- the memory 1005 may be a high-speed RAM or a stable, non-volatile memory such as disk storage.
- the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
- The structure shown in FIG. 1 does not constitute a limitation on the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
- the memory 1005 as a storage medium may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
- the network interface 1004 is mainly used to connect to an external network and perform data communication with other network devices;
- the user interface 1003 is mainly used to connect to user devices and perform data communication with the device;
- the device of this application uses the processor 1001 to invoke the computer-readable instructions stored in the memory 1005 and executes the character recognition method provided by the embodiments of the present application.
- FIG. 2 is a schematic flowchart of a first embodiment of a character recognition method of the present application.
- the character recognition method includes the following steps:
- Step S10 Obtain the text to be recognized.
- the historical recognition text is first obtained through OCR, and the historical recognition text is used as the text to be recognized.
- the recognition document is mainly input into the computer through an input device.
- the input device can be a scanner or other devices that can achieve the same function.
- the inclination angle of the document is measured, the layout analysis of the document is performed, and the selected text field is analyzed.
- Typesetting confirmation is then performed: text lines are divided according to horizontal or vertical layout, the text images in each line are separated, and punctuation marks are distinguished. The images are preprocessed in this way, and each text image sorted out after processing is handed over to the recognition module for recognition.
- Layout analysis is the overall analysis of the text image: it sorts out all the text blocks in the document and distinguishes the text paragraphs, their typesetting order, and the areas occupied by images and tables.
- The domain boundary of each text block (the start and end coordinates of the domain in the image), the attributes within the domain (horizontal or vertical layout), and the connection relationships between text blocks are provided as a data structure to the recognition module for automatic recognition. Text areas are recognized directly, table areas receive dedicated table analysis and recognition processing, and image areas are compressed or simply stored.
- Line segmentation is the process of cutting a large image into lines and then separating individual characters from the image lines.
- the text image sorted out from the scanned text is converted into the standard code of the text by the computer.
- Features of the text, such as feature points, projection information, and the regional distribution of points, are analyzed to provide the top-10 results for each character recognized in the text, and the top-1 result is selected from them as the basic text.
- For example, the recognition result is "I am a person from Zhongyuan"; this basic text is used as the text to be recognized, which completes the initial recognition of the document.
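The top-1 selection described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `(character, score)` candidate format, the function name `build_basic_text`, and the sample data are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed OCR output format: one list of (character, score) candidates
# per character position (up to ten candidates in the text above).
Candidates = List[Tuple[str, float]]


def build_basic_text(per_char_candidates: List[Candidates]) -> str:
    """Join the highest-scoring (top-1) candidate of every position."""
    return "".join(
        max(candidates, key=lambda pair: pair[1])[0]
        for candidates in per_char_candidates
    )


# Hypothetical candidates for a three-character text.
sample = [
    [("c", 0.90), ("e", 0.05)],
    [("o", 0.30), ("a", 0.60)],
    [("t", 0.80), ("f", 0.10)],
]
basic_text = build_basic_text(sample)
print(basic_text)
```

Each position must have at least one candidate; an empty candidate list would make `max` raise a `ValueError`.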
- step S20 a word segmentation tool pre-stored in the first preset area is invoked, and the word segmentation tool is used to divide the text to be recognized into a plurality of reference characters of a preset length.
- A word segmentation tool is provided, and the text to be recognized is analyzed with it.
- The word segmentation tool may be jieba, SnowNLP, THULAC, NLPIR, or another word segmentation tool; this embodiment places no restriction on this.
- The word segmentation tool divides the text to be recognized into phrases of a preset word length. For example, it divides "I am a person from Zhongyuan" into "I", "am" and "Zhongyuan person", or into "I am", "Zhongyuan" and "person".
- The preset length may be the number of characters in a phrase (counted in the original Chinese): for example, "I am" is a phrase of length 2 and "person" is a phrase of length 1. Segmenting under different rules in this way improves segmentation precision.
- The phrases that reach the preset length of 2 are listed, namely "I am" and "Zhongyuan", so that they can be analyzed; phrases that meet other rules may also be listed.
- This embodiment does not limit this.
- the text to be recognized is divided into phrases with a length of 2, thereby improving the efficiency of text recognition.
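One way to picture the length-based grouping is the sketch below. It assumes the segmentation itself has already been done by some tool; the function name `group_by_length` is hypothetical, and the Chinese sample strings are only a guess at what the translated example ("I am", "Zhongyuan", "person") stood for.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_length(segments: Iterable[str]) -> Dict[int, List[str]]:
    """Group segmented phrases by their number of characters."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for segment in segments:
        groups[len(segment)].append(segment)
    return dict(groups)


# Assumed segmenter output for a misrecognized sentence (illustrative only).
segments = ["我是", "中园", "人"]
groups = group_by_length(segments)

# Phrases of the preset length 2 are the ones passed on for dictionary lookup.
two_character_phrases = groups.get(2, [])
print(two_character_phrases)
```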
- Step S30 Obtain the reference character divided by the word segmentation tool, search the corresponding preset dictionary in the second preset area according to the target length of the reference character, and determine whether the reference character is stored in the preset dictionary .
- The reference characters are the phrases produced by the word segmentation tool. For example, "I am a person from Zhongyuan" is divided into several phrases such as "I am", "Zhongyuan" and "person", of which "I am" and "Zhongyuan" have length 2. The first preset area and the second preset area serve to distinguish the storage address of the word segmentation tool from the storage address of the preset dictionary.
- The preset dictionary is a dictionary classified according to a preset field, here word length: for example, a dictionary of words of length 2 (with entries such as "China"), a dictionary of words of length 3 (with entries such as "Chinese"), and so on. Commonly used phrases are thus classified and managed by word length.
- the preset dictionary can be used to check whether the target phrase after word segmentation is a common phrase.
- After word segmentation, the phrases of length 2 are "I am" and "Zhongyuan".
- "I am" and "Zhongyuan" are each looked up in the dictionary of length 2; a phrase that is not found indicates a recognition abnormality. For example, if "I am" is found but "Zhongyuan" is not, the recognition of "I am" is normal and the recognition of "Zhongyuan" is abnormal.
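The lookup in step S30 can be sketched as a set of dictionaries keyed by phrase length. The names `DICTIONARIES` and `check_segments` and all dictionary entries are assumptions chosen for illustration; the source does not specify how the dictionaries are represented.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical preset dictionaries, one per phrase length.
DICTIONARIES: Dict[int, Set[str]] = {
    2: {"我是", "中国", "武汉"},
    3: {"中国人"},
}


def check_segments(
    segments: Iterable[str], dictionaries: Dict[int, Set[str]]
) -> List[Tuple[str, bool]]:
    """Pair each phrase with True if it is stored in the dictionary
    that matches its length, and False otherwise."""
    results = []
    for segment in segments:
        entries = dictionaries.get(len(segment), set())
        results.append((segment, segment in entries))
    return results


for phrase, found in check_segments(["我是", "中园"], DICTIONARIES):
    status = "normal" if found else "abnormal"
    print(phrase, status)
```

A phrase flagged as abnormal here is the one that would be handed to the fuzzy matching step.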
- step S40 when the reference character is not stored in the preset dictionary, the reference character that is not stored is filtered by a fuzzy matching algorithm to obtain a target character, and the target character is displayed.
- The reference characters that are not stored are screened by a fuzzy matching algorithm, here the BK-tree (Burkhard-Keller tree) algorithm proposed by Burkhard and Keller.
- The fuzzy matching algorithm relies on the edit distance between two strings, that is, the minimum number of editing operations required to convert one into the other. The smaller the edit distance, the more similar the two strings; when the edit distance is 0, the two strings are equal. Character recognition is realized on this basis.
- In this embodiment, the text to be recognized is acquired and the word segmentation tool is invoked, so that the tool divides the text into a plurality of characters of a preset length. The corresponding preset dictionary is found according to the length of those characters, and it is determined whether each character is stored in that dictionary. A character that is not stored in the preset dictionary indicates a recognition abnormality; in that case, the fuzzy matching algorithm screens the characters that are not stored to obtain target characters. Text recognition is thus realized through the fuzzy matching algorithm, and its efficiency is improved.
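The edit distance described above is commonly computed with the Levenshtein dynamic program. The sketch below is one standard way to do it, counting insertions, deletions, and replacements as one operation each; the source does not say which implementation the applicant uses, and the function name `edit_distance` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and replacements that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # replace, or keep if equal
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("kitten", "kitten"))  # equal strings give distance 0
```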
- the method further includes:
- Step S201 receiving a tool writing instruction, extracting the word segmentation tool and word segmentation writing address information in the tool writing instruction, writing the word segmentation tool into the first preset area according to the word segmentation writing address information and Save it.
- the word segmentation tool is first written in the preset area, and after the text to be recognized is obtained, the word segmentation tool in the preset area is called to change the text to be recognized
- the word segmentation tool may be a small program or other forms of word segmentation tools, which are not limited in this embodiment.
- The tool writing instruction may be a write operation performed through a writing platform interface or through a data serial port, which is not limited in this embodiment.
- step S20 includes:
- step S202 a word segmentation tool pre-stored in the first preset area is called, and the word segmentation tool is used to compare the text to be recognized with keywords of each preset length, and each of the texts in the text to be recognized is extracted according to the comparison result.
- The word segmentation tool may be provided with various keywords; by comparing the text to be recognized with each keyword, the keywords contained in the text are identified. For example, when the text to be recognized "Wuhan scenery is good" is segmented with the tool, it is compared with each keyword to obtain the keywords "Wuhan", "scenery" and "good", which completes the processing of the text to be recognized.
- the word segmentation tool is pre-written according to the write instruction, and the word segmentation tool is used to perform word segmentation processing on the text to be recognized, thereby achieving more detailed text recognition.
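The keyword comparison of step S202 could be realized in several ways. The sketch below uses forward maximum matching, a common dictionary-based segmentation technique that is substituted here for illustration; the source does not name the matching strategy. The function name `segment_by_keywords` and the Chinese sample text (a guess at the original of "Wuhan scenery is good") are assumptions.

```python
from typing import Iterable, List


def segment_by_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest
    keyword that matches, falling back to a single character."""
    vocabulary = set(keywords)
    max_len = max([len(word) for word in vocabulary] + [1])
    pieces: List[str] = []
    position = 0
    while position < len(text):
        longest = min(max_len, len(text) - position)
        for size in range(longest, 0, -1):
            piece = text[position:position + size]
            if size == 1 or piece in vocabulary:
                pieces.append(piece)
                position += size
                break
    return pieces


# Hypothetical keyword list and input text.
print(segment_by_keywords("武汉风景好", ["武汉", "风景", "好"]))
```

Characters that match no keyword come out as single-character pieces, so the output always covers the whole input.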
- a third embodiment of the character recognition method of the present application is proposed based on the first embodiment or the second embodiment.
- The description here is based on the first embodiment. Before step S30, the method also includes:
- Step S301: receive a dictionary writing instruction, extract the preset dictionary and the dictionary writing address information from the instruction, and write the preset dictionary into the second preset area according to the dictionary writing address information.
- The preset dictionary needs to be written first: the write instruction is received, the preset dictionary is extracted from it, and the preset dictionary is saved in a preset area. Because the word segmentation tool was saved previously, the storage address of the word segmentation tool and the storage address of the preset dictionary can be placed in different areas and marked with different identification labels, that is, distinguished as the first preset area and the second preset area, to achieve effective data management.
- step S30 includes:
- Step S302 Obtain the reference character divided by the word segmentation tool, and search for a corresponding storage address in a preset address relationship mapping table according to the target length of the reference character.
- the storage address is a storage address of a preset dictionary
- multiple dictionaries are stored in the database, such as a dictionary with a length of 2 and a dictionary with a length of 3, and other types of dictionaries are also stored.
- The dictionaries can be stored at different storage addresses, and the correspondence between each storage address and the word length of its dictionary is used to build the preset address relationship mapping table. Given the length of a character, the address of the corresponding dictionary can then be found in the table. For example, when the reference character has length 2, the address of the dictionary of length 2 is looked up in the preset address relationship mapping table according to that length, which keeps the addresses effectively managed.
- Step S303: search for the corresponding preset dictionary in the preset area according to the storage address, extract the feature information of the reference character, compare it with the feature information of the characters in the dictionary found, and judge from the comparison result whether the reference character is stored in the dictionary.
- The feature information may be the regional distribution of the points of the reference character, the geometric distribution of each point, or another form of feature information; this embodiment places no restriction on this.
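Steps S302 and S303 add one level of indirection: length to storage address, then address to dictionary. A minimal sketch under stated assumptions follows. The address strings, the in-memory `STORAGE` mapping standing in for the second preset area, and the function names are all hypothetical, and the feature-information comparison of step S303 is simplified to exact string membership.

```python
from typing import Dict, Optional, Set

# Hypothetical address relationship mapping table: phrase length -> address.
ADDRESS_TABLE: Dict[int, str] = {
    2: "area2/dict_len2",
    3: "area2/dict_len3",
}

# Hypothetical storage standing in for the second preset area.
STORAGE: Dict[str, Set[str]] = {
    "area2/dict_len2": {"我是", "中国"},
    "area2/dict_len3": {"中国人"},
}


def find_dictionary(
    segment: str, address_table: Dict[int, str], storage: Dict[str, Set[str]]
) -> Optional[Set[str]]:
    """Resolve the dictionary for a phrase via its length and address."""
    address = address_table.get(len(segment))
    if address is None:
        return None  # no dictionary registered for this length
    return storage.get(address)


def is_stored(
    segment: str, address_table: Dict[int, str], storage: Dict[str, Set[str]]
) -> bool:
    """Simplified S303: exact membership instead of feature comparison."""
    entries = find_dictionary(segment, address_table, storage)
    return entries is not None and segment in entries


print(is_stored("我是", ADDRESS_TABLE, STORAGE))
print(is_stored("中园", ADDRESS_TABLE, STORAGE))
```

Keeping the table separate from the dictionaries means a dictionary can be moved by updating one table entry, which is presumably the "effective management" the text refers to.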
- step S40 includes:
- Step S401: when the reference character is not stored in the preset dictionary, a target character whose edit distance is less than the target length corresponding to the reference character is found in the preset dictionary through the fuzzy matching algorithm, and the target character is displayed.
- The BK-tree algorithm is used to find words whose edit distance is not greater than the length of the word. For example, if "Zhongyuan" is not in the dictionary, the word found in the BK-tree within that edit distance may be "China".
- The edit distance from string A to string B is the minimum number of steps required to change A into B. For example, FAME to GATE takes two steps (two replacements), and GAME to ACM takes three steps (deleting G and E and adding C).
- The filtered result "China" is displayed as the target character, so that the fuzzy matching algorithm realizes text recognition and improves its accuracy.
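A BK-tree indexes dictionary words by their edit distance to a parent word, so a query only has to visit subtrees that the triangle inequality cannot rule out. The sketch below is a generic textbook BK-tree, not the applicant's code; the class name `BKTree`, the node layout, and the sample words are assumptions. The threshold of `len(query) - 1` follows the "less than the target length" wording of step S401.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


Node = Tuple[str, Dict[int, "Node"]]  # (word, children keyed by distance)


class BKTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            distance = edit_distance(word, node[0])
            if distance == 0:
                return  # word already stored
            child = node[1].get(distance)
            if child is None:
                node[1][distance] = (word, {})
                return
            node = child

    def search(self, query: str, max_distance: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_distance, nearest first."""
        matches: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            distance = edit_distance(query, word)
            if distance <= max_distance:
                matches.append((distance, word))
            # Triangle inequality: matches can only lie under edges
            # within max_distance of the distance to this node.
            for edge, child in children.items():
                if distance - max_distance <= edge <= distance + max_distance:
                    stack.append(child)
        return sorted(matches)


tree = BKTree()
for entry in ["中国", "我是", "武汉"]:  # hypothetical length-2 dictionary
    tree.add(entry)

query = "中园"
print(tree.search(query, len(query) - 1))
```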
- the method further includes: establishing an initial recognition list of each initial recognition character in the text to be recognized, and the step S401 includes:
- Step S402: when the reference character does not exist in the preset dictionary, a target character whose edit distance is less than the target length corresponding to the reference character is found in the preset dictionary through the fuzzy matching algorithm.
- the text image that is sorted out from the scanned text is converted into the standard code of the text by the computer.
- The strokes, feature points, and projection information of the text are analyzed to provide the top-10 results for each character recognized in the text, and the top-10 results of each character are established as the initial recognition list for that character.
- Step S403 judging the number of the target characters, when the number is multiple, judging whether the target characters exist in the initial recognition list, and displaying the target characters corresponding to the characters existing in the initial recognition list .
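Step S403 can be read as a tie-break: when fuzzy matching returns several candidates, keep those whose characters the OCR stage had already proposed. The sketch below is one possible interpretation, checking each character of a candidate against the initial recognition list for the same position. The function name and the sample data are assumptions.

```python
from typing import List, Sequence


def filter_by_initial_lists(
    candidates: Sequence[str], initial_lists: Sequence[Sequence[str]]
) -> List[str]:
    """Keep candidates whose every character appears in the initial
    recognition list of the corresponding position."""
    if len(candidates) <= 1:
        return list(candidates)  # nothing to disambiguate
    kept = []
    for candidate in candidates:
        if len(candidate) != len(initial_lists):
            continue  # positions cannot be aligned
        if all(char in options
               for char, options in zip(candidate, initial_lists)):
            kept.append(candidate)
    return kept


# Hypothetical fuzzy-match results and per-position OCR candidate lists.
fuzzy_matches = ["中国", "中间"]
initial_lists = [["中", "申"], ["园", "国", "圆"]]
print(filter_by_initial_lists(fuzzy_matches, initial_lists))
```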
- In the solution of this embodiment, the fuzzy matching algorithm is added to text recognition: similar characters are found according to edit distance and the selected characters are used as the target characters, thereby improving the accuracy of text recognition.
- the present application further provides a character recognition device.
- FIG. 5 is a schematic diagram of functional modules of a first embodiment of a character recognition device of the present application.
- the character recognition device includes:
- the obtaining module 10 obtains the text to be recognized.
- the calling module 20 is configured to call a word segmentation tool pre-stored in the first preset area, and use the word segmentation tool to divide the text to be recognized into a plurality of reference characters of a preset length.
- the searching module 30 is used to obtain the reference characters divided by the word segmentation tool, search for the corresponding preset dictionary in the second preset area according to the target length of the reference character, and determine whether there is any stored in the preset dictionary Reference character.
- the filtering module 40 is configured to filter the non-existing reference characters through the fuzzy matching algorithm to obtain the target characters and display the target characters when the reference characters are not stored in the preset dictionary.
- The present application also proposes a device including: a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the computer-readable instructions being configured to implement the steps of the character recognition method as described above.
- an embodiment of the present application further provides a storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
- the storage medium of the present application stores computer readable instructions, and the computer readable instructions are executed by the processor to perform the steps of the character recognition method as described above.
- For the method implemented when the computer-readable instructions are executed, refer to the embodiments of the character recognition method of this application; details are not repeated here.
- The methods in the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
- The technical solution of the present application, in essence or in the part that contributes over the existing technology, can be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that enable an intelligent terminal device (which can be a mobile phone, computer, terminal device, air conditioner, network terminal device, etc.) to execute the method.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
Disclosed are a character recognition method, apparatus, device and storage medium based on big data processing, the method comprising the steps of: acquiring a text to be recognized (S10); calling a pre-stored word segmentation tool from a first preset area so that the word segmentation tool divides the text to be recognized into a plurality of reference characters of a preset length (S20); according to a target length of the reference characters, searching for a corresponding preset dictionary in a second preset area and determining whether the reference characters are present in the preset dictionary (S30); and, when the reference characters are not present in the preset dictionary, filtering a target character from the reference characters that are not present by means of a fuzzy matching algorithm (S40).
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811254944.6 | 2018-10-25 | | |
| CN201811254944.6A CN109657738B (zh) | 2018-10-25 | 2018-10-25 | Character recognition method, apparatus, device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020082562A1 (fr) | 2020-04-30 |
Family
ID=66110077
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/122832 Ceased WO2020082562A1 (fr) | 2018-10-25 | 2018-12-21 | Character recognition method, apparatus, device and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN109657738B (fr) |
| WO (1) | WO2020082562A1 (fr) |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
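| CN111582169A (zh) * | 2020-05-08 | 2020-08-25 | 腾讯科技(深圳)有限公司 | Image recognition data error correction method and apparatus, computer device and storage medium |
| CN111897958A (zh) * | 2020-07-16 | 2020-11-06 | 邓桦 | Method for classifying ancient poetry based on natural language processing |
| CN112347765A (zh) * | 2020-10-10 | 2021-02-09 | 清华大学 | Entity labeling method, module and apparatus based on dictionary matching |
| CN112667831A (zh) * | 2020-12-25 | 2021-04-16 | 上海硬通网络科技有限公司 | Material storage method and apparatus, and electronic device |
| CN113408270A (zh) * | 2021-06-10 | 2021-09-17 | 广州三七极创网络科技有限公司 | Variant text recognition method and apparatus, and electronic device |
| CN113420564A (zh) * | 2021-06-21 | 2021-09-21 | 国网山东省电力公司物资公司 | Hybrid-matching-based semantic structuring method and system for electric power nameplates |
| CN113625884A (zh) * | 2020-05-07 | 2021-11-09 | 顺丰科技有限公司 | Input word recommendation method, apparatus, server and storage medium |
| CN113761913A (zh) * | 2021-08-23 | 2021-12-07 | 南京优飞保科信息技术有限公司 | Method and system for processing dialogue script text |
| CN113988068A (zh) * | 2021-12-29 | 2022-01-28 | 深圳前海硬之城信息技术有限公司 | Word segmentation method, apparatus, device and storage medium for BOM text |
| CN114386407A (zh) * | 2021-12-23 | 2022-04-22 | 北京金堤科技有限公司 | Text word segmentation method and apparatus |
| CN114510935A (zh) * | 2020-11-17 | 2022-05-17 | 顺丰科技有限公司 | Dual-address text recognition method and apparatus, computer device and storage medium |
| CN115935028A (zh) * | 2022-12-07 | 2023-04-07 | 植恩生物技术股份有限公司 | User identification method and system for a pharmaceutical e-commerce platform |
| CN119295020A (zh) * | 2024-12-11 | 2025-01-10 | 天津博诺智创机器人技术有限公司 | Artificial-intelligence-based industrial internet data management method and system |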
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
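| JP2022543870A (ja) * | 2019-08-07 | 2022-10-14 | ジナット テクノロジーズ インコーポレイテッド | Data entry functions for an information tracking system |
| CN110633660B (zh) * | 2019-08-30 | 2022-05-31 | 盈盛智创科技(广州)有限公司 | Document recognition method, device and storage medium |
| CN110738202A (zh) * | 2019-09-06 | 2020-01-31 | 平安科技(深圳)有限公司 | Character recognition method and apparatus, and computer-readable storage medium |
| CN111241365B (zh) * | 2019-12-23 | 2023-06-30 | 望海康信(北京)科技股份公司 | Table image parsing method and system |
| CN111860657B (zh) * | 2020-07-23 | 2024-12-24 | 中国建设银行股份有限公司 | Image classification method and apparatus, electronic device and storage medium |
| CN112560791B (zh) * | 2020-12-28 | 2022-08-09 | 苏州科达科技股份有限公司 | Recognition model training method, recognition method, apparatus and electronic device |
| CN112949446B (zh) * | 2021-02-25 | 2023-04-18 | 山东英信计算机技术有限公司 | Object recognition method, apparatus, device and medium |
| CN113743102B (zh) * | 2021-08-18 | 2023-09-01 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recognizing characters, and electronic device |
| CN116521926A (zh) * | 2023-05-06 | 2023-08-01 | 北京思明启创科技有限公司 | Character library generation method and apparatus, electronic device and storage medium |
| CN116580402B (zh) * | 2023-05-26 | 2024-06-25 | 读书郎教育科技有限公司 | Text recognition method and apparatus for a dictionary pen |
| CN117037192A (zh) * | 2023-08-10 | 2023-11-10 | 广东电网有限责任公司 | Document audit method, apparatus, device and medium |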
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
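| CN104991889A (zh) * | 2015-06-26 | 2015-10-21 | 江苏科技大学 | Automatic proofreading method for non-multi-character word errors based on fuzzy word segmentation |
| CN105068994A (zh) * | 2015-08-13 | 2015-11-18 | 易保互联医疗信息科技(北京)有限公司 | Natural language processing method and system for drug information |
| CN107622044A (zh) * | 2016-07-13 | 2018-01-23 | 阿里巴巴集团控股有限公司 | Character string word segmentation method, apparatus and device |
| CN108304484A (zh) * | 2017-12-29 | 2018-07-20 | 北京城市网邻信息技术有限公司 | Keyword matching method and apparatus, electronic device and readable storage medium |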
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
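| CN100476800C (zh) * | 2007-06-22 | 2009-04-08 | 腾讯科技(深圳)有限公司 | Method and system for segmenting index terms |
| JP5716328B2 (ja) * | 2010-09-14 | 2015-05-13 | 株式会社リコー | Information processing apparatus, information processing method, and information processing program |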
2018
- 2018-10-25: CN application CN201811254944.6A, patent CN109657738B (zh), status: Active
- 2018-12-21: WO application PCT/CN2018/122832, patent WO2020082562A1 (fr), status: not active (Ceased)
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
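| CN104991889A (zh) * | 2015-06-26 | 2015-10-21 | 江苏科技大学 | Automatic proofreading method for non-multi-character word errors based on fuzzy word segmentation |
| CN105068994A (zh) * | 2015-08-13 | 2015-11-18 | 易保互联医疗信息科技(北京)有限公司 | Natural language processing method and system for drug information |
| CN107622044A (zh) * | 2016-07-13 | 2018-01-23 | 阿里巴巴集团控股有限公司 | Character string word segmentation method, apparatus and device |
| CN108304484A (zh) * | 2017-12-29 | 2018-07-20 | 北京城市网邻信息技术有限公司 | Keyword matching method and apparatus, electronic device and readable storage medium |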
Cited By (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
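| CN113625884A (zh) * | 2020-05-07 | 2021-11-09 | 顺丰科技有限公司 | Input word recommendation method, apparatus, server and storage medium |
| CN111582169B (zh) * | 2020-05-08 | 2023-10-10 | 腾讯科技(深圳)有限公司 | Image recognition data error correction method and apparatus, computer device and storage medium |
| CN111582169A (zh) * | 2020-05-08 | 2020-08-25 | 腾讯科技(深圳)有限公司 | Image recognition data error correction method and apparatus, computer device and storage medium |
| CN111897958A (zh) * | 2020-07-16 | 2020-11-06 | 邓桦 | Method for classifying ancient poetry based on natural language processing |
| CN111897958B (zh) * | 2020-07-16 | 2024-03-12 | 邓桦 | Method for classifying ancient poetry based on natural language processing |
| CN112347765A (zh) * | 2020-10-10 | 2021-02-09 | 清华大学 | Entity labeling method, module and apparatus based on dictionary matching |
| CN112347765B (zh) * | 2020-10-10 | 2022-06-07 | 清华大学 | Entity labeling method, module and apparatus based on dictionary matching |
| CN114510935A (zh) * | 2020-11-17 | 2022-05-17 | 顺丰科技有限公司 | Dual-address text recognition method and apparatus, computer device and storage medium |
| CN112667831A (zh) * | 2020-12-25 | 2021-04-16 | 上海硬通网络科技有限公司 | Material storage method and apparatus, and electronic device |
| CN113408270A (zh) * | 2021-06-10 | 2021-09-17 | 广州三七极创网络科技有限公司 | Variant text recognition method and apparatus, and electronic device |
| CN113420564B (zh) * | 2021-06-21 | 2022-11-22 | 国网山东省电力公司物资公司 | Hybrid-matching-based semantic structuring method and system for electric power nameplates |
| CN113420564A (zh) * | 2021-06-21 | 2021-09-21 | 国网山东省电力公司物资公司 | Hybrid-matching-based semantic structuring method and system for electric power nameplates |
| CN113761913A (zh) * | 2021-08-23 | 2021-12-07 | 南京优飞保科信息技术有限公司 | Method and system for processing dialogue script text |
| CN113761913B (zh) * | 2021-08-23 | 2024-02-23 | 南京优飞保科信息技术有限公司 | Method and system for processing dialogue script text |
| CN114386407A (zh) * | 2021-12-23 | 2022-04-22 | 北京金堤科技有限公司 | Text word segmentation method and apparatus |
| CN113988068A (zh) * | 2021-12-29 | 2022-01-28 | 深圳前海硬之城信息技术有限公司 | Word segmentation method, apparatus, device and storage medium for BOM text |
| CN115935028A (zh) * | 2022-12-07 | 2023-04-07 | 植恩生物技术股份有限公司 | User identification method and system for a pharmaceutical e-commerce platform |
| CN119295020A (zh) * | 2024-12-11 | 2025-01-10 | 天津博诺智创机器人技术有限公司 | Artificial-intelligence-based industrial internet data management method and system |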
Also Published As
| Publication number | Publication date |
|---|---|
| CN109657738B (zh) | 2024-04-30 |
| CN109657738A (zh) | 2019-04-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2020082562A1 (fr) | | Character recognition method, apparatus, device and storage medium |
| WO2020015067A1 (fr) | | Data acquisition method, apparatus, device and storage medium |
| WO2021051558A1 (fr) | | Knowledge-graph-based question answering method and apparatus, and storage medium |
| WO2020253113A1 (fr) | | Invoice recording method, device and apparatus, and computer storage medium |
| WO2020233089A1 (fr) | | Test set creation method and apparatus, terminal and computer-readable storage medium |
| WO2020073495A1 (fr) | | Artificial-intelligence-based review method, apparatus and device, and storage medium |
| WO2020119116A1 (fr) | | Medical insurance verification method, apparatus and device based on data analysis, and storage medium |
| WO2021215620A1 (fr) | | Device and method for automatically generating domain-specific image captions using a semantic ontology |
| WO2011021907A2 (fr) | | Metadata adding system, image search method and device, and associated gesture adding method |
| WO2020087704A1 (fr) | | Credit information management method, apparatus and device, and recording medium |
| WO2020186777A1 (fr) | | Image retrieval method, apparatus and device, and computer-readable storage medium |
| WO2019037197A1 (fr) | | Topic classifier training method and device, and computer-readable storage medium |
| WO2021003956A1 (fr) | | Product information management method, apparatus and device, and recording medium |
| WO2021012489A1 (fr) | | Telephone platform log query method, terminal device, storage medium and apparatus |
| WO2020082766A1 (fr) | | Association method and apparatus for an input method, device and readable storage medium |
| WO2010137814A2 (fr) | | Method for providing a patent map by viewpoint, and associated system |
| WO2010123168A1 (fr) | | Database management method and system |
| WO2019024485A1 (fr) | | Data sharing method and device, and computer-readable storage medium |
| CN118410196B (zh) | | Method, system and apparatus for recognizing title blocks of drawings |
| WO2016099019A1 (fr) | | System and method for classifying patent documents |
| WO2020085558A1 (fr) | | High-speed analysis image processing apparatus and associated control method |
| WO2021051557A1 (fr) | | Keyword determination method and apparatus based on semantic recognition, and storage medium |
| WO2018086371A1 (fr) | | Notebook computer, smart terminal and method for creating a content index for a notebook computer |
| WO2014148784A1 (fr) | | Language model database for language recognition, and language recognition device, method and system |
| WO2024019226A1 (fr) | | Method for detecting malicious URLs |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18937751; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 18937751; Country of ref document: EP; Kind code of ref document: A1 |