HK1220276A1 - Managing real-time handwriting recognition
- Publication number: HK1220276A1
- Application number: HK16108185.0A
- Authority
- HK
- Hong Kong
- Prior art keywords
- handwriting
- recognition
- input
- user
- handwritten
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/22—Character recognition characterised by the type of writing
- G06V30/226—Character recognition characterised by the type of writing of cursive writing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/22—Character recognition characterised by the type of writing
- G06V30/226—Character recognition characterised by the type of writing of cursive writing
- G06V30/2264—Character recognition characterised by the type of writing of cursive writing using word shape
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/36—Matching; Classification
- G06V30/387—Matching; Classification using human interaction, e.g. selection of the best displayed recognition candidate
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/293—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of characters other than Kanji, Hiragana or Katakana
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Character Discrimination (AREA)
- User Interface Of Digital Computer (AREA)
- Document Processing Apparatus (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device are disclosed. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and is capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition for multi-character handwriting input. In particular, real-time, stroke-order and stroke-direction independent recognition is provided for multi-character, sentence-level Chinese handwriting input. User interfaces for providing the handwriting input functionality are also disclosed.
Description
Technical Field
This description relates to providing handwriting input functionality on computing devices, and more particularly to providing real-time, multi-script, stroke-order independent handwriting recognition and input functionality on computing devices.
Background
Handwriting input is an important alternative input method for computing devices equipped with touch-sensitive surfaces (e.g., touch-sensitive display screens or touch pads). Many users, especially in some Asian and Arabic-speaking countries and regions, are accustomed to writing in a cursive style and may feel more comfortable writing by hand than typing on a keyboard.
For some logographic writing systems, such as Chinese hanzi or Japanese kanji, alternative phonetic entry methods (e.g., Pinyin or kana) may be used to enter characters of the logographic writing system, but such phonetic entry methods are inadequate when the user does not know the pronunciation of a logographic character or misspells its phonetic transcription. Thus, the ability to use handwriting input on a computing device becomes critical for users who spell words of the relevant logographic writing system poorly or not at all.
Although handwriting input functionality has become popular in some areas of the world, improvements are still needed. In particular, human handwriting is highly variable (e.g., in stroke order, size, writing style, etc.), and high-quality handwriting recognition software is complex and requires extensive training. As such, providing efficient real-time handwriting recognition on mobile devices with limited memory and computing resources remains a challenge.
Furthermore, in today's multi-cultural world, users in many countries are multilingual and may frequently need to write in more than one script (e.g., write a message in Chinese that mentions the name of an English movie). However, manually switching the recognition system to the desired script or language during writing is cumbersome and inefficient. Moreover, the utility of conventional multi-script handwriting recognition techniques is severely limited, because expanding a device's recognition capability to handle multiple scripts simultaneously greatly increases the complexity of the recognition system and its demand on computing resources.
Furthermore, conventional handwriting recognition techniques rely heavily on language-specific or script-specific characteristics to achieve recognition accuracy. Such characteristics are not easily transferable to other languages or scripts. Thus, adding handwriting input capability for a new language or script is a difficult undertaking that software and device vendors are slow to take on. As a result, users of many languages lack an important alternative input method on their electronic devices.
A conventional user interface for handwriting input includes an area for accepting handwriting input from a user and an area for displaying handwriting recognition results. On portable devices with small form factors, significant improvements to such user interfaces are still needed to improve efficiency, accuracy, and the overall user experience.
Disclosure of Invention
This specification describes techniques for providing multi-script handwriting recognition using a universal recognizer. The universal recognizer is trained using a large multi-script corpus of writing samples for characters in different languages and scripts. The training of the universal recognizer is language-independent, script-independent, stroke-order independent, and stroke-direction independent. Thus, the same recognizer is able to recognize mixed-language, mixed-script handwriting input without requiring manual switching between input languages during use. In addition, the universal recognizer is lightweight enough to be deployed as a stand-alone module on a mobile device, thereby enabling handwriting input in the different languages and scripts used in different regions of the world.
Furthermore, because the universal recognizer is trained on spatially-derived features that are stroke-order and stroke-direction independent and does not require temporal or sequence information at the stroke level, it provides a number of additional features and advantages over conventional time-based recognition methods, such as those based on hidden Markov models (HMMs). For example, the user is allowed to enter the strokes of characters, phrases, and sentences in any order and still obtain the same recognition results. Thus, out-of-order multi-character input and corrections (e.g., additions or overwrites) to previously entered characters are now possible.
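To make the stroke-order and stroke-direction independence concrete, consider the following minimal sketch (a hypothetical illustration, not the disclosed implementation; the function names, image size, and rasterization method are all assumptions). Because the recognizer consumes only a rasterized image of the ink, any permutation or reversal of the strokes produces an identical input:

```python
import numpy as np

IMAGE_SIZE = 48  # side length of the square input bitmap (assumed value)

def render_strokes(strokes):
    """Rasterize strokes (each a list of (x, y) points in [0, 1]) into a
    binary image. Temporal order and direction play no role in the output."""
    img = np.zeros((IMAGE_SIZE, IMAGE_SIZE), dtype=np.uint8)
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Dense sampling stands in for a real line rasterizer
            # (e.g., Bresenham's algorithm).
            for t in np.linspace(0.0, 1.0, IMAGE_SIZE):
                x = int((x0 + t * (x1 - x0)) * (IMAGE_SIZE - 1))
                y = int((y0 + t * (y1 - y0)) * (IMAGE_SIZE - 1))
                img[y, x] = 1
    return img

stroke_a = [(0.1, 0.5), (0.9, 0.5)]  # horizontal stroke
stroke_b = [(0.5, 0.1), (0.5, 0.9)]  # vertical stroke

# Same image regardless of stroke order or stroke direction:
assert np.array_equal(render_strokes([stroke_a, stroke_b]),
                      render_strokes([stroke_b, stroke_a]))
assert np.array_equal(render_strokes([stroke_a]),
                      render_strokes([stroke_a[::-1]]))
```

A recognizer trained on such images cannot distinguish a correction added out of order from a stroke written in canonical order, which is the property exploited in the paragraphs that follow.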
Further, the universal recognizer is used for real-time handwriting recognition, where temporal information for each stroke is available and is optionally used to disambiguate or segment the handwriting input before character recognition is performed by the universal recognizer. The stroke-order independent real-time recognition described herein differs from conventional offline recognition methods (e.g., optical character recognition (OCR)) and can provide better performance than they do. Moreover, the universal recognizer described herein is capable of handling large variations in individual writing habits (e.g., variations in speed, tempo, stroke order, stroke direction, and stroke continuity) without explicitly embedding features that distinguish those variations in the recognition system, thereby reducing the overall complexity of the recognition system.
As described herein, in some embodiments, temporally-derived stroke distribution information is optionally reintroduced into the universal recognizer to enhance recognition accuracy and to disambiguate between similar-looking recognition outputs for the same input image. Reintroducing the temporally-derived stroke distribution information does not disrupt the stroke-order and stroke-direction independence of the universal recognizer, because the temporally-derived features and the spatially-derived features are obtained through separate training processes and are combined in the handwriting recognition model only after the separate training is completed. Additionally, the temporally-derived stroke distribution information is designed to capture the discriminative temporal characteristics of similar-looking characters without relying on explicit knowledge of the differences in their stroke order.
A user interface for providing handwriting input functionality is also described herein.
In some embodiments, a method of providing multi-script handwriting recognition includes: training a multi-script handwriting recognition model on spatially-derived features of a multi-script training corpus, the multi-script training corpus including handwriting samples corresponding to characters of at least three non-overlapping scripts; and providing real-time handwriting recognition for a user's handwriting input using the multi-script handwriting recognition model that has been trained on the spatially-derived features of the multi-script training corpus.
In some embodiments, a method of providing multi-script handwriting recognition includes: receiving a multi-script handwriting recognition model, the multi-script recognition model having been trained on spatially-derived features of a multi-script training corpus, the multi-script training corpus including handwriting samples corresponding to characters of at least three non-overlapping scripts; receiving a handwriting input from a user, the handwriting input including one or more handwritten strokes provided on a touch-sensitive surface coupled to a user device; and in response to receiving the handwriting input, providing one or more handwriting recognition results to the user in real-time based on the multi-script handwriting recognition model that has been trained on the spatially-derived features of the multi-script training corpus.
In some embodiments, a method of providing real-time handwriting recognition includes: receiving a plurality of handwritten strokes from a user, the plurality of handwritten strokes corresponding to a handwritten character; generating an input image based on the plurality of handwritten strokes; providing the input image to a handwriting recognition model to perform real-time recognition of the handwritten character, wherein the handwriting recognition model provides stroke-order independent handwriting recognition; and displaying the same first output character in real-time when the plurality of handwritten strokes is received, regardless of the respective order in which the plurality of handwritten strokes has been received from the user.
In some embodiments, the method further comprises: receiving a second plurality of handwritten strokes from the user, the second plurality of handwritten strokes corresponding to a second handwritten character; generating a second input image based on the second plurality of handwritten strokes; providing the second input image to the handwriting recognition model to perform real-time recognition of the second handwritten character; and displaying, in real-time as the second plurality of handwritten strokes is received, a second output character corresponding to the second plurality of handwritten strokes, wherein the first output character and the second output character are displayed simultaneously in a spatial sequence regardless of the respective order in which the first plurality of handwritten strokes and the second plurality of handwritten strokes were provided by the user.
In some embodiments, the second plurality of handwritten strokes is spatially subsequent to the first plurality of handwritten strokes along a default writing direction of a handwriting input interface of the user device, the second output character is subsequent to the first output character in the spatial sequence along the default writing direction, and the method further comprises: receiving a third handwritten stroke from the user to revise the handwritten character, the third handwritten stroke being received temporally after the first plurality of handwritten strokes and the second plurality of handwritten strokes; in response to receiving the third handwritten stroke, assigning the third handwritten stroke to the same recognition unit as the first plurality of handwritten strokes based on the relative proximity of the third handwritten stroke to the first plurality of handwritten strokes; generating a revised input image based on the first plurality of handwritten strokes and the third handwritten stroke; providing the revised input image to the handwriting recognition model to perform real-time recognition of the revised handwritten character; and in response to receiving the third handwritten stroke, displaying a third output character corresponding to the revised input image, wherein the third output character replaces the first output character and is displayed concurrently with the second output character in the spatial sequence along the default writing direction.
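The proximity-based assignment of a late-arriving stroke can be sketched as follows (a hypothetical illustration: the bounding-box-center distance metric and all names are assumptions, not the disclosed segmentation algorithm):

```python
def bounding_box(strokes):
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    return min(xs), min(ys), max(xs), max(ys)

def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def assign_stroke(new_stroke, recognition_units):
    """Attach new_stroke to the spatially nearest recognition unit,
    where each recognition unit is a list of strokes."""
    cx, cy = center(bounding_box([new_stroke]))
    def distance(unit):
        ux, uy = center(bounding_box(unit))
        return (cx - ux) ** 2 + (cy - uy) ** 2
    nearest = min(recognition_units, key=distance)
    nearest.append(new_stroke)
    return nearest

unit_1 = [[(0.1, 0.4), (0.3, 0.4)]]        # first handwritten character
unit_2 = [[(0.7, 0.4), (0.9, 0.4)]]        # second handwritten character
late_stroke = [(0.2, 0.2), (0.2, 0.6)]     # correction written over the first
assert assign_stroke(late_stroke, [unit_1, unit_2]) is unit_1
```

Once the late stroke lands in the correct unit, that unit is simply re-rasterized and re-recognized, which is why the replacement character appears in place without disturbing its neighbors.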
In some embodiments, the method further comprises: receiving a deletion input from the user while simultaneously displaying the third output character and the second output character as recognition results in the candidate display area of the handwriting input interface; and deleting the second output character from the recognition result while maintaining the third output character in the recognition result in response to a deletion input.
In some embodiments, the first plurality of handwritten strokes, the second plurality of handwritten strokes, and the third handwritten stroke are rendered in real-time in a handwriting input area of the handwriting input interface as each of the handwritten strokes is provided by the user; and in response to receiving the deletion input, deleting the respective renderings of the second plurality of handwritten strokes from the handwriting input area while maintaining the respective renderings of the first and third handwritten strokes in the handwriting input area.
In some embodiments, a method of providing real-time handwriting recognition includes: receiving a handwriting input from a user, the handwriting input comprising one or more handwritten strokes provided in a handwriting input area of a handwriting input interface; identifying a plurality of output characters for the handwriting input based on a handwriting recognition model; classifying the plurality of output characters into two or more categories based on predetermined classification criteria; displaying the respective output characters of a first category of the two or more categories in an initial view of a candidate display area of the handwriting input interface, wherein the initial view of the candidate display area is provided concurrently with an affordance for invoking an expanded view of the candidate display area; receiving a user input selecting the affordance for invoking the expanded view; and in response to the user input, displaying, in the expanded view of the candidate display area, the respective output characters of the first category and the respective output characters of at least a second category of the two or more categories that were not previously displayed in the initial view of the candidate display area.
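The following sketch shows one way such categorization could work (illustrative only: Unicode script membership is merely an assumed example of the "predetermined classification criteria", and the function names are invented):

```python
import unicodedata

def category_of(char):
    name = unicodedata.name(char, "")
    if name.startswith("CJK"):
        return "cjk"
    if name.startswith("LATIN"):
        return "latin"
    return "other"

def split_views(candidates):
    """Bucket candidates by category; the initial view shows only the
    first category, while the expanded view shows everything."""
    buckets = {}
    for c in candidates:
        buckets.setdefault(category_of(c), []).append(c)
    ordered = list(buckets.values())
    initial_view = ordered[0]
    expanded_view = [c for bucket in ordered for c in bucket]
    return initial_view, expanded_view

initial, expanded = split_views(["中", "虫", "R", "申"])
print(initial)   # first category only, shown in the initial view
print(expanded)  # all categories, revealed by the affordance
```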
In some embodiments, a method of providing real-time handwriting recognition includes: receiving a handwriting input from a user, the handwriting input comprising a plurality of handwritten strokes provided in a handwriting input area of a handwriting input interface; identifying a plurality of output characters from the handwriting input based on a handwriting recognition model, the plurality of output characters including at least a first emoji character and at least a first character from a script of a natural human language; and displaying, in a candidate display area of the handwriting input interface, a recognition result including the first emoji character and the first character from the script of the natural human language.
In some embodiments, a method of providing handwriting recognition comprises: receiving a handwriting input from a user, the handwriting input including a plurality of handwritten strokes provided on a touch-sensitive surface coupled to a device; rendering the plurality of handwritten strokes in real time in a handwriting input area of a handwriting input interface; receiving one of a pinch gesture input and an expand gesture input over the plurality of handwritten strokes; when the pinch gesture input is received, generating a first recognition result based on the plurality of handwritten strokes by treating the plurality of handwritten strokes as a single recognition unit; when the expand gesture input is received, generating a second recognition result based on the plurality of handwritten strokes by treating the plurality of handwritten strokes as two independent recognition units pulled apart by the expand gesture input; and when a respective one of the first recognition result and the second recognition result is generated, displaying the generated recognition result in a candidate display area of the handwriting input interface.
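One plausible reading of the pinch/expand behavior is sketched below (the data layout, the x-coordinate split rule, and all names are assumptions for illustration only):

```python
def apply_gesture(gesture, strokes, split_x=None):
    """Return the recognition unit(s) to send to the recognizer.
    strokes: list of strokes, each a list of (x, y) points."""
    if gesture == "pinch":
        # Pinch: treat everything under the gesture as one unit.
        return [strokes]
    if gesture == "expand":
        # Expand: split at the gap opened between the two contacts.
        left = [s for s in strokes if max(x for x, _ in s) < split_x]
        right = [s for s in strokes if s not in left]
        return [left, right]
    raise ValueError("unknown gesture: %s" % gesture)

s1 = [(0.10, 0.5), (0.20, 0.5)]
s2 = [(0.30, 0.5), (0.40, 0.5)]
print(apply_gesture("pinch", [s1, s2]))                 # [[s1, s2]]
print(apply_gesture("expand", [s1, s2], split_x=0.25))  # [[s1], [s2]]
```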
In some embodiments, a method of providing handwriting recognition comprises: receiving a handwriting input from a user, the handwriting input comprising a plurality of handwritten strokes provided in a handwriting input area of a handwriting input interface; identifying a plurality of recognition units from the plurality of handwritten strokes, each recognition unit including a respective subset of the plurality of handwritten strokes; generating a multi-character recognition result including respective characters recognized from a plurality of recognition units; displaying a multi-character recognition result in a candidate display area of a handwriting input interface; receiving a deletion input from a user while displaying the multi-character recognition result in the candidate display area; and removing an end character from the multi-character recognition result displayed in the candidate display area in response to receiving the deletion input.
In some embodiments, a method of providing real-time handwriting recognition includes: determining an orientation of the device; in accordance with the device being in a first orientation, providing a handwriting input interface on the device in a horizontal input mode, wherein a respective line of handwriting input entered in the horizontal input mode is segmented into one or more respective recognition units along a horizontal writing direction; and in accordance with the device being in a second orientation, providing the handwriting input interface on the device in a vertical input mode, wherein a respective line of handwriting input entered in the vertical input mode is segmented into one or more respective recognition units along a vertical writing direction.
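As an assumed illustration (the gap threshold and names are invented), the same gap-based grouping can serve both modes by switching the axis along which gaps are measured:

```python
def segment_line(strokes, orientation, gap=0.1):
    """Group strokes into recognition units by gaps along the writing
    direction; each stroke is a list of (x, y) points in [0, 1]."""
    axis = 0 if orientation == "horizontal" else 1  # x horizontally, y vertically
    ordered = sorted(strokes, key=lambda s: min(p[axis] for p in s))
    units, current, last_end = [], [], None
    for s in ordered:
        start = min(p[axis] for p in s)
        if last_end is not None and start - last_end > gap:
            units.append(current)   # gap found: close the current unit
            current = []
        current.append(s)
        end = max(p[axis] for p in s)
        last_end = end if last_end is None else max(last_end, end)
    if current:
        units.append(current)
    return units

row = [[(0.10, 0.5), (0.15, 0.5)], [(0.40, 0.5), (0.50, 0.5)]]
print(len(segment_line(row, "horizontal")))  # 2 units: the 0.25 x-gap splits them
print(len(segment_line(row, "vertical")))    # 1 unit: no gap along y
```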
In some embodiments, a method of providing real-time handwriting recognition includes: receiving a handwriting input from a user, the handwriting input including a plurality of handwritten strokes provided on a touch-sensitive surface coupled to a device; rendering the plurality of handwritten strokes in a handwriting input area of a handwriting input interface; segmenting the plurality of handwritten strokes into two or more recognition units, each recognition unit including a respective subset of the plurality of handwritten strokes; receiving an edit request from the user; in response to the edit request, visually distinguishing the two or more recognition units in the handwriting input area; and providing means for independently deleting each of the two or more recognition units from the handwriting input area.
In some embodiments, a method of providing real-time handwriting recognition includes: receiving a first handwriting input from a user, the first handwriting input including a plurality of handwritten strokes that form a plurality of recognition units distributed along a respective writing direction associated with a handwriting input area of a handwriting input interface; rendering each of the plurality of handwritten strokes in the handwriting input area as the handwritten stroke is provided by the user; starting a respective fade-out process for each recognition unit of the plurality of recognition units after the recognition unit is fully rendered, wherein the rendering of the recognition unit in the first handwriting input gradually fades during the respective fade-out process; receiving, from the user, a second handwriting input over an area of the handwriting input area occupied by a faded recognition unit of the plurality of recognition units; and in response to receiving the second handwriting input: rendering the second handwriting input in the handwriting input area; and removing all faded recognition units from the handwriting input area.
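The fade-out bookkeeping might look like the following sketch (the delay, duration, resting opacity, and class layout are all assumed values, not the disclosed implementation):

```python
import time

FADE_DELAY = 1.5     # seconds before fading begins (assumed)
FADE_DURATION = 2.0  # seconds to fade down to the resting opacity (assumed)

def opacity(completed_at, now):
    """Full opacity until FADE_DELAY elapses, then a linear fade."""
    elapsed = now - completed_at - FADE_DELAY
    if elapsed <= 0:
        return 1.0
    return max(0.2, 1.0 - elapsed / FADE_DURATION)

class HandwritingInputArea:
    def __init__(self):
        self.units = []  # list of (strokes, completion timestamp)

    def finish_unit(self, strokes):
        self.units.append((strokes, time.time()))

    def begin_new_input(self, now=None):
        """A second handwriting input arrives over faded ink: every unit
        that has started fading is removed from the input area."""
        now = time.time() if now is None else now
        self.units = [(s, t) for (s, t) in self.units
                      if opacity(t, now) >= 1.0]
```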
In some embodiments, a method of providing handwriting recognition comprises: independently training a set of spatially-derived features and a set of temporally-derived features of a handwriting recognition model, wherein the set of spatially-derived features is trained on a corpus of training images, each image in the corpus being an image of a handwriting sample for a respective character in an output character set, and the set of temporally-derived features is trained on a corpus of stroke distribution profiles, each stroke distribution profile numerically characterizing a spatial distribution of a plurality of strokes in a handwriting sample for a respective character in the output character set; combining the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model; and using the handwriting recognition model to provide real-time handwriting recognition for a user's handwriting input.
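The separation between the two training paths can be sketched as below. This is a conceptual illustration only: the histogram shown is one assumed instance of the stroke distribution profiles described with respect to FIG. 27, and plain concatenation is an illustrative way to combine independently trained feature sets:

```python
import numpy as np

def stroke_distribution_profile(strokes, bins=8):
    """Numerically characterize how ink is distributed over the strokes:
    here, a histogram of each stroke's share of the total ink length."""
    lengths = []
    for s in strokes:
        pts = np.asarray(s, dtype=float)
        lengths.append(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))
    lengths = np.asarray(lengths)
    shares = lengths / max(lengths.sum(), 1e-9)
    hist, _ = np.histogram(shares, bins=bins, range=(0.0, 1.0))
    return hist / max(len(lengths), 1)

def combined_features(image_features, strokes):
    # The spatial branch (trained on images) and the temporal branch
    # (trained on stroke profiles) never see each other's training data;
    # their outputs are merged only here, after both trainings complete,
    # so the image branch remains stroke-order independent.
    return np.concatenate([image_features,
                           stroke_distribution_profile(strokes)])
```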
The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Drawings
FIG. 1 is a block diagram illustrating a portable multifunction device with a touch-sensitive display in accordance with some embodiments.
FIG. 2 illustrates a portable multifunction device with a touch-sensitive display in accordance with some embodiments.
FIG. 3 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with some embodiments.
FIG. 4 illustrates an exemplary user interface for a multifunction device with a touch-sensitive surface separate from a display in accordance with some embodiments.
FIG. 5 is a block diagram illustrating an operating environment of a handwriting input system, according to some embodiments.
FIG. 6 is a block diagram of a multi-script handwriting recognition model according to some embodiments.
FIG. 7 is a flow diagram of an exemplary process for training a multi-script handwriting recognition model, according to some embodiments.
FIGS. 8A-8B illustrate exemplary user interfaces displaying real-time multi-script handwriting recognition and input on a portable multifunction device, according to some embodiments.
FIGS. 9A-9B are flow diagrams of an exemplary process for providing real-time multi-script handwriting recognition and input on a portable multifunction device.
FIGS. 10A-10C are flowcharts of an exemplary process for providing real-time, stroke-order independent handwriting recognition and input on a portable multifunction device, according to some embodiments.
FIGS. 11A-11K illustrate exemplary user interfaces for selectively displaying recognition results of one category in a normal view of a candidate display area and recognition results of other categories in an expanded view of the candidate display area, according to some embodiments.
FIGS. 12A-12B are flow diagrams of an exemplary process for selectively displaying recognition results of one category in a normal view of a candidate display area and recognition results of other categories in an expanded view of the candidate display area, according to some embodiments.
FIGS. 13A-13E illustrate exemplary user interfaces for entering emoji characters by handwriting input, according to some embodiments.
FIG. 14 is a flow diagram of an exemplary process for entering emoji characters by handwriting input, according to some embodiments.
FIGS. 15A-15K illustrate exemplary user interfaces for using a pinch or expand gesture to inform the handwriting input module how to divide the currently accumulated handwriting input into one or more recognition units, according to some embodiments.
FIGS. 16A-16B are flowcharts of an exemplary process for using a pinch or expand gesture to inform the handwriting input module how to divide the currently accumulated handwriting input into one or more recognition units, according to some embodiments.
FIGS. 17A-17H illustrate exemplary user interfaces for providing character-by-character deletion of a user's handwriting input, according to some embodiments.
FIGS. 18A-18B are flow diagrams of an exemplary process for providing character-by-character deletion of a user's handwriting input, according to some embodiments.
FIGS. 19A-19F illustrate exemplary user interfaces for switching between a vertical writing mode and a horizontal writing mode, according to some embodiments.
FIGS. 20A-20C illustrate a flow diagram of an exemplary process for switching between a vertical writing mode and a horizontal writing mode, according to some embodiments.
FIGS. 21A-21H illustrate exemplary user interfaces for providing means for displaying and selectively deleting individual recognition units identified in a user's handwriting input, according to some embodiments.
FIGS. 22A-22B are flowcharts of an exemplary process for providing means for displaying and selectively deleting individual recognition units identified in a user's handwriting input, according to some embodiments.
FIGS. 23A-23L illustrate exemplary user interfaces for treating new handwriting input provided over existing handwriting input in a handwriting input area as confirmation input for entering the recognition results displayed for the existing handwriting input, according to some embodiments.
FIGS. 24A-24B are flowcharts of an exemplary process for treating new handwriting input provided over existing handwriting input in a handwriting input area as confirmation input for entering the recognition results displayed for the existing handwriting input, according to some embodiments.
FIGS. 25A-25B are flowcharts of an exemplary process for integrating temporally-derived stroke distribution information into a handwriting recognition model based on spatially-derived features without disrupting the stroke-order and stroke-direction independence of the handwriting recognition model, according to some embodiments.
FIG. 26 is a block diagram illustrating the separate training and subsequent integration of the spatially-derived features and the temporally-derived features of an exemplary handwriting recognition system, in accordance with some embodiments.
FIG. 27 is a block diagram illustrating an exemplary method for calculating a stroke distribution profile for a character.
Like reference numerals refer to corresponding parts throughout the drawings.
Detailed Description
Many electronic devices have graphical user interfaces with soft keyboards for character entry. On some electronic devices, a user may also be able to install or enable a handwriting input interface that allows the user to enter characters by handwriting on a touch-sensitive display screen or touch-sensitive surface coupled to the device. Conventional handwriting recognition input methods and user interfaces have several problems and disadvantages. For example,
conventional handwriting input functionality is typically enabled language-by-language or script-by-script. Each additional input language requires the installation of a separate handwriting recognition model that occupies its own storage space and memory. Little synergy is gained by combining handwriting recognition models for different languages, and mixed-language or mixed-script handwriting recognition typically takes a long time due to complex disambiguation processing.
Furthermore, because conventional handwriting recognition systems rely heavily on language-specific or script-specific characteristics for character recognition, the accuracy of recognizing mixed-language handwriting input is poor, and the available combinations of recognized languages are very limited. Most systems require the user to manually specify the desired language-specific handwriting recognizer before providing handwriting input in each non-default language or script.
Many existing real-time handwriting recognition models require temporal or sequence information at the stroke level and can produce inaccurate recognition results when faced with the high variability in how a character can be written (e.g., due to writing style and personal habits, strokes vary widely in shape, length, tempo, segmentation, order, and direction). Some systems also require the user to adhere to strict spatial and temporal criteria when providing handwriting input (e.g., built-in assumptions about the size, order, and time frame of each character input). Any deviation from these criteria produces inaccurate recognition results that are difficult to correct.
Currently, most real-time handwriting input interfaces allow a user to enter only a few characters at a time. Long phrases or sentences must be broken into short fragments that are entered separately. Such unnatural input not only imposes a cognitive burden on the user and disrupts the flow of writing, but also makes it difficult for the user to revise characters or phrases entered earlier.
The embodiments described below address these and related problems.
FIGS. 1-4 below provide a description of exemplary devices. FIGS. 5, 6, and 26-27 illustrate exemplary handwriting recognition and input systems. FIGS. 8A-8B, 11A-11K, 13A-13E, 15A-15K, 17A-17H, 19A-19F, 21A-21H, and 23A-23L illustrate exemplary user interfaces for handwriting recognition and input. FIGS. 7, 9A-9B, 10A-10C, 12A-12B, 14, 16A-16B, 18A-18B, 20A-20C, 22A-22B, 24A-24B, and 25A-25B are flowcharts illustrating methods of implementing handwriting recognition and input on a user device, including training a handwriting recognition model, providing real-time handwriting recognition results, providing means for entering and revising handwriting input, and providing means for entering recognition results as text input. The user interfaces in FIGS. 8A-8B, 11A-11K, 13A-13E, 15A-15K, 17A-17H, 19A-19F, 21A-21H, and 23A-23L are used to illustrate the processes in FIGS. 7, 9A-9B, 10A-10C, 12A-12B, 14, 16A-16B, 18A-18B, 20A-20C, 22A-22B, 24A-24B, and 25A-25B.
Exemplary device
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail as not to unnecessarily obscure aspects of the embodiments.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact may be termed a second contact, and similarly, a second contact may be termed a first contact, without departing from the scope of the present invention. The first contact and the second contact are both contacts, but they are not the same contact.
The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term "if" may be interpreted to mean "when … …" (when "or" upon ") or" in response to a determination "or" in response to a detection ", depending on the context. Similarly, depending on the context, the phrase "if it is determined" or "if [ stated condition or event ] is detected" may be interpreted to mean "when determining … …" or "in response to determining" or "when [ stated condition or event ] is detected" or "in response to detecting [ stated condition or event ]".
Embodiments of electronic devices, user interfaces for such devices, and associated processes for using such devices are described. In some embodiments, the device is a portable communication device, such as a mobile phone, that also contains other functionality, such as PDA and/or music player functionality. Exemplary embodiments of the portable multifunction device include, but are not limited to, the iPhone®, iPod Touch®, and iPad® devices from Apple Inc. (Cupertino, California). Other portable electronic devices may also be used, such as laptop or tablet computers with touch-sensitive surfaces (e.g., touch screen displays and/or touchpads). It should also be understood that, in some embodiments, the device is not a portable communication device, but a desktop computer with a touch-sensitive surface (e.g., a touch screen display and/or a touchpad).
In the following discussion, an electronic device including a display and a touch-sensitive surface is described. However, it should be understood that the electronic device may include one or more other physical user interface devices, such as a physical keyboard, mouse, and/or joystick.
The device typically supports various applications, such as one or more of the following: a mapping application, a rendering application, a word processing application, a website creation application, a disc editing application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, a fitness support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications executable on the device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the device may be adjusted and/or varied from one application to the next and/or within the corresponding application. In this way, a common physical architecture of the device (such as a touch-sensitive surface) can support various applications with a user interface that is intuitive and clear to the user.
Attention is now directed to embodiments of portable devices having touch-sensitive displays. FIG. 1 is a block diagram illustrating a portable multifunction device 100 with a touch-sensitive display 112 in accordance with some embodiments. Touch-sensitive display 112 is sometimes referred to as a "touch screen" for convenience, and may also be known as or referred to as a touch-sensitive display system. Device 100 may include memory 102 (which may include one or more computer-readable storage media), a memory controller 122, one or more processing units (CPUs) 120, a peripheral interface 118, RF circuitry 108, audio circuitry 110, a speaker 111, a microphone 113, an input/output (I/O) subsystem 106, other input or control devices 116, and an external port 124. The device 100 may include one or more optical sensors 164. These components may communicate over one or more communication buses or signal lines 103.
It should be understood that device 100 is only one example of a portable multifunction device and that device 100 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of components. The various components shown in fig. 1, which may be implemented in hardware, software, or a combination of hardware and software, include one or more signal processing circuits and/or application specific integrated circuits.
The memory 102 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to the memory 102 by other components of the device 100, such as the CPU120 and the peripheral interface 118, may be controlled by a memory controller 122.
Peripheral interface 118 may be used to couple the input and output peripherals of the device to CPU120 and memory 102. The one or more processors 120 run or execute various software programs and/or sets of instructions stored in the memory 102 to perform various functions for the device 100 and to process data.
In some embodiments, peripherals interface 118, CPU120, and memory controller 122 may be implemented on a single chip, such as chip 104. In other embodiments, they may be implemented on separate chips.
RF (radio frequency) circuitry 108 receives and transmits RF signals, also called electromagnetic signals. The RF circuitry 108 converts electrical signals to/from electromagnetic signals and communicates with communication networks and other communication devices via electromagnetic signals.
Audio circuitry 110, speaker 111, and microphone 113 provide an audio interface between a user and device 100. The audio circuitry 110 receives audio data from the peripheral interface 118, converts the audio data to electrical signals, and transmits the electrical signals to the speaker 111. The speaker 111 converts the electrical signals into sound waves audible to the human ear. The audio circuit 110 also receives electrical signals converted by the microphone 113 from sound waves. The audio circuit 110 converts the electrical signals to audio data and transmits the audio data to the peripheral interface 118 for processing. Audio data may be retrieved from and/or transmitted to memory 102 and/or RF circuitry 108 by peripheral interface 118. In some embodiments, the audio circuit 110 also includes a headset jack (e.g., 212 in fig. 2).
The I/O subsystem 106 couples input/output peripheral devices on the device 100, such as a touch screen 112 and other input control devices 116, to a peripheral interface 118. The I/O subsystem 106 may include a display controller 156 and one or more input controllers 160 for other input or control devices. The one or more input controllers 160 receive/transmit electrical signals from/to other input or control devices 116. Other input control devices 116 may include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels, and so forth. In some alternative embodiments, the one or more input controllers 160 may or may not be coupled to any of the following: a keyboard, an infrared port, a USB port, and a pointing device such as a mouse. The one or more buttons (e.g., 208 in fig. 2) may include an up/down button for volume control of the speaker 111 and/or microphone 113. The one or more buttons may include a push button (e.g., 206 in fig. 2).
Touch-sensitive display 112 provides an input interface and an output interface between the device and a user. Display controller 156 receives electrical signals from touch screen 112 and/or transmits electrical signals to touch screen 112. Touch screen 112 displays visual output to a user. The visual output may include graphics, text, icons, video, and any combination thereof (collectively "graphics"). In some embodiments, some or all of the visual output may correspond to a user interface object.
Touch screen 112 has a touch-sensitive surface, sensor or group of sensors for accepting input from a user based on tactile sensation and/or tactile contact. Touch screen 112 and display controller 156 (along with any associated modules and/or sets of instructions in memory 102) detect contact (and any movement or breaking of the contact) on touch screen 112 and convert the detected contact into interaction with user interface objects (e.g., one or more soft keys, icons, web pages, or images) displayed on touch screen 112. In one exemplary embodiment, the point of contact between touch screen 112 and the user corresponds to a finger of the user.
The touch screen 112 may use LCD (liquid crystal display) technology, LPD (light emitting polymer display) technology, or LED (light emitting diode) technology, although other display technologies may be used in other embodiments. Touch screen 112 and display controller 156 may detect contact and any movement or breaking thereof using any of a variety of touch sensing technologies now known or later developed, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch screen 112. In one exemplary embodiment, projected mutual capacitance sensing technology is used, such as that found in the iPhone®, iPod Touch®, and iPad® from Apple Inc. (Cupertino, California).
The touch screen 112 may have a video resolution in excess of 100 dpi. In some embodiments, the touch screen has a video resolution of about 160 dpi. The user may make contact with touch screen 112 using any suitable object or appendage, such as a stylus, a finger, and so forth. In some embodiments, the user interface is designed to work primarily with finger-based contacts and gestures, which may not be as accurate as stylus-based input due to the large contact area of the finger on the touch screen. In some embodiments, the device translates the rough finger-based input into a precise pointer/cursor position or command for performing the action desired by the user. Handwritten input may be provided on touch screen 112 via the position and motion of finger-based contacts or stylus-based contacts. In some embodiments, touch screen 112 renders finger-based input or stylus-based input as immediate visual feedback of current handwriting input and provides a visual effect of actual writing on a writing surface (e.g., a piece of paper) with a writing implement (e.g., a pen).
In some embodiments, in addition to a touch screen, device 100 may include a touch pad (not shown) for activating or deactivating particular functions. In some embodiments, the touchpad is a touch-sensitive area of the device that, unlike a touch screen, does not display visual output. The touchpad may be a touch-sensitive surface that is separate from touch screen 112 or an extension of the touch-sensitive surface formed by the touch screen.
The device 100 also includes a power system 162 for powering the various components. The power system 162 may include a power management system, one or more power sources (e.g., battery, Alternating Current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator (e.g., a Light Emitting Diode (LED)), and any other components associated with the generation, management, and distribution of power in a portable device.
The device 100 may also include one or more optical sensors 164. FIG. 1 shows an optical sensor coupled to an optical sensor controller 158 in the I/O subsystem 106. The optical sensor 164 may include a Charge Coupled Device (CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The optical sensor 164 receives light from the environment projected through one or more lenses and converts the light into data representing an image. In conjunction with imaging module 143 (also referred to as a camera module), optical sensor 164 may capture still images or video.
The device 100 may also include one or more proximity sensors 166. Fig. 1 shows a proximity sensor 166 coupled to the peripheral interface 118. Alternatively, the proximity sensor 166 may be coupled to the input controller 160 in the I/O subsystem 106. In some embodiments, the proximity sensor turns off and disables the touch screen 112 when the multifunction device is placed near the user's ear (e.g., when the user is making a phone call).
Device 100 may also include one or more accelerometers 168. Fig. 1 shows accelerometer 168 coupled to peripheral interface 118. Alternatively, accelerometer 168 may be coupled to input controller 160 in I/O subsystem 106. In some embodiments, the information is displayed in a portrait view or a landscape view on the touch screen display based on an analysis of the data received from the one or more accelerometers. Device 100 optionally includes a magnetometer (not shown) and a GPS (or GLONASS or other global navigation system) receiver (not shown) in addition to the one or more accelerometers 168 for obtaining information about the position and orientation (e.g., portrait or landscape) of device 100.
In some embodiments, the software components stored in memory 102 include an operating system 126, a communication module (or set of instructions) 128, a contact/motion module (or set of instructions) 130, a graphics module (or set of instructions) 132, a text input module (or set of instructions) 134, a Global Positioning System (GPS) module (or set of instructions) 135, and an application program (or set of instructions) 136. Further, in some embodiments, memory 102 stores handwriting input module 157, as shown in FIGS. 1 and 3. Handwriting input module 157 includes a handwriting recognition model and provides handwriting recognition and input functionality to a user of device 100 (or device 300). Further details of handwriting input module 157 are provided with respect to FIGS. 5-27 and the accompanying description.
An operating system 126 (e.g., Darwin, RTXC, LINUX, UNIX, OSX, WINDOWS, or embedded operating systems such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components.
Communications module 128 facilitates communications with other devices through one or more external ports 124 and also includes various software components for processing data received by RF circuitry 108 and/or external ports 124. The external port 124 (e.g., Universal Serial Bus (USB), firewire, etc.) is adapted to couple directly to other devices or indirectly through a network (e.g., the internet, wireless LAN, etc.).
The contact/motion module 130 may detect contact with the touch screen 112 (in conjunction with the display controller 156) and other touch sensitive devices (e.g., a touchpad or a physical click wheel). The contact/motion module 130 includes a number of software components for performing various operations related to the detection of contact, such as determining whether contact has occurred (e.g., detecting a finger-down event), determining whether there is movement of the contact and tracking the movement across the touch-sensitive surface (e.g., detecting one or more finger-dragging events), and determining whether contact has terminated (e.g., detecting a finger-up event or a break in contact). The contact/motion module 130 receives contact data from the touch-sensitive surface. Determining movement of the point of contact, which is represented by a series of contact data, may include determining speed (magnitude), velocity (magnitude and direction), and/or acceleration (a change in magnitude and/or direction) of the point of contact. These operations may be applied to single contacts (e.g., one-finger contacts) or to multiple simultaneous contacts (e.g., "multi-touch"/multiple-finger contacts). In some embodiments, the contact/motion module 130 and the display controller 156 detect contact on a touchpad.
The contact/motion module 130 may detect gesture input by the user. Different gestures on the touch-sensitive surface have different contact patterns. Thus, gestures may be detected by detecting specific contact patterns. For example, detecting a finger tap gesture includes detecting a finger-down event, and then detecting a finger-up (lift-off) event at the same location (or substantially the same location) as the finger-down event (e.g., at an icon location). As another example, detecting a finger swipe gesture on the touch-sensitive surface includes detecting a finger-down event, then detecting one or more finger-dragging events, and then subsequently detecting a finger-up (lift-off) event.
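A toy classifier for the two contact patterns just described (the threshold and the event representation are assumptions, not values from this disclosure):

```python
TAP_RADIUS = 10.0  # maximum movement in pixels for a tap (assumed)

def classify_gesture(events):
    """events: list of ("down" | "move" | "up", x, y) tuples."""
    kinds = [e[0] for e in events]
    if not events or kinds[0] != "down" or kinds[-1] != "up":
        return "unknown"
    (_, x0, y0), (_, x1, y1) = events[0], events[-1]
    moved = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    if moved <= TAP_RADIUS and "move" not in kinds[1:-1]:
        return "tap"    # finger-down, then finger-up at (nearly) the same spot
    return "swipe"      # finger-down, one or more drags, then finger-up

print(classify_gesture([("down", 5, 5), ("up", 6, 5)]))                    # tap
print(classify_gesture([("down", 5, 5), ("move", 40, 5), ("up", 80, 5)]))  # swipe
```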
Contact/motion module 130 is optionally used by handwriting input module 157 to register the input of handwritten strokes within a handwriting input area of a handwriting input interface displayed on touch-sensitive display screen 112 (or within an area of touchpad 355 corresponding to the handwriting input area displayed on display 340 in FIG. 3). In some embodiments, the position, motion path, and intensity associated with the initial finger-down event, the final finger-up event, and the contact at any time in between are recorded as a handwritten stroke. Based on such information, the handwritten stroke can be rendered on the display as feedback for the user input. In addition, one or more input images may be generated based on the handwritten strokes registered by the contact/motion module 130.
Graphics module 132 includes various known software components for rendering and displaying graphics on touch screen 112 or other display, including components for changing the intensity of the graphics being displayed. As used herein, the term "graphic" includes any object that may be displayed to a user, including without limitation text, web pages, icons (such as user interface objects including soft keys), digital images, videos, animations and the like.
In some embodiments, the graphics module 132 stores data representing graphics to be used. Each graphic may be assigned a corresponding code. The graphics module 132 receives one or more codes specifying graphics to be displayed, if necessary together with coordinate data and other graphics attribute data, from an application program or the like, and then generates screen image data to output to the display controller 156.
Text input module 134, which may be a component of graphics module 132, provides a soft keyboard for entering text in various applications (e.g., contacts 137, email 140, IM 141, browser 147, and any other application that requires text input). In some embodiments, handwriting input module 157 is optionally invoked via a user interface of text input module 134, for example via a keyboard selection affordance. In some embodiments, the same or a similar keyboard selection affordance is also provided in the handwriting input interface to invoke the text input module 134.
The GPS module 135 determines the location of the device and provides this information for use in various applications (e.g., to the phone 138 for location-based dialing, to the camera 143 as picture/video metadata, and to applications for providing location-based services such as weather desktop applets, local yellow pages desktop applets, and map/navigation desktop applets).
The application programs 136 may include the following modules (or sets of instructions), or a subset or superset thereof: a contacts module 137 (sometimes referred to as an address book or contact list); a telephone module 138; a video conferencing module 139; an email client module 140; an Instant Messaging (IM) module 141; a fitness support module 142; a camera module 143 for still images and/or video images; an image management module 144; a browser module 147; a calendar module 148; a desktop applet module 149, which may include one or more of the following: a weather desktop applet 149-1, a stock market desktop applet 149-2, a calculator desktop applet 149-3, an alarm desktop applet 149-4, a dictionary desktop applet 149-5, and other desktop applets acquired by the user, as well as a user-created desktop applet 149-6; a desktop applet creator module 150 for making the user-created desktop applet 149-6; a search module 151; a video and music player module 152, which may consist of a video player module and a music player module; a notepad module 153; a map module 154; and/or an online video module 155.
Examples of other applications 136 that may be stored in memory 102 include other word processing applications, other image editing applications, drawing applications, rendering applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.
In conjunction with touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, contacts module 137 may be used to manage an address book or contact list (e.g., stored in memory 102 or in an application internal state 192 of contacts module 137 in memory 370), including: adding one or more names to an address book; deleting one or more names from the address book; associating one or more telephone numbers, one or more email addresses, one or more physical addresses, or other information with a name; associating the image with a name; classifying and ordering names; providing a telephone number or email address to initiate and/or facilitate communications through telephone 138, video conference 139, email 140, or IM 141; and so on.
In conjunction with RF circuitry 108, audio circuitry 110, speaker 111, microphone 113, touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, telephone module 138 may be used to enter a sequence of characters corresponding to a telephone number, access one or more telephone numbers in address book 137, modify a telephone number that has been entered, dial a corresponding telephone number, conduct a call, and disconnect or hang up when the call is completed. As described above, wireless communication may use any of a number of communication standards, protocols, and technologies.
In conjunction with RF circuitry 108, audio circuitry 110, speaker 111, microphone 113, touch screen 112, display controller 156, optical sensor 164, optical sensor controller 158, contact module 130, graphics module 132, handwriting input module 157, text input module 134, contact list 137, and telephone module 138, video conference module 139 includes executable instructions for initiating, conducting, and terminating a video conference between a user and one or more other participants according to user instructions.
In conjunction with RF circuitry 108, touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, email client module 140 includes executable instructions for creating, sending, receiving, and managing emails in response to user instructions. In conjunction with the image management module 144, the email client module 140 makes it very easy to create and send an email with a still image or a video image captured by the camera module 143.
In conjunction with RF circuitry 108, touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, instant message module 141 includes executable instructions for entering a sequence of characters corresponding to an instant message, modifying previously entered characters, transmitting a corresponding instant message (e.g., using a Short Message Service (SMS) or Multimedia Message Service (MMS) protocol for telephony-based instant messages, or using XMPP, SIMPLE, or IMPS for Internet-based instant messages), receiving an instant message, and viewing the received instant message. In some embodiments, the transmitted and/or received instant messages may include graphics, photos, audio files, video files, and/or other attachments as supported in an MMS and/or Enhanced Messaging Service (EMS). As used herein, "instant message" refers to both telephony-based messages (e.g., messages sent using SMS or MMS) and Internet-based messages (e.g., messages sent using XMPP, SIMPLE, or IMPS).
In conjunction with RF circuitry 108, touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, text input module 134, GPS module 135, map module 154, and music player module 146, fitness support module 142 includes executable instructions for: creating a fitness plan (e.g., with time, distance, and/or calorie burning goals); communicating with fitness sensors (sports equipment); receiving fitness sensor data; calibrating a sensor for monitoring fitness; selecting and playing music for fitness; and displaying, storing and transmitting fitness data.
In conjunction with touch screen 112, display controller 156, one or more optical sensors 164, optical sensor controller 158, contact module 130, graphics module 132, and image management module 144, camera module 143 includes executable instructions for: capturing still images or video (including video streams) and storing them in memory 102; modifying a characteristic of the still image or video; or delete still images or video from memory 102.
In conjunction with touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, text input module 134, and camera module 143, image management module 144 includes executable instructions for arranging, modifying (e.g., editing), or otherwise manipulating, labeling, deleting, presenting (e.g., in a digital slide show or album), and storing still images and/or video images.
In conjunction with RF circuitry 108, touch screen 112, display system controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, browser module 147 includes executable instructions for browsing the internet (including searching, linking to, receiving, and displaying web pages or portions thereof, and attachments and other files linked to web pages) according to user instructions.
In conjunction with RF circuitry 108, touch screen 112, display system controller 156, contact module 130, graphics module 132, handwriting input module 157, text input module 134, email client module 140, and browser module 147, calendar module 148 includes executable instructions for creating, displaying, modifying, and storing a calendar and data associated with the calendar (e.g., calendar entries, to-do, etc.) according to user instructions.
In conjunction with RF circuitry 108, touch screen 112, display system controller 156, contact module 130, graphics module 132, handwriting input module 157, text input module 134, and browser module 147, the desktop applet modules 149 are mini-applications that may be downloaded and used by a user (e.g., weather desktop applet 149-1, stock market desktop applet 149-2, calculator desktop applet 149-3, alarm clock desktop applet 149-4, and dictionary desktop applet 149-5) or created by the user (e.g., user-created desktop applet 149-6). In some embodiments, a desktop applet includes an HTML (Hypertext Markup Language) file, a CSS (Cascading Style Sheets) file, and a JavaScript file. In some embodiments, a desktop applet includes an XML (Extensible Markup Language) file and a JavaScript file (e.g., Yahoo! desktop applets).
In conjunction with RF circuitry 108, touch screen 112, display system controller 156, contact module 130, graphics module 132, handwriting input module 157, text input module 134, and browser module 147, the desktop applet creator module 150 may be used by a user to create a desktop applet (e.g., to transfer a user-specified portion of a web page into the desktop applet).
In conjunction with touch screen 112, display system controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, search module 151 includes executable instructions for searching for text, music, sound, images, video, and/or other files in memory 102 that match one or more search criteria (e.g., one or more user-specified search terms) in accordance with user instructions.
In conjunction with touch screen 112, display system controller 156, contact module 130, graphics module 132, audio circuitry 110, speakers 111, RF circuitry 108, and browser module 147, video and music player module 152 includes executable instructions that allow a user to download and play back recorded music and other sound files stored in one or more file formats, such as MP3 or AAC files, as well as executable instructions for displaying, presenting, or otherwise playing back videos (e.g., on touch screen 112 or on an external display connected via external port 124). In some embodiments, the device 100 may include the functionality of an MP3 player, such as an iPod (trademark of Apple Inc.).
In conjunction with touch screen 112, display controller 156, contact module 130, graphics module 132, handwriting input module 157, and text input module 134, notepad module 153 includes executable instructions for creating and managing notes, to-do lists, and the like according to user instructions.
In conjunction with RF circuitry 108, touch screen 112, display system controller 156, contact module 130, graphics module 132, handwriting input module 157, text input module 134, GPS module 135, and browser module 147, map module 154 may be used to receive, display, modify, and store maps and data associated with maps (e.g., driving routes; data about shops or other points of interest at or near a particular location; and other location-based data) according to user instructions.
In conjunction with touch screen 112, display system controller 156, contact module 130, graphics module 132, audio circuit 110, speaker 111, RF circuit 108, handwriting input module 157, text input module 134, email client module 140, and browser module 147, online video module 155 includes instructions that allow a user to access, browse, receive (e.g., by streaming and/or download), play back (e.g., on the touch screen or on an external display connected via external port 124), send an email with a link to a particular online video, and otherwise manage online videos in one or more file formats, such as H.264. In some embodiments, the instant message module 141, rather than the email client module 140, is used to send a link to a particular online video.
Each of the above identified modules and applications corresponds to a set of executable instructions for performing one or more of the functions described above as well as the methods described in this patent application (e.g., the computer-implemented methods and other information processing methods described herein). These modules (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some embodiments, memory 102 may store a subset of the modules and data structures identified above. In addition, memory 102 may store additional modules and data structures not described above.
In some embodiments, device 100 is a device in which the operation of a predefined set of functions on the device is performed exclusively through a touch screen and/or a touchpad. By using a touch screen and/or touch pad as the primary input control device for operation of the device 100, the number of physical input control devices (such as push buttons, dials, etc.) on the device 100 may be reduced.
FIG. 2 illustrates a portable multifunction device 100 with a touch screen 112 in accordance with some embodiments. The touch screen may display one or more graphics within the User Interface (UI) 200. In this embodiment, as well as in other embodiments described below, a user may select one or more of these graphics by, for example, making a gesture on the graphics with one or more fingers 202 or one or more styluses 203 (neither drawn to scale in the figures). In some embodiments, the selection of one or more graphics occurs when the user breaks contact with the one or more graphics. In some embodiments, the gesture may include one or more taps, one or more swipes (left to right, right to left, upward, and/or downward), and/or a rolling of a finger (right to left, left to right, upward, and/or downward) that has made contact with device 100. In some embodiments, inadvertent contact with a graphic does not select the graphic. For example, when the gesture corresponding to selection is a tap, a swipe gesture that sweeps over an application icon does not select the corresponding application.
Device 100 may also include one or more physical buttons, such as a "home" button or menu button 204. As previously described, the menu button 204 may be used to navigate to any application 136 in a set of applications that may be executed on the device 100. Alternatively, in some embodiments, the menu buttons are implemented as soft keys in a GUI displayed on touch screen 112.
In one embodiment, device 100 includes touch screen 112, menu buttons 204, push button 206 for powering the device on/off and locking the device, one or more volume adjustment buttons 208, a Subscriber Identity Module (SIM) card slot 210, a headset jack 212, and docking/charging external port 124. Push button 206 may be used to power the device on/off by depressing the button and holding it in the depressed state for a predefined period of time; to lock the device by depressing the button and releasing it before the predefined period of time has elapsed; and/or to unlock the device or initiate an unlocking process. In an alternative embodiment, device 100 may also accept verbal input through microphone 113 for activating or deactivating certain functions.
FIG. 3 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with some embodiments. The device 300 need not be portable. In some embodiments, the device 300 is a laptop computer, desktop computer, tablet computer, multimedia player device, navigation device, educational device (such as a child's learning toy), gaming system, telephony device, or control device (e.g., a home or industrial controller). Device 300 typically includes one or more processing units (CPUs) 310, one or more network or other communication interfaces 360, memory 370, and one or more communication buses 320 for interconnecting these components. The communication buses 320 may include circuitry (sometimes referred to as a chipset) that interconnects and controls communication between system components. Device 300 includes an input/output (I/O) interface 330 with a display 340, which is typically a touch screen display. The I/O interface 330 may also include a keyboard and/or mouse (or other pointing device) 350 and a touchpad 355. Memory 370 comprises high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid-state memory devices; and may include non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Optionally, memory 370 may include one or more storage devices located remotely from the one or more CPUs 310. In some embodiments, memory 370 stores programs, modules, and data structures similar to those stored in memory 102 of portable multifunction device 100 (FIG. 1), or a subset thereof. In addition, memory 370 may store additional programs, modules, and data structures not present in memory 102 of portable multifunction device 100. For example, memory 370 of device 300 may store drawing module 380, presentation module 382, word processing module 384, website creation module 386, disk editing module 388, and/or spreadsheet module 390, while memory 102 of portable multifunction device 100 (FIG. 1) may not store these modules.
Each of the above identified elements in fig. 3 may be stored in one or more of the aforementioned memory devices. Each of the identified modules corresponds to a set of instructions for performing the functions described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some embodiments, memory 370 may store a subset of the modules and data structures identified above. In addition, memory 370 may store additional modules and data structures not described above.
Fig. 4 illustrates an exemplary user interface on a device (e.g., device 300 in fig. 3) having a touch-sensitive surface 451 (e.g., tablet or touchpad 355 in fig. 3) separate from a display 450 (e.g., touchscreen display 112). Although many of the examples that follow will be given with reference to input on touch screen display 112 (where the touch-sensitive surface and the display are merged), in some embodiments the device detects input on a touch-sensitive surface that is separate from the display, as shown in fig. 4. In some embodiments, the touch-sensitive surface (e.g., 451 in FIG. 4) has a primary axis (e.g., 452 in FIG. 4) that corresponds to a primary axis (e.g., 453 in FIG. 4) on the display (e.g., 450). In accordance with these embodiments, the device detects contacts (e.g., 460 and 462 in fig. 4) with the touch-sensitive surface 451 at locations that correspond to respective locations on the display (e.g., in fig. 4, 460 corresponds to 468 and 462 corresponds to 470). As such, when the touch-sensitive surface (e.g., 451 in fig. 4) is separated from the display (450 in fig. 4) of the multifunction device, user inputs (e.g., contacts 460 and 462, and their movements) detected by the device on the touch-sensitive surface are used by the device to manipulate the user interface on the display. It should be understood that similar methods may be used for the other user interfaces described herein.
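The correspondence between locations on a separate touch-sensitive surface and locations on the display can be illustrated with a simple sketch; the linear mapping and all names below are assumptions for illustration only.

```python
def surface_to_display(x, y, surface_size, display_size):
    """Map a contact at (x, y) on a separate touch-sensitive surface
    (e.g., 451) to the corresponding location on the display (e.g., 450),
    preserving the alignment of the primary axes described above."""
    sw, sh = surface_size
    dw, dh = display_size
    return (x / sw * dw, y / sh * dh)

# e.g., contact 460 at (300, 200) on a 600x400 surface corresponds to
# location 468 at (512, 384) on a 1024x768 display.
print(surface_to_display(300, 200, (600, 400), (1024, 768)))
```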
Attention is now directed to embodiments of handwriting input methods and user interfaces ("UIs") that may be implemented on a multifunction device (e.g., device 100).
FIG. 5 is a block diagram illustrating an exemplary handwriting input module 157 in accordance with some embodiments. The handwriting input module 157 interacts with an I/O interface module 500 (e.g., I/O interface 330 of FIG. 3 or I/O subsystem 106 of FIG. 1) to provide handwriting input capabilities on a device. As shown in fig. 5, handwriting input module 157 includes input processing module 502, handwriting recognition module 504, and result generation module 506. In some embodiments, the input processing module 502 includes a segmentation module 508 and a normalization module 510. In some embodiments, the result generation module 506 includes a radical clustering module 512 and one or more language models 514.
In some embodiments, input processing module 502 communicates with I/O interface module 500 (e.g., I/O interface 330 in fig. 3 or I/O subsystem 106 in fig. 1) to receive handwriting input from a user. The handwriting input is received via any suitable means, such as touch-sensitive display system 112 in FIG. 1 and/or touch pad 355 in FIG. 3. The handwriting input includes data representing each stroke provided by the user within a predetermined handwriting input area of the handwriting input UI. In some embodiments, the data representing each stroke of the handwriting input includes data such as: the start and end positions, the intensity profile, and the motion path of a sustained contact (e.g., a contact between a user's finger or a stylus and a touch-sensitive surface of the device) within the handwriting input area. In some embodiments, the I/O interface module 500 transmits the sequence of handwritten strokes 516, together with associated temporal and spatial information, to the input processing module 502 in real time. At the same time, the I/O interface module also provides a real-time rendering 518 of the handwritten strokes within the handwriting input area of the handwriting input user interface as visual feedback to the user input.
In some embodiments, time information and sequence information associated with a plurality of consecutive strokes are also recorded as the data representing each handwritten stroke is received by input processing module 502. For example, the data optionally includes the shape, size, and spatial saturation of each individual stroke, a corresponding stroke sequence number, the relative spatial positions of the strokes along the writing direction of the entire handwriting input, and so on. In some embodiments, input processing module 502 provides instructions back to I/O interface module 500 to render the received strokes (e.g., rendering 518) on the display of the device (e.g., display 340 in fig. 3 or touch-sensitive display 112 in fig. 1). In some embodiments, the received strokes are rendered as an animation to provide a visual effect that mimics the actual process of writing with a writing instrument (e.g., a pen) on a writing surface (e.g., a piece of paper). In some embodiments, the user is optionally allowed to specify the nib style, color, texture, etc. of the rendered strokes.
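One plausible shape for the per-stroke data described above is sketched below in Python; the field names and sample format are assumptions, since the disclosure describes the recorded information rather than a concrete format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HandwrittenStroke:
    stroke_number: int  # sequence number within the current handwriting input
    # (x, y, t, intensity) samples recorded from finger-down to finger-up
    samples: List[Tuple[float, float, float, float]] = field(default_factory=list)

    def bounding_box(self):
        """Spatial extent of the stroke: (xmin, ymin, xmax, ymax)."""
        xs = [s[0] for s in self.samples]
        ys = [s[1] for s in self.samples]
        return (min(xs), min(ys), max(xs), max(ys))

    def duration(self):
        """Elapsed time of the stroke, available for segmentation scoring."""
        return self.samples[-1][2] - self.samples[0][2]
```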
In some embodiments, input processing module 502 processes the strokes currently accumulated in the handwriting input area to assign the strokes to one or more recognition units. In some embodiments, each recognition unit corresponds to a character to be recognized by handwriting recognition model 504. In some embodiments, each recognition unit corresponds to an output character or a radical to be recognized by handwriting recognition model 504. A radical is a recurring component found in multiple composite logographic characters. A composite logographic character may include two or more radicals arranged according to a common layout (e.g., a left-right layout, an up-down layout, etc.). In one example, the Chinese character "听" ("to listen") is constructed from two radicals: a left radical "口" ("kou") and a right radical "斤" ("jin").
In some embodiments, the input processing module 502 assigns or divides the currently accumulated handwritten strokes into one or more recognition units using segmentation module 508. For example, when dividing the strokes of the handwritten character "听", segmentation module 508 optionally assigns the strokes clustered on the left of the handwriting input to one recognition unit (i.e., for the left radical "口") and the strokes clustered on the right of the handwriting input to another recognition unit (i.e., for the right radical "斤"). Alternatively, segmentation module 508 may assign all of the strokes to a single recognition unit (i.e., for the character "听").
In some embodiments, segmentation module 508 segments the currently accumulated handwriting input (e.g., one or more handwritten strokes) into a set of recognition units in several different ways to create a segmentation grid 520. For example, assume that a total of nine strokes have accumulated in the handwriting input area so far. According to a first segmentation chain of segmentation grid 520, strokes 1, 2, and 3 are grouped into a first recognition unit 522, and strokes 4, 5, and 6 are grouped into a second recognition unit 524. According to a second segmentation chain of segmentation grid 520, all strokes 1-9 are grouped into a single recognition unit 526.
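For the nine-stroke example above, a segmentation grid can be pictured as a set of alternative partitions of the stroke sequence. The toy structure below is illustrative only; the third recognition unit in the first chain is hypothetical, since the disclosure names only units 522, 524, and 526.

```python
# Each recognition unit is a tuple of stroke numbers; each segmentation
# chain is one way of partitioning all accumulated strokes into units.
segmentation_grid = [
    # first segmentation chain: strokes 1-3 (unit 522), strokes 4-6
    # (unit 524), and the remaining strokes as a further unit
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    # second segmentation chain: all nine strokes as one unit (unit 526)
    [(1, 2, 3, 4, 5, 6, 7, 8, 9)],
]
```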
In some embodiments, a segmentation score is assigned to each segmentation chain to measure the likelihood that the particular segmentation chain is a correct segmentation of the current handwriting input. In some embodiments, the factors for calculating the segmentation score of each segmentation chain optionally include: the absolute and/or relative size of a stroke, the relative and/or absolute span of a stroke in various directions (e.g., the x, y, and z directions), the average and/or variance of the saturation level of a stroke, the absolute and/or relative distance to adjacent strokes, the absolute and/or relative position of a stroke, the order or sequence in which the strokes were input, the duration of each stroke, the average and/or variance of the speed (or tempo) at which each stroke was input, the intensity profile of each stroke along its length, and so on. In some embodiments, one or more functions or transforms are optionally applied to one or more of these factors to generate the segmentation scores for the different segmentation chains in segmentation grid 520.
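A hedged sketch of how a segmentation score might be computed from a subset of these factors follows; the choice of factors, the penalty function, and the combination rule are illustrative assumptions, not the disclosed scoring method.

```python
import math

def unit_score(bboxes, typical_char_width):
    """Heuristic likelihood that a group of stroke bounding boxes
    (xmin, ymin, xmax, ymax) forms one recognition unit, based only on
    the horizontal span relative to a typical character width."""
    xmin = min(b[0] for b in bboxes)
    xmax = max(b[2] for b in bboxes)
    span = max(xmax - xmin, 1e-6)
    # Penalize units much wider or narrower than a typical character.
    return math.exp(-abs(math.log(span / typical_char_width)))

def chain_score(units, typical_char_width):
    """Segmentation score of a chain: the product of its unit scores."""
    score = 1.0
    for bboxes in units:
        score *= unit_score(bboxes, typical_char_width)
    return score
```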
In some embodiments, after the segmentation module 508 segments the current handwritten input 516 received from the user, the segmentation module 508 transmits the segmentation grid 520 to the normalization module 510. In some embodiments, normalization module 510 generates an input image (e.g., input image 528) for each recognition unit (e.g., recognition units 522, 524, and 526) specified in segmentation grid 520. In some embodiments, the normalization module performs necessary or desired normalization (e.g., stretching, clipping, downsampling, or upsampling) on the input image so that the input image may be provided as input to the handwriting recognition model 504. In some embodiments, each input image 528 includes strokes assigned to a respective recognition unit and corresponds to a character or radical to be recognized by handwriting recognition module 504.
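A minimal sketch of this rasterization-and-normalization step is given below; the image size, binary rendering, and absence of line interpolation are simplifying assumptions.

```python
import numpy as np

def render_input_image(strokes, size=48):
    """Rasterize the strokes of one recognition unit into a size x size
    image. `strokes` is a list of point lists [(x, y), ...]; stroke
    identity and temporal order are deliberately discarded, leaving only
    pixel locations, as described above."""
    points = [p for stroke in strokes for p in stroke]
    xs, ys = zip(*points)
    xmin, ymin = min(xs), min(ys)
    extent = max(max(xs) - xmin, max(ys) - ymin, 1e-6)
    scale = (size - 1) / extent
    image = np.zeros((size, size), dtype=np.float32)
    for x, y in points:
        image[int((y - ymin) * scale), int((x - xmin) * scale)] = 1.0
    return image
```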
In some embodiments, the input image generated by the input processing module 502 does not include any temporal information associated with the individual strokes; only spatial information (e.g., information represented by the location and density of pixels in the input image) is retained in the input image. A handwriting recognition model trained purely on the spatial information of the writing samples performs handwriting recognition based on spatial information alone. As a result, the handwriting recognition model is independent of stroke order and stroke direction, and need not be trained exhaustively on all possible permutations of stroke orders and stroke directions for all characters in its vocabulary (i.e., all of its output categories). Indeed, in some embodiments, handwriting recognition model 504 does not distinguish pixels belonging to one stroke from pixels belonging to another stroke within the input image.
As will be described in more detail later (e.g., with respect to fig. 25A-27), in some embodiments, some temporally-derived stroke distribution information is reintroduced into the purely spatial handwriting recognition model to improve recognition accuracy without affecting the stroke-order and stroke-direction independence of the recognition model.
In some embodiments, the input image generated by the input processing module 502 for one recognition unit does not overlap with the input image of any other recognition unit in the same segmentation chain. In some embodiments, the input images generated for different recognition units may have some overlap. In some embodiments, some overlap between input images is allowed to support recognition of handwriting input written in a cursive writing style and/or containing connected characters (e.g., one stroke connecting two adjacent characters).
In some embodiments, some normalization is performed prior to segmentation. In some embodiments, the functions of the segmentation module 508 and the normalization module 510 may be performed by the same module or by two or more other modules.
In some embodiments, when the input image 528 of each recognition unit is provided as input to the handwriting recognition model 504, the handwriting recognition model 504 produces an output consisting of the respective likelihoods that the recognition unit is each of the output characters in the vocabulary of the handwriting recognition model 504 (i.e., the list of all characters and radicals recognizable by the model). As will be explained in more detail later, handwriting recognition model 504 has been trained to recognize a large number of characters in a variety of scripts (e.g., at least three non-overlapping scripts encoded by the Unicode standard). Examples of non-overlapping scripts include the Latin script, Chinese characters, the Arabic script, the Persian script, the Cyrillic script, and artificial scripts such as emoji characters. In some embodiments, handwriting recognition model 504 generates one or more output characters for each input image (i.e., for each recognition unit) and assigns each output character a respective recognition score based on a confidence level associated with the character recognition.
In some embodiments, handwriting recognition model 504 generates a candidate grid 530 from segmentation grid 520, wherein each arc in a segmentation chain (e.g., corresponding to respective recognition units 522, 524, and 526) of segmentation grid 520 is expanded into one or more candidate arcs (e.g., arcs 532, 534, 536, 538, and 540, each corresponding to a respective output character) within candidate grid 530. Each candidate chain within candidate grid 530 is scored according to the segmentation score of the segmentation chain underlying the candidate chain and the recognition scores associated with the output characters in the candidate chain.
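The scoring of a candidate chain can be sketched as follows; combining the scores in log space is an illustrative assumption, as the disclosure does not give a formula.

```python
import math

def candidate_chain_score(segmentation_score, recognition_scores):
    """Combine the underlying chain's segmentation score with the
    recognition scores (confidences in (0, 1]) of the output characters
    chosen on its arcs, working in log space for numerical stability."""
    log_score = math.log(segmentation_score)
    for r in recognition_scores:
        log_score += math.log(r)
    return log_score

# e.g., a chain with segmentation score 0.8 whose two recognition units
# were recognized with confidences 0.9 and 0.7:
print(candidate_chain_score(0.8, [0.9, 0.7]))
```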
In some embodiments, after the handwriting recognition model 504 produces output characters from the input image 528 of the recognition unit, the candidate grid 530 is passed to the result generation module 506 to generate one or more recognition results for the currently accumulated handwriting input 516.
In some embodiments, the result generation module 506 utilizes the radical clustering module 512 to combine one or more radicals in a candidate chain into a composite character. In some embodiments, the result generation module 506 uses one or more language models 514 to determine whether the character chains in the candidate grid 530 are likely sequences in a particular language represented by the language model. In some embodiments, the result generation module 506 generates a revised candidate grid 542 by eliminating particular arcs or combining two or more arcs in the candidate grid 530.
In some embodiments, the result generation module 506 generates an integrated recognition score for each character sequence (e.g., character sequences 544 and 546) that remains in the revised candidate grid 542 based on the recognition scores of the output characters in the character sequences modified (e.g., enhanced or eliminated) by the radical clustering module 512 and the language model 514. In some embodiments, the result generation module 506 orders the different character sequences retained in the revised candidate grid 542 based on their integrated recognition scores.
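An integrated score of this kind might be sketched as below; the additive log-space weighting of the language model and the toy model itself are assumptions for illustration.

```python
import math

def integrated_score(chain_log_score, char_sequence, language_model, lm_weight=1.0):
    """Adjust a candidate chain's log score with a character language
    model; sequences the model deems unlikely are demoted in the ranking."""
    return chain_log_score + lm_weight * math.log(language_model(char_sequence))

# Toy language model: uniform preference for a small whitelist.
def toy_lm(sequence):
    return 0.5 if sequence in ("china", "women") else 0.01

candidates = [("china", -0.9), ("women", -1.2), ("chxna", -1.0)]
ranked = sorted(candidates,
                key=lambda c: integrated_score(c[1], c[0], toy_lm),
                reverse=True)
print(ranked)  # well-formed sequences rise to the top of the ranking
```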
In some embodiments, the result generation module 506 sends the top-ranked character sequences as ranked recognition results 548 to the I/O interface module 500 for display to the user. In some embodiments, I/O interface module 500 displays the received recognition results 548 (e.g., "china" and "women") in a candidate display area of the handwriting input interface. In some embodiments, the I/O interface module displays a plurality of recognition results (e.g., "china" and "women") to the user and allows the user to select a recognition result to be entered as text input for the relevant application. In some embodiments, the I/O interface module automatically enters the top-ranked recognition result (e.g., "women") in response to another input indicating that the user confirms the recognition result. Effectively entering the top-ranked result automatically may improve the efficiency of the input interface and provide a better user experience.
In some embodiments, the result generation module 506 uses other factors to adjust the integrated recognition scores of the candidate chains. For example, in some embodiments, the result generation module 506 optionally maintains a log of the characters most frequently used by a particular user or by many users. If a particular candidate character or character sequence is found in the list of most frequently used characters or character sequences, the result generation module 506 optionally boosts the integrated recognition score of that particular candidate character or character sequence.
In some embodiments, handwriting input module 157 provides real-time updates of the recognition results displayed to the user. For example, in some embodiments, for each additional stroke entered by the user, input processing module 502 optionally re-segments the currently accumulated handwriting input and revises the segmentation grid and the input images provided to handwriting recognition model 504. In turn, handwriting recognition model 504 optionally revises the candidate grid provided to result generation module 506. As a result, the result generation module 506 optionally updates the recognition results presented to the user. As used in this specification, real-time handwriting recognition refers to handwriting recognition in which recognition results are presented to the user immediately, or within a short time (e.g., within tens of milliseconds to several seconds). Real-time handwriting recognition differs from offline recognition (e.g., as in offline Optical Character Recognition (OCR) applications) in that recognition is initiated immediately and performed substantially concurrently with receipt of the handwriting input, rather than at some time after the current user session, from a recorded image saved for later retrieval. Furthermore, in offline character recognition, no temporal information about the individual strokes and the stroke order is available, and such information therefore cannot be used to perform segmentation or to further differentiate between similar-looking candidate characters.
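In outline, the real-time update path described above might be wired together as follows; the module interfaces are hypothetical stand-ins for the input processing, recognition, and result generation modules.

```python
def on_stroke_added(strokes, segment, recognize, generate_results, display):
    """Re-run the recognition pipeline each time the user adds a stroke,
    so that the displayed results track the accumulated input in real time."""
    segmentation_grid = segment(strokes)            # cf. input processing module 502
    candidate_grid = recognize(segmentation_grid)   # cf. handwriting recognition model 504
    results = generate_results(candidate_grid)      # cf. result generation module 506
    display(results)                                # cf. candidate display area
```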
In some embodiments, handwriting recognition model 504 is implemented as a convolutional neural network (CNN). FIG. 6 illustrates an exemplary convolutional neural network 602 trained on a multi-script training corpus 604, which contains writing samples for characters of multiple non-overlapping scripts.
As shown in fig. 6, convolutional neural network 602 includes an input plane 606 and an output plane 608. Between input plane 606 and output plane 608 are a plurality of convolutional layers 610 (e.g., including a first convolutional layer 610a, zero or more intermediate convolutional layers (not shown), and a last convolutional layer 610n). Each convolutional layer 610 is followed by a corresponding sub-sampling layer 612 (e.g., a first sub-sampling layer 612a, zero or more intermediate sub-sampling layers (not shown), and a last sub-sampling layer 612n). Following the convolutional and sub-sampling layers, and just before output plane 608, is hidden layer 614, the last layer before the output plane. In some embodiments, a kernel layer 616 (e.g., including a first kernel layer 616a, zero or more intermediate kernel layers (not shown), and a last kernel layer 616n) is inserted before each convolutional layer 610 to improve computational efficiency.
As shown in fig. 6, input plane 606 receives an input image 614 of a handwriting recognition unit (e.g., a handwritten character or radical), and output plane 608 outputs a set of probabilities indicating the likelihood that the recognition unit belongs to each respective output category (e.g., a particular character in the set of output characters that the neural network is configured to recognize). The set of output categories of the neural network as a whole (or the output character set of the neural network) is also referred to as the repertoire or vocabulary of the handwriting recognition model. The convolutional neural networks described herein may be trained to have a vocabulary of tens of thousands of characters.
As the input image 614 is processed through the different layers of the neural network, the different spatial features embedded in the input image 614 are extracted by the convolutional layers 610. Each convolutional layer 610 is also referred to as a set of feature maps and acts as a filter for selecting particular features in the input image 614 that distinguish between images corresponding to different characters. The sub-sampling layers 612 ensure that features at increasingly large scales are captured from the input image 614. In some embodiments, the sub-sampling layers 612 are implemented using a max-pooling technique. A max-pooling layer creates position invariance over larger local regions and down-samples the output image of the preceding convolutional layer by a factor of Kx and Ky along each direction, where Kx and Ky are the dimensions of the max-pooling rectangle. Max-pooling leads to faster convergence by selecting superior invariant features, which improves generalization performance. In some embodiments, other methods are used to implement sub-sampling.
In some embodiments, after the last pair of convolutional layer 610n and sub-sampling layer 612n, and before output plane 608, is a fully connected layer, namely hidden layer 614. The fully connected hidden layer 614 is a multi-layer perceptron that fully connects the nodes in the last sub-sampling layer 612n to the nodes in output plane 608. The hidden layer 614 takes the output images received from the layer before it and maps them, through logistic regression, to one of the output characters in output plane 608.
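A minimal PyTorch sketch of the architecture just described follows: convolution/max-pooling pairs, a fully connected hidden layer, and an output plane over the vocabulary. The layer count, channel sizes, kernel sizes, and image size are illustrative assumptions, not parameters from the disclosure.

```python
import torch
import torch.nn as nn

class HandwritingCNN(nn.Module):
    def __init__(self, vocab_size, image_size=48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),   # first convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                  # first sub-sampling layer (Kx = Ky = 2)
            nn.Conv2d(16, 32, kernel_size=5, padding=2),  # last convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),                  # last sub-sampling layer
        )
        side = image_size // 4                            # two 2x2 poolings halve each dimension twice
        self.hidden = nn.Sequential(                      # fully connected hidden layer
            nn.Flatten(),
            nn.Linear(32 * side * side, 512),
            nn.ReLU(),
        )
        self.output = nn.Linear(512, vocab_size)          # output plane: one node per character

    def forward(self, x):
        return self.output(self.hidden(self.features(x)))

# One input image of a recognition unit -> probabilities over the vocabulary.
model = HandwritingCNN(vocab_size=30000)
probabilities = torch.softmax(model(torch.zeros(1, 1, 48, 48)), dim=-1)
```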
During training of the convolutional neural network 602, the features in the convolutional layers 610, the respective weights associated with those features, and the weights associated with the parameters in the hidden layer 614 are tuned such that the classification error is minimized over the writing samples in the training corpus 604, whose output categories are known. Once the convolutional neural network 602 has been trained, and an optimal set of parameters and associated weights has been established for the different layers in the network, the convolutional neural network 602 can be used to recognize new writing samples 618 that are not part of the training corpus 604, such as input images generated from real-time handwriting input received from a user.
As described herein, a multi-script training corpus is used to train the convolutional neural network of a handwriting input interface to enable multi-script or mixed-script handwriting recognition. In some embodiments, the convolutional neural network is trained to recognize a large vocabulary of thirty thousand to over sixty thousand characters (e.g., all characters encoded by the Unicode standard). Most existing handwriting recognition systems are based on hidden Markov models (HMMs), which depend on stroke order. Furthermore, most existing handwriting recognition models are language-specific and cover small vocabularies ranging from tens of characters (e.g., the characters of the English alphabet, the Greek alphabet, all ten digits, etc.) up to a few thousand characters (e.g., a set of the most commonly used Chinese characters). As such, the universal recognizer described herein can handle several orders of magnitude more characters than most existing systems.
Some conventional handwriting systems may include several individually trained handwriting recognition models, each customized for a particular language or a small character set. A writing sample is propagated through the different recognition models until classification can occur. For example, a handwriting sample may be provided to a series of cascaded language-specific or script-specific character recognition models; if the sample is not conclusively classified by a first recognition model, it is provided to the next recognition model, which attempts to classify the handwriting sample within its own repertoire. This approach to classification is time-consuming, and the memory requirements grow rapidly with each additional recognition model that needs to be employed.
Other existing models require the user to specify a preferred language and use the selected handwriting recognition model to classify the current input. Such implementations are not only cumbersome to use and memory-intensive, but are also incapable of recognizing mixed-language input. It is impractical to require the user to switch language preferences midway through entering a mixed-language or mixed-script input.
The multi-script recognizer, or universal recognizer, described herein solves at least some of the above problems of conventional recognition systems. FIG. 7 is a flow diagram of an exemplary process 700 for training a handwriting recognition model (e.g., a convolutional neural network) on a large multi-script training corpus, such that the handwriting recognition model can subsequently be used to provide real-time multi-language and multi-script handwriting recognition of a user's handwriting input.
In some embodiments, training of the handwriting recognition model is performed on a server device, and the trained handwriting recognition model is then provided to the user device. The handwriting recognition model optionally performs real-time handwriting recognition locally on the user device without further assistance from the server. In some embodiments, both training and recognition are performed on the server device. For example, the server device may receive the user's handwriting input from the user device, perform the handwriting recognition, and send the recognition results to the user device in real time.
In the exemplary process 700, at a device having memory and one or more processors, the device trains (702) a multi-script handwriting recognition model based on spatially-derived features (e.g., stroke-order independent features) of a multi-script training corpus. In some embodiments, the spatially-derived features of the multi-script training corpus are stroke-order independent (704) and stroke-direction independent. In some embodiments, the training (706) of the multi-script handwriting recognition model is independent of the temporal information associated with the respective strokes in the handwriting samples. Specifically, the images of the handwriting samples are normalized to a predetermined size, and the images do not include any information about the order in which the individual strokes were entered to form the images. Furthermore, the images do not include any information about the direction in which the individual strokes were entered to form the images. In effect, during training, features are extracted from the handwriting images without regard to how each image was temporally formed by individual strokes. Consequently, during recognition, no temporal information associated with the individual strokes is required, and recognition robustly provides consistent results despite delayed strokes, out-of-order strokes, and arbitrary stroke directions in the handwriting input.
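A sketch of one stroke-order-independent training step is shown below: each sample is just a normalized image and a character label, so no temporal information can enter the loss. The optimizer and loss are conventional choices assumed for illustration.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, images, labels):
    """One step of training on a batch of handwriting-sample images.

    `images` is a (batch, 1, H, W) tensor rendered from strokes with all
    stroke-order and stroke-direction information discarded; `labels`
    holds the character-class index of each sample. Minimizing the
    classification error tunes the convolutional features and their
    weights, as described above."""
    optimizer.zero_grad()
    logits = model(images)
    loss = nn.functional.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```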
In some embodiments, the multi-script training corpus includes handwriting samples for characters of at least three non-overlapping scripts. As shown in fig. 6, the multi-script training corpus includes handwriting samples collected from many users. Each handwriting sample corresponds to a character of a respective script represented in the handwriting recognition model. To fully train the handwriting recognition model, the training corpus includes a large number of writing samples for each character of each script represented in the handwriting recognition model.
In some embodiments, the at least three non-overlapping scripts include (708) Chinese characters, emoji characters, and the Latin script. In some embodiments, the multi-script handwriting recognition model has (710) at least thirty thousand output categories, representing thirty thousand characters spanning the at least three non-overlapping scripts.
In some embodiments, the multi-script training corpus includes a respective writing sample for each of the Chinese characters encoded in the Unicode standard (e.g., all or most of the CJK (Chinese-Japanese-Korean) unified ideographs). The Unicode standard defines a total of about seventy-four thousand CJK unified ideographs. The basic block of CJK unified ideographs (4E00-9FFF) includes 20,941 basic Chinese characters used in the Chinese language, as well as in Japanese, Korean, and Vietnamese. In some embodiments, the multi-script training corpus includes writing samples for all characters in the basic block of CJK unified ideographs. In some embodiments, the multi-script training corpus further includes writing samples for the CJK radicals, which may be used to structurally compose one or more composite Chinese characters. In some embodiments, the multi-script training corpus further includes writing samples for less frequently used Chinese characters, such as the characters encoded in one or more of the CJK unified ideograph extension blocks.
In some embodiments, the multi-script training corpus further includes a respective writing sample for each of the characters of the Latin script encoded by the Unicode standard. The characters of the basic Latin script include the uppercase and lowercase Latin letters, as well as the various basic symbols and digits commonly found on a standard Latin-script keyboard. In some embodiments, the multi-script training corpus further includes the characters of the extended Latin script (e.g., the various accented forms of the basic Latin letters).
In some embodiments, the multi-script training corpus includes writing samples corresponding to each character of an artificial script that is not associated with any natural human language. For example, in some embodiments, a set of emoji characters is optionally defined as an emoji script, and writing samples corresponding to each emoji character are included in the multi-script training corpus. For example, hand-drawn heart shapes are used in the training corpus as the handwriting samples for a heart emoji character. Similarly, hand-drawn smiley faces (e.g., two dots above an upturned arc) are used in the training corpus as the handwriting samples for a smiley-face emoji character. Other emoji characters include icons that express different emotions (e.g., happy, sad, angry, embarrassed, surprised, laughing, crying, depressed, etc.), different objects and characters (e.g., cat, dog, rabbit, heart, fruit, eyes, lips, gift, flower, candle, moon, star, etc.), and different actions (e.g., holding hands, kissing, running, dancing, jumping, sleeping, eating, dating, loving, voting, etc.). In some embodiments, the strokes in the handwriting samples corresponding to an emoji character are simplified and/or stylized versions of the lines that form the actual emoji character. In some embodiments, each device or application may use a different design for the same emoji character. For example, the smiley-face emoji character presented to one user may differ from the smiley-face emoji character presented to another user, even though the handwriting input received from both users is substantially the same.
In some embodiments, the multi-script training corpus also includes writing samples for characters of other scripts, such as the Greek script (e.g., Greek letters and symbols), the Cyrillic script, the Hebrew script, and one or more other scripts encoded by the Unicode standard. In some embodiments, the at least three non-overlapping scripts included in the multi-script training corpus are Chinese characters, emoji characters, and the Latin script. Chinese characters, emoji characters, and the Latin script are naturally non-overlapping scripts. Many other scripts overlap one another in at least some characters. For example, some characters of the Latin script (e.g., A, Z) are also found in many other scripts (e.g., the Greek and Cyrillic scripts). In some embodiments, the multi-script training corpus includes Chinese characters, the Arabic script, and the Latin script. In some embodiments, the multi-script training corpus includes other combinations of overlapping and/or non-overlapping scripts. In some embodiments, the multi-script training corpus includes writing samples for all of the characters encoded by the Unicode standard.
As shown in fig. 7, in some embodiments, to train the multi-script handwriting recognition model, the device provides (712) the handwriting samples of the multi-script training corpus to a single convolutional neural network having a single input plane and a single output plane. The device determines (714), using the convolutional neural network, spatially-derived features (e.g., stroke-order independent features) of the handwriting samples, and respective weights for the spatially-derived features, for distinguishing the characters of the at least three non-overlapping scripts represented in the multi-script training corpus. The multi-script handwriting recognition model differs from conventional multi-script handwriting recognition models in that all of the samples in the multi-script training corpus are used to train a single handwriting recognition model having a single input plane and a single output plane. The single convolutional neural network is trained to distinguish all of the characters represented in the multi-script training corpus, without relying on sub-networks that each handle a small subset of the training corpus (e.g., sub-networks each trained on the characters of a particular script, or on the characters used in a particular language). In addition, the single convolutional neural network is trained to distinguish a large number of characters spanning multiple non-overlapping scripts, rather than the characters of a few overlapping scripts, such as the Latin and Greek scripts (which share letters such as A, B, E, and Z).
In some embodiments, the device provides (716) real-time handwriting recognition of the user's handwriting input using the multi-script handwriting recognition model that has been trained on the spatially-derived features of the multi-script training corpus. In some embodiments, providing real-time handwriting recognition of the user's handwriting input includes continuously revising the recognition output as the user continues to provide additions and revisions to the handwriting input. In some embodiments, providing real-time handwriting recognition of the user's handwriting input further includes (718) providing the multi-script handwriting recognition model to a user device, wherein the user device receives the handwriting input from the user and performs handwriting recognition on the handwriting input locally, based on the multi-script handwriting recognition model.
In some embodiments, the same multi-script handwriting recognition model is provided to multiple devices whose respective sets of input languages need not overlap, and that same model is used on each of the multiple devices for handwriting recognition of the different languages associated with each user device. For example, because the multi-script handwriting recognition model has been trained to recognize characters of many different scripts and languages, the same handwriting recognition model can be used throughout the world to provide handwriting input in any of those input languages. A first device belonging to a user who only wishes to write in English and Hebrew may provide handwriting input functionality using the same handwriting recognition model as a second device belonging to another user who only wishes to write in Chinese and emoji characters. Instead of requiring the user of the first device to separately install an English handwriting input keyboard (e.g., one implemented with an English-specific handwriting recognition model) and a separate Hebrew handwriting input keyboard (e.g., one implemented with a Hebrew-specific handwriting recognition model), the same universal multi-script handwriting recognition model can be installed on the first device once and used to provide handwriting input functionality for English, for Hebrew, and for mixed input in both languages. Likewise, the user of the second device is not required to install a Chinese handwriting input keyboard (e.g., one implemented with a Chinese-specific handwriting recognition model) and a separate emoji handwriting input keyboard (e.g., one implemented with an emoji-specific handwriting recognition model); instead, the same universal multi-script handwriting recognition model can be installed on the second device once and used to provide handwriting input functionality for Chinese, for emoji, and for mixed input using both scripts. Handling a large vocabulary spanning multiple scripts (e.g., most or all of the characters of the nearly one hundred scripts encoded by the Unicode standard) with the same multi-script handwriting model improves the utility of the recognizer without placing a significant burden on device vendors and users.
Training a multi-script handwriting recognition model on a large multi-script training corpus differs from conventional HMM-based handwriting recognition systems in that it does not rely on the temporal information associated with the individual strokes of a character. Furthermore, the resource and memory requirements of the multi-script recognition system do not increase linearly as the scripts and languages covered by the system increase. For example, in conventional handwriting systems, adding a language means adding another independently trained model, and the memory requirements will at least double to accommodate the enhanced capability of the handwriting recognition system. In contrast, when a multi-script model is trained on a multi-script training corpus, extending the language coverage requires retraining the handwriting recognition model with additional handwriting samples and enlarging the output plane, but only by a very modest amount. Suppose the multi-script training corpus includes handwriting samples corresponding to n different languages and the resulting multi-script handwriting recognition model occupies memory of size M. When the language coverage is increased to N languages (N > n), the device retrains the multi-script handwriting recognition model based on the spatially-derived features of a second multi-script training corpus that includes second handwriting samples corresponding to the N different languages, the retrained model occupying memory of size M'. The ratio M'/M remains substantially constant within the range of 1 to 2, even as N/n ranges from 1 to 100. Once the multi-script handwriting recognition model has been retrained, the device may use the retrained model to provide real-time handwriting recognition of the user's handwriting input.
Fig. 8A-8B illustrate exemplary user interfaces for providing real-time multi-script handwriting recognition and input on a portable user device (e.g., device 100). In fig. 8A-8B, handwriting input interface 802 is displayed on a touch-sensitive display screen (e.g., touch screen 112) of the user device. Handwriting input interface 802 includes handwriting input area 804, candidate display area 806, and text input area 808. In some embodiments, handwriting input interface 802 further includes a plurality of control elements, where each control element may be invoked to cause the handwriting input interface to perform a predetermined function. As shown in fig. 8A, the handwriting input interface includes a delete button, a space button, a carriage return (or Enter) button, and a keyboard switching button. Other control elements are also possible and may optionally be provided in the handwriting input interface to suit each different application that utilizes handwriting input interface 802. The layout of the different components of handwriting input interface 802 is merely exemplary and may vary between devices and applications.
In some embodiments, handwriting input area 804 is a touch-sensitive area for receiving handwriting input from a user. A sustained contact on the touch screen and its associated motion path within handwriting input area 804 are registered as a handwritten stroke. In some embodiments, the handwritten strokes registered by the device are visually rendered within handwriting input area 804 along the path traced by the sustained contact. As shown in fig. 8A, the user has provided several handwritten strokes in handwriting input area 804, including some handwritten Chinese characters (e.g., "我很," meaning "I am very"), some handwritten English letters (e.g., "Happy"), and a hand-drawn emoji character (e.g., a smiley face). The handwritten characters are distributed across multiple rows (e.g., two rows) in handwriting input area 804.
In some embodiments, candidate display area 806 displays one or more recognition results (e.g., 810 and 812) for the currently accumulated handwriting input in handwriting input area 804. Typically, the top-ranked recognition result (e.g., 810) is displayed in the first position in the candidate display area. As shown in fig. 8A, because the handwriting recognition model described herein is capable of recognizing characters of a variety of non-overlapping scripts, including Chinese characters, Latin script, and emoji characters, the recognition result (e.g., 810) provided by the recognition model correctly includes the Chinese characters, English letters, and emoji character represented by the handwriting input. The user is not required to stop in the middle of writing the input to switch the recognition language.
In some embodiments, text entry area 808 is an area that displays text input provided to a corresponding application that employs a handwriting input interface. As shown in FIG. 8A, text entry area 808 is used by the notepad application, and the text currently shown within text entry area 808 (e.g., "America is beautiful") is the text input that has been provided to the notepad application. In some embodiments, cursor 813 indicates the current text entry location in text entry area 808.
In some embodiments, the user may select a particular recognition result displayed in candidate display area 806, for example, by an explicit selection input (e.g., a tap gesture on one of the displayed recognition results) or an implicit confirmation input (e.g., a tap gesture on the "enter" button or a double-tap gesture in the handwriting input area). As shown in FIG. 8B, the user explicitly selects the top-ranked recognition result 810 using a tap gesture (as shown by contact 814 over recognition result 810 in FIG. 8A). In response to the selection input, the text of recognition result 810 is inserted at the insertion point indicated by cursor 813 in text input area 808. As shown in fig. 8B, upon entering the text of the selected recognition result 810 into text entry area 808, both handwriting input area 804 and candidate display area 806 are cleared. Handwriting input area 804 is now ready to accept new handwriting input, and candidate display area 806 can now be used to display recognition results for the new handwriting input. In some embodiments, an implicit confirmation input causes the top-ranked recognition result to be entered into text entry area 808 without the user having to stop and select it. A well-designed implicit confirmation input increases text entry speed and reduces the cognitive burden placed on the user during text composition.
In some embodiments (not shown in fig. 8A-8B), the top-ranked recognition result for the current handwriting input is optionally displayed tentatively in text entry area 808. For example, the tentative text input displayed in text input area 808 is visually distinguished from other text in the text input area by a tentative input box drawn around it. The text shown in the tentative input box is not yet committed or provided to the associated application (e.g., the notepad application), and is automatically updated by the handwriting input module whenever the top-ranked recognition result changes, for example, in response to the user modifying the current handwriting input.
Figs. 9A-9B are flow diagrams of an exemplary process 900 for providing multi-script handwriting recognition on a user device. In some embodiments, as shown in process 900, a user device receives (902) a multi-script handwriting recognition model that has been trained on spatially-derived features (e.g., features that are independent of stroke order and stroke direction) of a multi-script training corpus that includes handwriting samples corresponding to characters of at least three non-overlapping scripts. In some embodiments, the multi-script handwriting recognition model is (906) a single convolutional neural network having a single input plane and a single output plane, and includes spatially-derived features and corresponding weights for distinguishing characters of the at least three non-overlapping scripts represented in the multi-script training corpus. In some embodiments, the multi-script handwriting recognition model (908) is configured to recognize characters based on respective input images of one or more recognition units recognized in the handwriting input, and the respective spatially-derived features used for recognition are independent of the respective stroke order, stroke direction, and stroke continuity of the handwriting input.
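To make the described architecture concrete, the following is a minimal sketch, assuming PyTorch, of a single convolutional network with one input plane (a grayscale image of one recognition unit) and one output plane (one logit per character category across all covered scripts). The class name, layer sizes, and the 30,000-category default are illustrative assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn

class MultiScriptRecognizer(nn.Module):
    """Sketch of a single CNN with one input plane (a grayscale image
    of one recognition unit) and one output plane (logits over every
    character category across all covered scripts)."""

    def __init__(self, num_categories: int = 30000, image_size: int = 48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 48 -> 24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 24 -> 12
        )
        self.classifier = nn.Linear(64 * (image_size // 4) ** 2, num_categories)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, image_size, image_size), a spatially derived
        # rendering carrying no stroke-order or direction information.
        return self.classifier(self.features(x).flatten(1))

model = MultiScriptRecognizer()
logits = model(torch.zeros(1, 1, 48, 48))  # one blank input image
print(logits.shape)                         # torch.Size([1, 30000])
```

Note that growing language coverage under this design only widens the final linear layer, consistent with the modest memory growth described above.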
In some embodiments, the user device receives (908) handwriting input from the user, the handwriting input including one or more handwritten strokes provided on a touch-sensitive surface coupled to the user device. For example, the handwriting input includes respective data regarding the location and movement of a contact between a finger or stylus and the touch-sensitive surface coupled to the user device. In response to receiving the handwriting input, the user device provides (910) one or more handwriting recognition results to the user in real-time based on the multi-script handwriting recognition model (912) that has been trained on spatially-derived features of the multi-script training corpus.
In some embodiments, in providing real-time handwriting recognition results to the user, the user device segments (914) the user's handwriting input into one or more recognition units, each recognition unit including one or more of the handwritten strokes provided by the user. In some embodiments, the user device segments the user's handwriting input according to the shape, location, and size of the individual strokes formed by contact between the user's finger or stylus and the touch-sensitive surface of the user device. In some embodiments, segmenting the handwriting input also takes into account the relative order and relative position of those individual strokes. In some embodiments, the user's handwriting input is in a cursive writing style, and a single continuous stroke in the handwriting input may correspond to multiple strokes in the printed form of the recognized character. In some embodiments, the user's handwriting input may include a continuous stroke that spans multiple recognized characters in printed form. In some embodiments, the handwriting input is segmented to generate one or more input images, each input image corresponding to a respective recognition unit. In some embodiments, some of the input images optionally include some overlapping pixels; in other embodiments, the input images do not include any overlapping pixels. In some embodiments, the user device generates a segmentation lattice, each segmentation chain of the lattice representing a respective manner of segmenting the current handwriting input, and each arc in a segmentation chain corresponding to a respective group of strokes in the current handwriting input.
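The segmentation lattice described above can be sketched as a small data structure. The following toy Python enumeration, with hypothetical names, treats each segmentation chain as a sequence of arcs, each arc grouping consecutive strokes into one candidate recognition unit; a production module would prune chains using the shape, size, and position heuristics mentioned above:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Arc:
    """One arc of a segmentation chain: the indices of the strokes
    grouped into a single recognition unit."""
    stroke_indices: Tuple[int, ...]

@dataclass
class SegmentationChain:
    """One way of segmenting the accumulated input: a sequence of arcs
    that together cover every stroke exactly once."""
    arcs: List[Arc]

def enumerate_chains(num_strokes: int) -> List[SegmentationChain]:
    """Enumerate every contiguous segmentation of num_strokes strokes."""
    if num_strokes == 0:
        return [SegmentationChain([])]
    chains = []
    for first in range(1, num_strokes + 1):
        head = Arc(tuple(range(first)))
        for rest in enumerate_chains(num_strokes - first):
            shifted = [Arc(tuple(i + first for i in a.stroke_indices))
                       for a in rest.arcs]
            chains.append(SegmentationChain([head] + shifted))
    return chains

# Three strokes admit four contiguous groupings: [0|1|2], [01|2], [0|12], [012].
print(len(enumerate_chains(3)))  # 4
```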
As shown in process 900, the user device provides (914) a respective input image for each of the one or more recognition units to the multi-script handwriting recognition model and, for at least one of the one or more recognition units, the user device obtains (916) from the multi-script handwriting recognition model at least a first output character from a first script and at least a second output character from a second script that is different from the first script. For example, a handwriting input that resembles the CJK radical "西" ("west") may also resemble characters of other scripts. In some embodiments, the multi-script handwriting recognition model typically produces multiple candidate recognition results that may correspond to the user's handwriting input, because the visual appearance of handwriting is often difficult to interpret unambiguously, even for a human reader. In some embodiments, the first script is the CJK basic character block and the second script is the Latin script, as encoded by the Unicode standard. In some embodiments, the first script is the CJK basic character block and the second script is the set of emoji characters. In some embodiments, the first script is the Latin script and the second script is the set of emoji characters.
In some embodiments, the user device displays (918) both the first output character and the second output character in a candidate display area of a handwriting input interface of the user device. In some embodiments, the user device selectively displays (920) one of the first output character and the second output character based on which of the first and second scripts is used by a soft keyboard currently installed on the user device. For example, suppose the handwriting recognition model has recognized both the Chinese character "入" and the similar-looking Greek letter "λ" as output characters for the current handwriting input. The user device then determines whether the user has installed a Chinese soft keyboard (e.g., a keyboard using the pinyin input method) or a Greek soft keyboard on the user device. If the user device determines that only the Chinese soft keyboard is installed, the user device optionally displays only the Chinese character "入" to the user as a recognition result, and not the Greek letter "λ".
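A minimal sketch of this keyboard-based filtering step is shown below, where installed_scripts() is a hypothetical stand-in for querying the device's installed soft keyboards, and the script guess from Unicode character names is a crude approximation of real script metadata:

```python
import unicodedata

def installed_scripts() -> set:
    # Hypothetical stand-in for querying the user's installed soft keyboards.
    return {"CJK"}

def script_of(char: str) -> str:
    # Crude script guess from the Unicode character name; a real module
    # would use proper script metadata.
    name = unicodedata.name(char, "")
    if name.startswith("CJK"):
        return "CJK"
    if name.startswith("GREEK"):
        return "GREEK"
    return "OTHER"

def filter_candidates(candidates):
    """Keep only candidates whose script matches an installed keyboard,
    unless that would remove every candidate."""
    keep = [c for c in candidates if script_of(c) in installed_scripts()]
    return keep or candidates

print(filter_candidates(["入", "λ"]))  # ['入'] with only a CJK keyboard installed
```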
In some embodiments, the user device provides real-time handwriting recognition and input. In some embodiments, the user device continuously revises (922) the one or more recognition results for the user's handwriting input as the user continues to add to or modify the handwriting input, before the user makes an explicit or implicit selection of a displayed recognition result. In some embodiments, in response to each revision of the one or more recognition results, the user device displays (924) the correspondingly revised recognition results to the user in the candidate display area of the handwriting input user interface.
In some embodiments, the multi-script handwriting recognition model is trained (926) to recognize all characters of at least three non-overlapping scripts, including Chinese characters, emoji characters, and Latin script, encoded according to the Unicode standard. In some embodiments, the at least three non-overlapping scripts include Chinese characters, Arabic script, and Latin script. In some embodiments, the multi-script handwriting recognition model has (928) at least thirty thousand output categories representing at least thirty thousand characters spanning the at least three non-overlapping scripts.
In some embodiments, the user device allows the user to enter multi-script handwriting input, such as a phrase that includes characters from more than one script. For example, the user can continuously write and receive a handwriting recognition result that includes characters of more than one script, without stopping in the middle of writing to manually switch the recognition language. For example, the user may write the multi-script sentence "Hello means 你好 in Chinese." in the handwriting input area of the user device without switching the input language from English to Chinese before writing the Chinese characters "你好," and without switching the input language from Chinese back to English when writing the English words "in Chinese."
As described herein, a multi-script handwriting recognition model is used to provide real-time handwriting recognition of a user's input. In some embodiments, the real-time handwriting recognition is used to provide real-time multi-script handwriting input functionality on the user's device. Figs. 10A-10C are flow diagrams of an exemplary process 1000 for providing real-time handwriting recognition and input on a user device. In particular, the real-time handwriting recognition is independent of stroke order at the character level, the phrase level, and the sentence level.
In some embodiments, character-level stroke-order independence requires that the handwriting recognition model provide the same recognition result for a particular handwritten character regardless of the order in which the user has provided the individual strokes of that character. For example, the individual strokes of a Chinese character are conventionally written in a particular order. Although native Chinese writers are typically trained at school to write each Chinese character in that particular order, many users later adopt a personalized style and stroke order that deviates from the convention. Furthermore, cursive writing styles are highly personalized: multiple strokes of the printed form of a Chinese character are often merged into a single stylized stroke that twists and bends, and sometimes even connects to the next character. A stroke-order independent recognition model is trained on images of the writing samples, without the temporal information associated with individual strokes, so recognition is independent of stroke-order information. For example, for the Chinese character "十" ("ten"), the handwriting recognition model will give the same recognition result "十" whether the user writes the horizontal stroke first or the vertical stroke first.
As shown in FIG. 10A, in process 1000, a user device receives (1002) a plurality of handwritten strokes from a user, the plurality of handwritten strokes corresponding to a handwritten character. For example, a handwritten input for the character "十" typically includes a substantially horizontal handwritten stroke that intersects a substantially vertical handwritten stroke.
In some embodiments, the user device generates (1004) an input image based on the plurality of handwritten strokes. In some embodiments, the user device provides (1006) the input image to a handwriting recognition model to perform real-time handwriting recognition of the handwritten character, wherein the handwriting recognition model provides stroke-order independent handwriting recognition. Then, as the plurality of handwritten strokes are received, the user device displays (1008) in real-time the same first output character (e.g., the character "十" in printed form) regardless of the respective order of the plurality of handwritten strokes (e.g., the horizontal stroke and the vertical stroke) that have been received from the user.
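The stroke-order and stroke-direction independence described here follows from rendering the strokes into a flat input image before recognition. A small Python sketch (using numpy and a naive line rasterizer, both assumptions of this illustration) shows why: the rendered image is identical however the strokes were ordered or directed:

```python
import numpy as np

def render_strokes(strokes, size=48):
    """Rasterize strokes (lists of (x, y) points in [0, 1]) into a flat
    binary image; the image discards all temporal information."""
    img = np.zeros((size, size), dtype=np.uint8)
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            for t in np.linspace(0.0, 1.0, num=2 * size):
                x = int(round((x0 + t * (x1 - x0)) * (size - 1)))
                y = int(round((y0 + t * (y1 - y0)) * (size - 1)))
                img[y, x] = 1
    return img

# The character "十": a horizontal and a vertical stroke. Any order or
# direction of writing yields the same input image.
h = [(0.1, 0.5), (0.9, 0.5)]
v = [(0.5, 0.1), (0.5, 0.9)]
a = render_strokes([h, v])
b = render_strokes([v[::-1], h[::-1]])  # reversed order and direction
print(np.array_equal(a, b))  # True
```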
Although some conventional handwriting recognition systems permit slight stroke-order variations for a small number of characters by specifically including such variations when training the recognition system, such systems cannot scale to accommodate arbitrary stroke-order variations for a large repertoire of complex characters, such as Chinese characters, because even a moderately complex character admits an enormous number of stroke-order permutations. Furthermore, merely including more permutations of acceptable stroke orders for a particular character still does not allow a conventional recognition system to handle handwriting input that merges multiple strokes into a single stroke (e.g., when writing in a highly cursive fashion) or that splits one stroke into multiple sub-strokes (e.g., when a character is captured with very coarse sampling of the input strokes). Thus, the multi-script handwriting system trained on spatially-derived features described herein has clear advantages over conventional recognition systems.
In some embodiments, stroke-order independent handwriting recognition is performed independently of any temporal information associated with the individual strokes within each handwritten character. In some embodiments, stroke-order independent handwriting recognition is performed in conjunction with stroke-distribution information that takes into account the spatial distribution of the individual strokes before they are merged into a flat input image. More details on how temporally-derived stroke-distribution information is used to enhance the stroke-order independent handwriting recognition described above are provided later in the specification (e.g., with respect to FIGS. 25A-27). The techniques described with respect to FIGS. 25A-27 do not disrupt the stroke-order independence of the handwriting recognition system.
In some embodiments, the handwriting recognition model provides (1010) stroke-direction independent handwriting recognition. In some embodiments, stroke-direction independence requires the user device to display the same first output character regardless of the respective direction of each of the plurality of handwritten strokes that the user has provided. For example, if a user writes the Chinese character "十" in the handwriting input area of the user device, the handwriting recognition model will output the same recognition result regardless of whether the user writes the horizontal stroke from left to right or from right to left. Similarly, the handwriting recognition model will output the same recognition result regardless of whether the user writes the vertical stroke in a top-to-bottom or a bottom-to-top direction. In another example, many Chinese characters are structurally composed of two or more radicals. Some Chinese characters each include a left radical and a right radical, and people usually write the left radical first and then the right radical. In some embodiments, the handwriting recognition model provides the same recognition result whether the user writes the right radical or the left radical first, as long as the resulting handwriting input shows the left radical to the left of the right radical when the user completes the character. Similarly, some Chinese characters each include an upper radical and a lower radical, and people typically write the upper radical first and then the lower radical. In some embodiments, the handwriting recognition model provides the same recognition result whether the user writes the upper radical or the lower radical first, as long as the resulting handwriting input shows the upper radical above the lower radical. In other words, the handwriting recognition model does not rely on the direction in which the user provided each stroke of a handwritten character to determine the identity of that character.
In some embodiments, the handwriting recognition model provides handwriting recognition based on the image of a recognition unit, regardless of the number of sub-strokes the user has used to provide the recognition unit. In other words, in some embodiments, the handwriting recognition model provides (1014) stroke-count independent handwriting recognition. In some embodiments, the user device displays the same first output character in response to receiving the plurality of handwritten strokes, regardless of how many handwritten strokes were used to form the marks in the input image. For example, if a user writes the Chinese character "十" in the handwriting input area, the handwriting recognition model will output the same recognition result regardless of whether the user provided four strokes (e.g., two short horizontal strokes and two short vertical strokes that together make up the cross), two strokes (e.g., an L-shaped stroke and a 7-shaped stroke, or a horizontal stroke and a vertical stroke), or any other number of strokes (e.g., hundreds of very short strokes or dots) to make up the shape of the character "十".
In some embodiments, the handwriting recognition model is capable not only of recognizing a character regardless of the order, direction, and count of the strokes with which it is written, but also of recognizing multiple characters regardless of the temporal order in which the user has provided the strokes of those characters.
In some embodiments, the user device receives not only the first plurality of handwritten strokes but also (1016) a second plurality of handwritten strokes from the user, where the second plurality of handwritten strokes corresponds to a second handwritten character. In some embodiments, the user device generates (1018) a second input image based on the second plurality of handwritten strokes. In some embodiments, the user device provides (1020) the second input image to the handwriting recognition model to perform real-time recognition of the second handwritten character. In some embodiments, when the second plurality of handwritten strokes is received, the user device displays (1022), in real-time, a second output character corresponding to the second plurality of handwritten strokes. In some embodiments, the second output character and the first output character are displayed simultaneously in a spatial sequence, regardless of the respective order in which the user has provided the first plurality and the second plurality of handwritten strokes. For example, if the user writes two Chinese characters (e.g., "十" and "八") in the handwriting input area of the user device, the user device will display the recognition result "十八" ("eighteen") whenever the currently accumulated handwriting input shows the strokes of the character "十" to the left of the strokes of the character "八," regardless of whether the user wrote the strokes of "十" or of "八" first. Indeed, even if the user wrote some strokes of the character "八" (e.g., the left-falling stroke) before writing some strokes of the character "十" (e.g., the vertical stroke), the user device will display the recognition result "十八" in the spatial order of the two handwritten characters, as long as the resulting image of the handwriting input shows all strokes of "十" to the left of all strokes of "八."
In other words, as shown in fig. 10B, in some embodiments, the spatial order of the first output character and the second output character corresponds to (1024) the spatial distribution of the first plurality and the second plurality of handwritten strokes along a default writing direction (e.g., left to right) of the handwriting input interface of the user device. In some embodiments, the second plurality of handwritten strokes is received (1026) temporally after the first plurality of handwritten strokes, yet the second output character precedes the first output character in the spatial sequence along the default writing direction (e.g., left to right) of the handwriting input interface of the user device.
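The ordering rule can be sketched as sorting recognition units by position rather than by entry time, under the simplifying assumption that each recognition unit is summarized by the top-left corner of its bounding box (names and the tuple layout here are hypothetical):

```python
def spatial_order(units, writing_direction="ltr"):
    """Order recognition units by spatial position along the default
    writing direction, ignoring when their strokes were entered.
    Each unit is (min_x, min_y, char); a real module would use full
    bounding boxes and support vertical layouts."""
    if writing_direction == "ltr":
        key = lambda u: u[0]
    elif writing_direction == "rtl":
        key = lambda u: -u[0]
    else:  # top-to-bottom
        key = lambda u: u[1]
    return "".join(u[2] for u in sorted(units, key=key))

# "八" was finished before "十", but "十" sits to the left, so the
# result still reads "十八" ("eighteen").
units = [(120, 10, "八"), (20, 12, "十")]
print(spatial_order(units))  # 十八
```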
In some embodiments, the handwriting recognition model provides stroke-order independent recognition at the sentence-to-sentence level. For example, even if the handwritten character "十" is in a first handwritten sentence and the handwritten character "八" is in a second handwritten sentence, with the two handwritten characters separated by one or more other handwritten characters and/or words in the handwriting input area, the handwriting recognition model will still provide a recognition result showing the two characters in their spatial sequence. Regardless of the temporal order in which the user has provided the strokes of the two characters, when the user completes the handwriting input, the recognition result and the spatial order of the two recognized characters remain the same, provided that the recognition units of the two characters are spatially arranged in the sequence "十八." In some embodiments, a first handwritten character (e.g., "十") is provided by the user as part of a first handwritten sentence (e.g., "十 is a number."), and a second handwritten character (e.g., "八") is provided by the user as part of a second handwritten sentence (e.g., "八 is another number."), and the first and second handwritten sentences are displayed simultaneously in the handwriting input area of the user device. In some embodiments, when the user confirms that the recognition result (e.g., "十 is a number. 八 is another number.") is correct, the two sentences are input into the text input area of the user device, and the handwriting input area is cleared for the user to enter further handwriting input.
In some embodiments, because the handwriting recognition model is stroke-order independent not only at the character level but also at the phrase and sentence levels, the user can correct a previously entered, incomplete character even after subsequent characters have been written. For example, if the user forgot to write a particular stroke of a character before going on to write one or more subsequent characters in the handwriting input area, the user may still add the missing stroke later, at the correct location within that character, and receive the correct recognition result.
In conventional stroke-order dependent recognition systems (e.g., HMM-based recognition systems), once a character has been written it is committed, and the user can no longer make any changes to it. If the user wishes to make a change, the user must delete that character and all subsequent characters and start over. In some conventional recognition systems, the user is required to complete each handwritten character within a short predetermined time window, and any stroke entered outside that window is not included in the same recognition unit as the strokes provided within it. Such conventional systems are difficult to use and cause considerable frustration. A stroke-order independent system does not suffer from these drawbacks: the user may complete a character in any order, and over any period of time, that the user sees fit. The user may also correct (e.g., add one or more strokes to) an earlier-written character after writing one or more subsequent characters in the handwriting input interface. In some embodiments, the user may also individually delete earlier-written characters (e.g., using the method described later with respect to FIGS. 21A-22B) and rewrite them at the same location in the handwriting input interface.
As shown in figs. 10B-10C, in some embodiments the second plurality of handwritten strokes is spatially subsequent to the first plurality of handwritten strokes along the default writing direction of the handwriting input interface (1028), and the second output character follows the first output character in the spatial sequence along the default writing direction in the candidate display area. The user device receives (1030) a third handwritten stroke from the user to revise the first handwritten character (i.e., the character formed by the first plurality of handwritten strokes), the third handwritten stroke being received temporally after the first and second pluralities of handwritten strokes. For example, suppose the user has written two characters in a left-to-right spatial sequence in the handwriting input area that are currently recognized as "人体" ("human body"). The first plurality of strokes forms the handwritten character "人," although the user actually intended to write the character "个" and missed one stroke. The second plurality of strokes forms the handwritten character "体." When the user later realizes that he wishes to write "个体" ("individual") rather than "人体," the user may simply add a vertical stroke underneath the strokes of the character "人," and the user device assigns the vertical stroke to the first recognition unit (i.e., the recognition unit for "人"). The user device then outputs a new output character (e.g., "个") for the first recognition unit, and the new output character replaces the previous output character (e.g., "人") in the recognition result. As shown in fig. 10C, in response to receiving the third handwritten stroke, the user device assigns (1032) the third handwritten stroke to the same recognition unit as the first plurality of handwritten strokes based on the proximity of the third handwritten stroke to the first plurality of handwritten strokes. In some embodiments, the user device generates (1034) a revised input image based on the first plurality of handwritten strokes and the third handwritten stroke. The user device provides (1036) the revised input image to the handwriting recognition model to perform real-time recognition of the revised handwritten character. In some embodiments, in response to receiving the third handwritten stroke, the user device displays (1040) a third output character corresponding to the revised input image, wherein the third output character replaces the first output character and is displayed concurrently with the second output character in the spatial sequence along the default writing direction.
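The proximity-based assignment of a late-arriving stroke might be sketched as follows; the centroid-distance rule and all names here are illustrative assumptions, since the text does not specify the exact distance measure:

```python
import math

def assign_stroke(stroke_points, units):
    """Assign a late-arriving stroke to the recognition unit whose
    centroid is closest to the stroke's centroid. units maps a unit
    id to its list of (x, y) points; all names are hypothetical."""
    def centroid(points):
        xs, ys = zip(*points)
        return sum(xs) / len(xs), sum(ys) / len(ys)

    sx, sy = centroid(stroke_points)
    best = min(units, key=lambda uid: math.dist((sx, sy), centroid(units[uid])))
    units[best].extend(stroke_points)  # unit is re-rasterized and re-recognized
    return best

units = {"unit-人": [(10, 10), (5, 30), (15, 30)],
         "unit-体": [(60, 10), (55, 30), (70, 30)]}
# A vertical stroke drawn under the first character joins the first unit,
# so "人" can be re-recognized as "个".
print(assign_stroke([(10, 30), (10, 45)], units))  # unit-人
```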
In some embodiments, the handwriting recognition module recognizes handwriting input written in a default writing direction from left to right. For example, a user may write characters in one or more lines from left to right. In response to the handwriting input, the handwriting input module presents recognition results including characters in one or more rows in a spatial sequence from left to right as desired. If the user selects a recognition result, the selected recognition result is input into a text input area of the user device. In some embodiments, the default writing direction is from top to bottom. In some embodiments, the default writing direction is from right to left. In some embodiments, the user optionally changes the default writing direction to an alternative writing direction after the recognition result has been selected and the handwriting input area has been cleared.
In some embodiments, the handwriting input module allows the user to enter multi-character handwriting input in the handwriting input area, and allows strokes to be deleted from the handwriting input one recognition unit at a time, rather than all recognition units at once. In some embodiments, the handwriting input module allows one stroke at a time to be deleted from the handwriting input. In some embodiments, recognition units are deleted one after another in the direction opposite the default writing direction, regardless of the order in which the recognition units or strokes were entered to produce the current handwriting input. In some embodiments, strokes are deleted one by one in the reverse of the order in which they were entered within each recognition unit, and once all strokes of one recognition unit have been deleted, deletion proceeds to the strokes of the next recognition unit in the direction opposite the default writing direction. A sketch of these two deletion granularities follows.
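The sketch below assumes each recognition unit keeps its strokes in entry order and that units are compared by their position along a left-to-right writing direction; the tuple layout is an assumption of this illustration:

```python
def delete_unit(units):
    """Delete the recognition unit furthest along the writing direction
    (i.e., the last one left-to-right), regardless of entry order.
    Each unit is (min_x, strokes); strokes are kept in entry order."""
    if units:
        units.remove(max(units, key=lambda u: u[0]))

def delete_stroke(units):
    """Delete the most recently entered stroke of the last unit; remove
    the unit once it has no strokes left."""
    if units:
        last = max(units, key=lambda u: u[0])
        last[1].pop()          # strokes deleted in reverse entry order
        if not last[1]:
            units.remove(last)

units = [(20, ["s1", "s2"]), (120, ["s3"])]
delete_stroke(units)   # removes "s3", and with it the rightmost unit
print(units)           # [(20, ['s1', 's2'])]
```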
In some embodiments, the user device receives a deletion input from the user while the third output character and the second output character are simultaneously displayed as candidate recognition results in the candidate display area of the handwriting input interface. In response to the deletion input, the user device deletes the second output character from the recognition result while maintaining the third output character in the recognition result displayed in the candidate display area.
In some embodiments, as shown in fig. 10C, the user device renders (1042) the first plurality of handwritten strokes, the second plurality of handwritten strokes, and the third handwritten stroke in real-time as the user provides each of them. In some embodiments, in response to receiving a deletion input from the user, the user device removes (1044) the rendering of the second plurality of handwritten strokes (e.g., corresponding to the second handwritten character) from the handwriting input area while maintaining the rendering of the first plurality of handwritten strokes and the third handwritten stroke (e.g., collectively corresponding to the revised first handwritten character) in the handwriting input area. For example, after the user provides the missing vertical stroke in the character sequence "个体," if the user enters a deletion input, the strokes in the recognition unit for the character "体" are removed from the handwriting input area, and the character "体" is removed from the recognition result "个体" in the candidate display area. After the deletion, the strokes for the character "个" remain in the handwriting input area, and the recognition result shows only the character "个."
In some embodiments, the handwritten character is a multi-stroke Chinese character. In some embodiments, the first plurality of handwritten strokes is provided in a cursive writing style. In some embodiments, the first plurality of handwritten strokes is provided in a cursive writing style and the handwritten character is a multi-stroke Chinese character. In some embodiments, the handwritten character is an Arabic character written in a cursive style. In some embodiments, the handwritten character is a character of another script written in a cursive style.
In some embodiments, the user device establishes respective predetermined constraints on a set of acceptable sizes for the handwritten character input and segments the currently accumulated plurality of handwritten strokes into a plurality of recognition units based on the respective predetermined constraints, wherein a respective input image is generated from each recognition unit, provided to the handwriting recognition model, and recognized as a corresponding output character.
In some embodiments, the user device receives additional handwritten strokes from the user after segmenting the currently accumulated plurality of handwritten strokes. The user device assigns the additional handwritten stroke to a corresponding one of the plurality of recognition units based on a spatial position of the additional handwritten stroke relative to the plurality of recognition units.
Attention is now directed to exemplary user interfaces for providing handwriting recognition and input on a user device. In some embodiments, the exemplary user interfaces are provided on a user device based on a multi-script handwriting recognition model that provides real-time, stroke-order independent handwriting recognition of the user's handwriting input. In some embodiments, the exemplary user interface is the exemplary handwriting input interface 802 (e.g., shown in figs. 8A and 8B), which includes handwriting input area 804, candidate display area 806, and text input area 808. In some embodiments, exemplary handwriting input interface 802 also includes a plurality of control elements 1102, such as a delete button, a space bar, an enter button, a keyboard toggle button, and the like. One or more other areas and/or elements may be provided in handwriting input interface 802 to implement the additional functionality described below.
As described herein, a multi-script handwriting recognition model can have a very large vocabulary of tens of thousands of characters across many different scripts and languages. Consequently, for a given handwriting input, the recognition model may identify a large number of output characters, many of which have a comparable probability of being the character the user intended to enter. On user devices with limited display area, it is therefore advantageous to initially present only a subset of the recognition results while keeping the other results available at the user's request.
Figs. 11A-11G illustrate exemplary user interfaces for displaying a subset of the recognition results in a normal view of the candidate display area, along with an affordance for invoking an expanded view of the candidate display area in which the remainder of the recognition results are displayed. In addition, within the expanded view of the candidate display area, the recognition results are classified into different categories and displayed on different tab pages of the expanded view.
Fig. 11A illustrates exemplary handwriting input interface 802. The handwriting input interface includes handwriting input area 804, candidate display area 806, and text input area 808. One or more control elements 1102 are also included in handwriting input interface 802.
As shown in fig. 11A, the candidate display area 806 optionally includes an area for displaying one or more recognition results and an affordance 1104 (e.g., an expansion icon) for invoking an expanded version of the candidate display area 806.
Figs. 11A-11C illustrate that, as a user provides one or more handwritten strokes (e.g., strokes 1106, 1108, and 1110) in handwriting input area 804, the user device recognizes and displays a corresponding set of recognition results for the currently accumulated strokes in handwriting input area 804. As shown in fig. 11B, after the user inputs the first stroke 1106, the user device recognizes and displays three recognition results 1112, 1114, and 1116 (e.g., the characters "/", "1", and ","). In some embodiments, a small number of candidate characters are displayed in candidate display area 806, ordered according to the recognition confidence associated with each character.
In some embodiments, the top ranked candidate result (e.g., "/") is tentatively displayed in text entry area 808, such as within block 1118. The user may optionally confirm that the top ranked candidate is the desired input with a simple confirmation input (e.g., pressing the "enter" key, or providing a double-tap gesture in the handwriting input area).
FIG. 11C illustrates that, as the user enters two more strokes 1108 and 1110 in handwriting input area 804 before selecting any candidate recognition result, the additional strokes are rendered in handwriting input area 804 along with initial stroke 1106, and the candidate results are updated to reflect changes in the recognition units identified from the currently accumulated handwriting input. Based on these three strokes, the user device has identified a single recognition unit, as shown in FIG. 11C. Based on that single recognition unit, the user device has identified and displayed several recognition results 1118-1124. In some embodiments, one or more of the recognition results currently displayed in candidate display area 806 (e.g., 1118 and 1122) each represent a candidate character selected from a plurality of similar-looking candidate characters for the current handwriting input.
As shown in fig. 11C-11D, upon selection of the affordance 1104 by a user (e.g., using a tap gesture with a contact 1126 over the affordance 1104), the candidate display region changes from a normal view (e.g., shown in fig. 11C) to an expanded view (e.g., shown in fig. 11D). In some embodiments, the expanded view shows all recognition results (e.g., candidate characters) that have been recognized for the current handwriting input.
In some embodiments, the normal view of the initially displayed candidate display area 806 shows only the most commonly used characters in the corresponding script or language, while the expanded view shows all candidate characters, including rarely used characters, in a given script or language. The expanded view of the candidate display area may be designed in different ways. Figs. 11D-11G illustrate exemplary designs of the expanded candidate display area according to some embodiments.
As shown in fig. 11D, in some embodiments, expanded candidate display area 1128 includes one or more tab pages (e.g., pages 1130, 1132, 1134, and 1136) that each present candidate characters of a respective category. The tabbed design shown in FIG. 11D allows a user to quickly find the desired category of characters and then locate the character to be entered on the corresponding tab page.
In fig. 11D, the first tab page 1130 displays all candidate characters, both common and uncommon, that have been recognized for the currently accumulated handwriting input. As shown in fig. 11D, tab page 1130 includes all of the characters shown in initial candidate display area 806 in fig. 11C, as well as a number of additional characters (e.g., "α," "β," "巾," etc.) that were not included in initial candidate display area 806.
In some embodiments, the characters displayed in initial candidate display area 806 include only characters from a set of commonly used characters associated with a script (e.g., all characters in the basic block of the CJK script encoded according to the Unicode standard). In some embodiments, the characters displayed in extended candidate display area 1128 further include a set of infrequently used characters associated with the script (e.g., all characters in the extension blocks of the CJK script encoded according to the Unicode standard). In some embodiments, expanded candidate display area 1128 further includes candidate characters from other scripts not commonly used by the user, such as the Greek, Arabic, and/or emoji scripts.
In some embodiments, as shown in fig. 11D, expanded candidate display area 1128 includes respective tab pages 1130, 1132, 1134, and 1136 that each correspond to a respective category of candidate characters (e.g., all characters, rare characters, characters from the Latin script, and characters from the emoji script). Figs. 11E-11G illustrate that the user may select each of the different tab pages to reveal the candidate characters of the corresponding category. Fig. 11E shows only the rare characters (e.g., characters from the extension blocks of the CJK script) corresponding to the current handwriting input. Fig. 11F shows only the Latin or Greek letters corresponding to the current handwriting input. Fig. 11G shows only the emoji characters corresponding to the current handwriting input.
In some embodiments, expanded candidate display area 1128 further includes one or more affordances for sorting the candidate characters in a given tab page according to respective criteria (e.g., by phonetic spelling, by stroke count, or by radical). The ability to sort the candidate characters of each category by criteria other than the recognition confidence score gives the user an additional way to quickly find the desired candidate character for text entry.
FIGS. 11H-11K illustrate that, in some embodiments, candidate characters that are similar in appearance may be grouped, with only a representative character from each group of similar-looking candidates presented in initial candidate display area 806. Because the multi-script recognition model described herein may produce many candidate characters that fit a given handwriting input nearly equally well, the recognition model cannot always eliminate one candidate in favor of another that looks similar. On a device with a limited display area, displaying many similar-looking candidates at once does not help the user select the correct character, because the subtle distinctions are not readily apparent; even if the user can see the desired character, it may be difficult to select it from a very dense display using a finger or stylus.
In some embodiments, to address the above problem, the user device identifies candidate characters that are very similar to each other (e.g., according to an index or dictionary of similar-looking characters, or according to image-based criteria) and groups them accordingly. In some embodiments, one or more groups of similar-looking characters may be identified among the candidate characters for a given handwriting input. In some embodiments, the user device selects a representative candidate character from the similar-looking candidates in each group and displays only the representative candidate in initial candidate display area 806. A candidate character that does not look sufficiently similar to any other candidate character is displayed by itself. In some embodiments, as shown in fig. 11H, the representative candidate characters of each group (e.g., candidate characters 1118 and 1122) are displayed in a visually distinct manner (e.g., within a bold-line box) relative to candidate characters that do not belong to any group (e.g., candidate characters 1120 and 1124). In some embodiments, the criterion for selecting the representative character of a group is the relative usage frequency of the candidate characters in the group. In other embodiments, other criteria may be used.
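A sketch of the grouping and representative-selection step follows. The confusability groups and usage frequencies below are invented placeholders; as the text above notes, a real module would derive them from a similar-character dictionary or image-based features:

```python
# Hypothetical confusability groups and usage frequencies.
SIMILAR_GROUPS = [{"巾", "币", "市"}]
USAGE_FREQ = {"巾": 0.4, "币": 0.3, "市": 0.9, "T": 0.8, "J": 0.2}

def collapse_similar(candidates):
    """Replace each group of similar-looking candidates with a single
    representative (the most frequently used member); return the
    collapsed list plus the hidden members for the expanded view."""
    shown, hidden = [], {}
    for c in candidates:
        group = next((g for g in SIMILAR_GROUPS if c in g), None)
        if group is None:
            shown.append(c)
        else:
            present = group & set(candidates)
            rep = max(present, key=USAGE_FREQ.get)
            if rep not in shown:
                shown.append(rep)
                hidden[rep] = sorted(present - {rep})
    return shown, hidden

print(collapse_similar(["巾", "币", "T", "市", "J"]))
# (['市', 'T', 'J'], {'市': ['巾', '币']})
```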
In some embodiments, once the one or more representative characters are displayed to the user, the user may optionally expand the candidate display area 806 to display similarly appearing candidate characters in the expanded view. In some embodiments, selecting a particular representative character may result in an expanded view of only those candidate characters in the same group as the selected representative character.
Various designs for providing an expanded view of similar-looking candidates are possible. Figs. 11H-11K illustrate an embodiment in which an expanded view of a representative candidate character (e.g., representative character 1118) is invoked by a predetermined gesture (e.g., an expand gesture) detected over the representative candidate character. The predetermined gesture used to invoke the expanded view (e.g., an expand gesture) is different from the predetermined gesture used to select a representative character for text input (e.g., a tap gesture).
As shown in figs. 11H-11I, when the user provides an expand gesture over the first representative character 1118 (e.g., as shown by the two contacts 1138 and 1140 moving away from each other), the area of representative character 1118 is expanded, and the three similar-looking candidate characters in its group are presented in enlarged views (e.g., in enlarged boxes 1142, 1144, and 1146, respectively), while candidate characters that are not in the expanded group are not enlarged.
As shown in fig. 11I, the user can more easily see the subtle differences among the three similar-looking candidate characters when they are presented in the enlarged view. If one of the three candidate characters is the intended character, the user may select it, for example, by touching the area in which the character is displayed. As shown in figs. 11J-11K, the user has selected (with contact 1148) the second character, shown in box 1144 of the expanded view. In response, the selected character is entered into text entry area 808 at the insertion point indicated by the cursor. As shown in FIG. 11K, once a character is selected, the handwriting input in handwriting input area 804 and the candidate characters in candidate display area 806 (or the expanded view of the candidate display area) are cleared in preparation for subsequent handwriting input.
In some embodiments, if the user does not see the desired candidate character in the expanded view (e.g., enlarged box 1142) of the first representative candidate character, the user may optionally use the same gesture to expand other representative characters displayed in candidate display area 806. In some embodiments, expanding another representative character in candidate display area 806 automatically restores the currently presented expanded view to the normal view. In some embodiments, the user optionally uses a pinch gesture to restore the current expanded view to the normal view. In some embodiments, the user may scroll candidate display area 806 (e.g., from left to right) to reveal additional candidate characters not currently visible in candidate display area 806.
Figs. 12A-12B are flow diagrams of an exemplary process 1200 in which a first subset of the recognition results is presented in an initial candidate display area and a second subset of the recognition results is presented in an expanded candidate display area, the latter remaining hidden from view until specifically invoked by the user. In exemplary process 1200, the device identifies, for a handwriting input, a subset of recognition results whose mutual visual similarity exceeds a predetermined threshold, selects a representative recognition result from that subset, and displays the selected representative recognition result in the candidate display area of the display. Process 1200 is illustrated in figs. 11A-11K.
As shown in fig. 12A, in exemplary process 1200, a user device receives (1202) handwriting input from a user. The handwriting input includes one or more handwritten strokes (e.g., 1106, 1108, 1110 in fig. 11C) provided in a handwriting input area (e.g., 804 in fig. 11C) of a handwriting input interface (e.g., 802 in fig. 11C). The user device identifies (1204), based on a handwriting recognition model, a plurality of output characters for the handwriting input (e.g., the characters shown on tab page 1130 in fig. 11D). The user device classifies (1206) the plurality of output characters into two or more categories based on a predetermined classification criterion. In some embodiments, the predetermined classification criterion determines (1208) whether a given character is a common character or an uncommon character.
In some embodiments, the user device displays (1210) a respective output character (e.g., a common character) of a first category of the two or more categories in an initial view of a candidate display area (e.g., 806 shown in fig. 11C) of the handwriting input interface, wherein the initial view of the candidate display area is provided concurrently with an affordance (e.g., 1104 in fig. 11C) for invoking an expanded view (e.g., 1128 in fig. 11D) of the candidate display area.
In some embodiments, the user device receives (1212) a user input to select an affordance for invoking the expanded view, for example, as shown in FIG. 11C. In response to the user input, the user device displays (1214) in the expanded view of the candidate display area respective output characters of a first category and respective output characters of at least a second category of the two or more categories that were not previously displayed in the initial view of the candidate display area, for example as shown in fig. 11D.
In some embodiments, the respective characters of the first category are characters found in a common character dictionary and the respective characters of the second category are characters found in a less common character dictionary. In some embodiments, the dictionary of common characters and the dictionary of uncommon characters are dynamically adjusted or updated based on a usage history associated with the user device.
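As one illustration of such a classification criterion, the following sketch labels candidates by Unicode block, treating the CJK Unified Ideographs basic block as common and the Extension A/B blocks as rare. This specific rule is an assumption of the example; as noted above, a real module might also consult a per-user usage history to move characters between the two dictionaries:

```python
def category(char: str) -> str:
    """Classify a candidate as 'common' or 'rare' by Unicode block."""
    cp = ord(char)
    if 0x4E00 <= cp <= 0x9FFF:          # CJK Unified Ideographs (basic)
        return "common"
    if 0x3400 <= cp <= 0x4DBF or 0x20000 <= cp <= 0x2A6DF:  # Ext. A / B
        return "rare"
    return "common"

print(category("中"), category("㐀"))  # common rare
```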
In some embodiments, the user device identifies (1216) a group of characters from the plurality of output characters that are visually similar to one another according to a predetermined similarity criterion (e.g., based on a dictionary of similar characters or on certain spatially-derived features). In some embodiments, the user device selects a representative character from the group of visually similar characters based on a predetermined selection criterion (e.g., historical usage frequency). In some embodiments, the predetermined selection criterion is based on the relative usage frequency of the characters in the group. In some embodiments, the predetermined selection criterion is based on a preferred input language associated with the device. In some embodiments, the representative candidate is selected based on other factors indicating the likelihood that each candidate is the user's intended input, for example, whether the candidate character belongs to a script used by a soft keyboard currently installed on the user device, or whether the candidate character is among the most frequently used characters of a particular language associated with the user or the user device.
In some embodiments, the user device displays (1220) the representative character in the initial view of the candidate display area (e.g., 806 in fig. 11H) in place of the other characters in the group of visually similar characters. In some embodiments, a visual indication (e.g., selective visual highlighting or a special background) is provided in the initial view of the candidate display area to indicate whether each candidate character is the representative character of a group or an ordinary candidate character that does not belong to any group. In some embodiments, the user device receives (1222) from the user a predetermined expand input (e.g., an expand gesture) directed at a representative character displayed in the initial view of the candidate display area, for example, as shown in fig. 11H. In some embodiments, in response to receiving the predetermined expand input, the user device simultaneously displays (1224) a magnified view of the representative character and corresponding magnified views of one or more other characters in the group of visually similar characters, for example, as shown in fig. 11I.
In some embodiments, the predetermined expand input is an expand gesture detected over a representative character displayed in the candidate display area. In some embodiments, the predetermined expand input is a contact detected over a representative character displayed in the candidate display area that lasts longer than a predetermined threshold time. In some embodiments, the sustained contact used to expand a group has a longer threshold duration than the tap gesture used to select a representative character for text input.
In some embodiments, each representative character is displayed concurrently with a respective affordance (e.g., a respective expansion button) for invoking the expanded view of its group of similar-looking candidate characters. In some embodiments, the predetermined expand input is a selection of the respective affordance associated with the representative character.
As described herein, in some embodiments, the vocabulary of the multi-script handwriting recognition model includes the emoji script. The handwriting recognition module may recognize emoji characters from the user's handwriting input. In some embodiments, the handwriting input module presents both the emoji character recognized directly from the handwriting and the character or word in a natural human language that represents the recognized emoji character. In some embodiments, the handwriting input module recognizes a character or word in a natural human language from the user's handwriting input and presents both the recognized character or word and an emoji character corresponding to it. In other words, the handwriting input module provides a way of entering emoji characters without switching from the handwriting input interface to an emoji keyboard, and also provides a way of entering conventional natural-language characters and words by hand-drawing emoji characters. Figs. 13A-13E provide exemplary user interfaces illustrating these different ways of entering emoji characters and conventional natural-language characters.
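A minimal sketch of the bidirectional emoji/word mapping implied here; the table contents and function names are hypothetical:

```python
# Hypothetical bidirectional emoji/word table used to offer both
# readings as candidates, whichever form the user actually drew.
EMOJI_TO_WORD = {"❤": "Love", "😊": "Smile"}
WORD_TO_EMOJI = {word: emoji for emoji, word in EMOJI_TO_WORD.items()}

def expand_candidates(recognized):
    """For a recognized token sequence, emit the literal reading, a
    reading with emoji swapped for words, and a reading with words
    swapped for emoji; duplicates are removed, order preserved."""
    readings = [
        "".join(recognized),
        "".join(EMOJI_TO_WORD.get(tok, tok) for tok in recognized),
        "".join(WORD_TO_EMOJI.get(tok, tok) for tok in recognized),
    ]
    return list(dict.fromkeys(readings))

print(expand_candidates(["Thanks", "!", "❤"]))
# ['Thanks!❤', 'Thanks!Love']
```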
Fig. 13A illustrates exemplary handwriting input interface 802 invoked within a chat application. Handwriting input interface 802 includes handwriting input area 804, candidate display area 806, and text input area 808. In some embodiments, once the user is satisfied with the composed text in text entry area 808, the user may choose to send it to another participant of the current chat session. The conversation history of the chat session is shown in conversation panel 1302. In this example, the user has received a chat message 1304 (e.g., "Happy Birthday") displayed in conversation panel 1302.
As shown in FIG. 13B, the user provides handwriting input 1306 for the English word "Thanks" in handwriting input area 804. In response to handwriting input 1306, the user device identifies a number of candidate recognition results (e.g., recognition results 1308, 1310, and 1312). The top-ranked recognition result 1308 has been tentatively entered into text entry area 808 within box 1314.
As shown in fig. 13C, after the user has entered the handwritten word "Thanks" in handwriting input area 804, the user then draws a stylized exclamation mark with stroke 1316 (e.g., an elongated stroke with a small circle below it) in handwriting input area 804. The user device recognizes that the additional stroke 1316 forms a recognition unit separate from the recognition units previously identified from the accumulated handwritten strokes 1306 in handwriting input area 804. Based on the newly entered recognition unit (i.e., the recognition unit formed by stroke 1316), the user device uses the handwriting recognition model to recognize an emoji character (e.g., a stylized "!"). Based on this recognized emoji character, the user device presents a first recognition result 1318 (e.g., "Thanks" with a stylized "!") in candidate display area 806. In addition, the user device also recognizes the number "8", which is visually similar to the newly entered recognition unit. Based on this identified number, the user device presents a second recognition result 1322 (e.g., "Thanks 8") in candidate display area 806. Further, based on the identified emoji character (e.g., the stylized "!"), the user device also identifies a regular character (e.g., the regular character "!") corresponding to the emoji character. Based on this indirect recognition of the regular character, the user device presents a third recognition result 1320 (e.g., "Thanks!" with a regular "!") in candidate display area 806. At this point, the user may select any one of the candidate recognition results 1318, 1320, and 1322 to enter it into text input area 808.
As shown in FIG. 13D, the user continues to provide additional handwritten strokes 1324 in handwriting input area 804. This time, the user has drawn a heart symbol after the stylized exclamation point. In response to the new handwritten strokes 1324, the user device recognizes that the newly provided strokes 1324 form yet another new recognition unit. Based on the new recognition unit, the user device recognizes a heart emoji character and, alternatively, the number "0" as candidate characters for the new recognition unit. Based on these new candidate characters identified from the new recognition unit, the user device presents two updated candidate recognition results 1326 and 1330 (e.g., "Thanks" followed by the stylized exclamation point and heart emoji, and "Thanks 80"). In some embodiments, the user device further identifies one or more regular characters or words (e.g., "Love") corresponding to the recognized emoji character(s). Based on the identified regular characters or words, the user device presents a third recognition result 1328 in which the recognized emoji characters are replaced with the corresponding regular characters or words. As shown in FIG. 13D, in recognition result 1328, a regular exclamation point "!" replaces the stylized exclamation-point emoji character, and the word "Love" replaces the heart emoji character.
As shown in FIG. 13E, the user has selected one of the candidate recognition results (e.g., candidate 1326 showing the mixed text "Thanks" followed by the emoji characters), the text of the selected recognition result has been entered into text entry area 808, and the message has then been sent to the other participants of the chat session. Message bubble 1332 shows the message text in conversation panel 1302.
Fig. 14 is a flow diagram of an exemplary process 1400 in which a user enters an emoji character using handwriting input. Figs. 13A-13E illustrate process 1400 according to some embodiments.
In process 1400, the user device receives (1402) handwriting input from a user. The handwriting input includes a plurality of handwritten strokes provided in a handwriting input area of a handwriting input interface. In some embodiments, the user device recognizes (1404) a plurality of output characters from the handwriting input based on a handwriting recognition model. In some embodiments, the output characters include at least a first emoji character (e.g., the stylized exclamation point or the heart emoji character in fig. 13D) and at least a first character from a script of a natural human language (e.g., a character from the word "Thanks" in fig. 13D). In some embodiments, the user device displays (1406), in a candidate display area of the handwriting input interface, a recognition result (e.g., result 1326 in fig. 13D) that includes both the first emoji character and the first character from the script of the natural human language, for example as shown in fig. 13D.
In some embodiments, based on the handwriting recognition model, the user device optionally recognizes (1408) at least a first semantic unit (e.g., the word "Thanks") from the handwriting input, wherein the first semantic unit includes a respective character, word, or phrase capable of conveying a respective semantic meaning in a respective human language. In some embodiments, the user device identifies (1410) a second emoji character (e.g., a "handshake" emoji character) associated with the first semantic unit (e.g., the word "Thanks") recognized from the handwriting input. In some embodiments, the user device displays (1412), in the candidate display area of the handwriting input interface, a second recognition result that includes at least the second emoji character recognized from the first semantic unit (e.g., a recognition result showing the "handshake" emoji character in place of the word "Thanks"). In some embodiments, displaying the second recognition result further includes simultaneously displaying a third recognition result that includes at least the first semantic unit (e.g., the recognition result "Thanks").
In some embodiments, the user device receives user input selecting the first recognition result displayed in the candidate display area. In some embodiments, in response to the user input, the user device enters the text of the selected first recognition result in a text entry area of the handwriting input interface, wherein the text includes at least the first emoji character and the first character from a word of a natural human language. In other words, the user is able to enter mixed text using a single handwriting input (albeit a handwriting input including multiple strokes) in the handwriting input area, without switching between a natural-language keyboard and an emoji character keyboard.
In some embodiments, the handwriting recognition model has been trained on a multi-script training corpus that includes writing samples corresponding to characters of at least three non-overlapping scripts, the three non-overlapping scripts being a set of emoji characters, Chinese characters, and the Latin script.
In some embodiments, the user device identifies (1414) a second semantic unit (e.g., the word "Love") corresponding to the first emoji character (e.g., the heart emoji character) recognized directly from the handwriting input. In some embodiments, the user device displays (1416) a fourth recognition result (e.g., result 1328 in fig. 13D) in the candidate display area of the handwriting input interface, the fourth recognition result including at least the second semantic unit (e.g., the word "Love") identified from the first emoji character. In some embodiments, the user device simultaneously displays the fourth recognition result (e.g., result 1328 "Thanks! Love") and the first recognition result (e.g., result 1326 "Thanks" followed by the emoji characters) in the candidate display area, as shown in fig. 13D.
In some embodiments, the user device allows the user to enter regular text by drawing emoji characters. For example, if the user does not know how to spell the word "elephant," the user optionally draws a stylized emoji character for "elephant" in the handwriting input area, and if the user device correctly recognizes the handwriting input as the emoji character for "elephant," the user device optionally also presents the word "elephant" in normal text as one of the recognition results displayed in the candidate display area. In another example, a user may draw a stylized cat in the handwriting input area instead of writing the Chinese character meaning "cat." If the user device identifies the emoji character for "cat" based on the handwriting input provided by the user, the user device optionally also presents the Chinese character meaning "cat" along with the emoji character for "cat" in the candidate recognition results. By presenting normal text for the recognized emoji character, the user device provides an alternative way to enter complex characters or words using a few stylized strokes typically associated with well-known emoji characters. In some embodiments, the user device stores a dictionary that links emoji characters with their corresponding normal text (e.g., characters, words, phrases, symbols, etc.) in one or more preferred languages (e.g., English or Chinese).
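A sketch of the emoji-to-text dictionary described above, assuming a simple bidirectional per-language lookup; the table entries and function names are illustrative, not the patent's actual data.

```python
# Illustrative sketch of the bidirectional emoji <-> normal-text lookup.
EMOJI_TO_TEXT = {
    "🐱": {"en": "cat", "zh": "猫"},
    "🐘": {"en": "elephant", "zh": "象"},
}
TEXT_TO_EMOJI = {
    word: emoji
    for emoji, words in EMOJI_TO_TEXT.items()
    for word in words.values()
}

def expand_candidates(recognized, language="en"):
    """Given one recognized candidate (emoji or word), return the list of
    candidates to present, pairing each emoji with its normal-text form."""
    results = [recognized]
    if recognized in EMOJI_TO_TEXT:
        results.append(EMOJI_TO_TEXT[recognized][language])
    elif recognized in TEXT_TO_EMOJI:
        results.append(TEXT_TO_EMOJI[recognized])
    return results

print(expand_candidates("🐱", "zh"))   # -> ['🐱', '猫']
print(expand_candidates("elephant"))   # -> ['elephant', '🐘']
```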
In some embodiments, the user device identifies emoji characters based on the visual similarity of the emoji characters to images generated from the handwriting input. In some embodiments, to enable recognition of emoji characters from handwriting input, the handwriting recognition model used on the user device is trained on a training corpus that includes both handwriting samples corresponding to characters of scripts of natural human languages and handwriting samples corresponding to a set of artificially designed emoji characters. In some embodiments, emoji characters associated with the same semantic concept may have different appearances when used in mixed input with text of different natural languages. For example, the emoji character for the semantic concept "Love" may be a "heart" emoji character when presented with normal text in one natural language (e.g., Japanese) and may be a "kiss" emoji character when presented with normal text in another natural language (e.g., English or French).
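The language-dependent appearance described in this paragraph could be modeled as a per-language mapping from semantic concept to emoji; the mapping below is purely illustrative.

```python
# Illustrative sketch: the same semantic concept renders as different
# emoji depending on the natural language of the surrounding text.
CONCEPT_EMOJI_BY_LANGUAGE = {
    "love": {"ja": "❤", "en": "💋", "fr": "💋"},   # heart vs. kiss
}

def emoji_for_concept(concept, language, default="❤"):
    """Pick the emoji used when mixing with text of the given language."""
    return CONCEPT_EMOJI_BY_LANGUAGE.get(concept, {}).get(language, default)

print(emoji_for_concept("love", "ja"))   # -> heart emoji
print(emoji_for_concept("love", "en"))   # -> kiss emoji
```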
As described herein, when performing recognition on multi-character handwriting input, the handwriting input module performs segmentation of the currently accumulated handwriting input in the handwriting input area and separates the accumulated strokes into one or more recognition units. Among the parameters used to determine how to segment the handwriting input are the manner in which the strokes are clustered in the handwriting input area and the distances between different clusters of strokes. People have different writing styles: some tend to write very sparsely, with large distances between strokes or between different parts of the same character, while others tend to write very densely, with very small distances between strokes or between different characters. Even for the same user, a handwritten character may deviate from a balanced appearance due to imperfect planning and may be tilted, stretched, or squeezed in different ways. As described herein, the handwriting recognition model provides stroke-order independent recognition, so a user may write a character, or a portion of a character, out of order. It is therefore difficult to guarantee spatial uniformity and balance across the characters of a handwriting input.
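As a rough illustration of the distance-based segmentation this paragraph describes, the sketch below clusters strokes into recognition units by the horizontal gap between their bounding boxes; the stroke representation and threshold are assumptions, and a real segmenter would weigh many more features.

```python
# Minimal sketch of gap-based segmentation along the writing direction:
# strokes whose horizontal extents are closer than a threshold are
# clustered into one recognition unit, independent of input order.
GAP_THRESHOLD = 20.0   # pixels; illustrative value

def segment_strokes(strokes):
    """strokes: list of (x_min, x_max) horizontal extents, in input order.
    Returns a list of recognition units, each a list of stroke indices."""
    order = sorted(range(len(strokes)), key=lambda i: strokes[i][0])
    units, current, right_edge = [], [], None
    for i in order:
        x_min, x_max = strokes[i]
        if right_edge is not None and x_min - right_edge > GAP_THRESHOLD:
            units.append(current)          # large gap: close the current unit
            current, right_edge = [], None
        current.append(i)
        right_edge = x_max if right_edge is None else max(right_edge, x_max)
    if current:
        units.append(current)
    return units

print(segment_strokes([(0, 10), (12, 20), (50, 60)]))  # -> [[0, 1], [2]]
```

Note that the strokes are sorted spatially before clustering, which mirrors the stroke-order independence described above: a stroke added late to an earlier character still lands in the correct unit.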
In some embodiments, the handwriting input model described herein provides a way for a user to inform the handwriting input module whether to merge two adjacent recognition units into a single recognition unit or to separate a single recognition unit into two separate recognition units. With the help of the user, the handwriting input module may correct the initial segmentation and generate the results desired by the user.
Figs. 15A-15K illustrate exemplary user interfaces and processes in which a user provides predetermined pinch and expand gestures to modify the recognition units identified by the user device.
As shown in figs. 15A-15B, a user has entered a plurality of handwritten strokes 1502 (e.g., three strokes) in handwriting input area 804 of handwriting input interface 802. The user device has identified a single recognition unit based on the currently accumulated handwritten strokes 1502 and presented three candidate characters 1504, 1506, and 1508 (e.g., "towel," "center," and "coin," respectively) in candidate display area 806.
FIG. 15C shows the user entering a number of additional strokes 1510 to the right of the initial handwritten strokes 1502 in handwriting input area 804. The user device determines (e.g., based on the size and spatial distribution of strokes 1502 and 1510) that strokes 1502 and 1510 should be treated as two separate recognition units. Based on this division into recognition units, the user device provides input images of the first recognition unit and the second recognition unit to the handwriting recognition model and obtains two groups of candidate characters. The user device then generates a plurality of recognition results (e.g., 1512, 1514, 1516, and 1518) based on different combinations of the recognized characters. Each recognition result includes a recognized character for the first recognition unit and a recognized character for the second recognition unit. As shown in fig. 15C, each of the recognition results 1512, 1514, 1516, and 1518 includes two recognized characters.
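The combination step described here (one candidate per recognition unit, joined into multi-character results) can be sketched as a scored Cartesian product; the scoring scheme and the independence assumption between units are illustrative, not from the patent.

```python
# Sketch of combining per-recognition-unit candidates into ranked
# multi-character recognition results, as with results 1512-1518 above.
from itertools import product

def combine_results(unit_candidates, top_n=4):
    """unit_candidates: one list of (char, score) per recognition unit.
    Returns the top_n multi-character strings by combined score."""
    combos = []
    for combo in product(*unit_candidates):
        text = "".join(char for char, _ in combo)
        score = 1.0
        for _, s in combo:
            score *= s                 # naive independence assumption
        combos.append((text, score))
    combos.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in combos[:top_n]]

# Two units, two candidates each -> four combined results, best first.
print(combine_results([[("巾", 0.6), ("中", 0.3)], [("子", 0.7), ("了", 0.2)]]))
```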
In this example, assume that the user actually wishes the handwriting input to be recognized as a single character, but inadvertently left too much space between the left portion (e.g., the left radical "towel") and the right portion (e.g., the right radical "top") of the handwritten character (e.g., "cap"). Having seen the results (e.g., 1512, 1514, 1516, and 1518) presented in candidate display area 806, the user realizes that the user device has incorrectly segmented the current handwriting input into two recognition units. Although the segmentation may be based on objective criteria, it is undesirable to require the user to delete the current handwriting input and rewrite the entire character with a smaller distance between the left and right portions.
Instead, as shown in FIG. 15D, the user uses a pinch gesture over the two clusters of handwritten strokes 1502 and 1510 to indicate to the handwriting input module that the two recognition units it has identified should be merged into a single recognition unit. The pinch gesture is represented by two contacts 1520 and 1522 on the touch-sensitive surface moving toward each other.
FIG. 15E illustrates that, in response to the user's pinch gesture, the user device has corrected the segmentation of the currently accumulated handwriting input (e.g., strokes 1502 and 1510) and merged the handwritten strokes into a single recognition unit. As shown in FIG. 15E, the user device provides an input image based on the revised recognition unit to the handwriting recognition model and obtains three new candidate characters 1524, 1526, and 1528 for the revised recognition unit (e.g., "cap," "women," and a third visually similar character). In some embodiments, as shown in fig. 15E, the user device optionally adjusts the rendering of the handwriting input in handwriting input area 804 to reduce the distance between the left and right clusters of handwritten strokes. In some embodiments, the user device does not change the rendering of the handwriting input shown in handwriting input area 804 in response to the pinch gesture. In some embodiments, the user device distinguishes a pinch gesture from an input stroke based on the two simultaneous contacts (as opposed to one single contact) detected in handwriting input area 804.
As shown in FIG. 15F, the user enters two additional strokes 1530 to the right of the previously entered handwriting input (i.e., the strokes of the character "cap"). The user device determines that the newly entered strokes 1530 form a new recognition unit and identifies a candidate character (e.g., "children") for the newly identified recognition unit. The user device then combines the newly recognized character (e.g., "children") with the candidate characters of the earlier recognition unit and presents several different recognition results (e.g., results 1532 and 1534) in candidate display area 806.
After entering strokes 1530, the user continues to write more strokes 1536 (e.g., three additional strokes) to the right of strokes 1530, as shown in FIG. 15G. Because the horizontal distance between strokes 1530 and strokes 1536 is small, the user device determines that strokes 1530 and 1536 belong to the same recognition unit and provides the input image formed by strokes 1530 and 1536 to the handwriting recognition model. The handwriting recognition model recognizes three different candidate characters for the revised recognition unit, and the user device generates two revised recognition results 1538 and 1540 for the currently accumulated handwriting input.
In this example, assume that the last two sets of strokes 1530 and 1536 are actually intended as two separate characters (e.g., "children" followed by a second character). After the user sees that the user device has incorrectly combined the two sets of strokes 1530 and 1536 into a single recognition unit, the user provides an expand gesture to inform the user device that the two sets of strokes 1530 and 1536 should be separated into two independent recognition units. As shown in FIG. 15H, the user makes two contacts 1542 and 1544 near strokes 1530 and 1536, respectively, and then moves the two contacts away from each other in a generally horizontal direction (i.e., along the default writing direction).
FIG. 15I illustrates that, in response to the user's expand gesture, the user device revises the previous segmentation of the currently accumulated handwriting input and assigns strokes 1530 and strokes 1536 to two consecutive recognition units. Based on the input images generated for the two independent recognition units, the user device identifies one or more candidate characters for the first recognition unit based on strokes 1530 and one or more candidate characters for the second recognition unit based on strokes 1536. The user device then generates two new recognition results 1546 and 1548 based on different combinations of the recognized characters. In some embodiments, the user device optionally modifies the rendering of strokes 1530 and 1536 to reflect the division of the previously identified recognition unit.
As shown in figs. 15J-15K, the user selects (as indicated by contact 1550) one of the candidate recognition results displayed in candidate display area 806, and the selected recognition result (e.g., result 1548) is entered into text entry area 808 of the user interface. After the selected recognition result is entered into text entry area 808, both candidate display area 806 and handwriting input area 804 are cleared and ready for subsequent user input.
Figs. 16A-16B are flow diagrams of an exemplary process 1600 in which a user uses predetermined gestures (e.g., pinch gestures and/or expand gestures) to inform the handwriting input module how to segment, or to correct an existing segmentation of, the current handwriting input. Figs. 15A-15K provide examples of the exemplary process 1600 according to some embodiments.
In some embodiments, a user device receives (1602) handwriting input from a user. The handwriting input includes a plurality of handwritten strokes provided on a touch-sensitive surface coupled to the device. In some embodiments, the user device renders (1604) the plurality of handwritten strokes in real time in a handwriting input area (e.g., handwriting input area 804 of figs. 15A-15K) of the handwriting input interface. The user device receives one of a pinch gesture input and an expand gesture input over the plurality of handwritten strokes, for example as shown in figs. 15D and 15H.
In some embodiments, upon receiving the pinch gesture input, the user device generates (1606) a first recognition result based on the plurality of handwritten strokes by processing the plurality of handwritten strokes as a single recognition unit (e.g., as shown in fig. 15C-15E).
In some embodiments, when an expand gesture input is received, the user device generates (1608) a second recognition result based on the plurality of handwritten strokes by processing the plurality of handwritten strokes as two independent recognition units pulled apart by the expand gesture input (e.g., as shown in fig. 15G-15I).
In some embodiments, upon generating a respective one of the first recognition result and the second recognition result, the user device displays the generated recognition result in a candidate display area of the handwriting input interface, e.g., as shown in fig. 15E and 15I.
In some embodiments, the pinch gesture input includes two simultaneous contacts on the touch-sensitive surface that move toward each other in an area occupied by the plurality of handwritten strokes. In some embodiments, the expand gesture input includes two simultaneous contacts on the touch-sensitive surface that move apart from each other in an area occupied by the plurality of handwritten strokes.
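A sketch of how the two-contact pinch and expand gestures might be classified, including the motion axis used later (see the discussion of vertical merges and splits below) to decide whether to merge or split along the writing direction or across lines; the event representation is an assumption.

```python
# Sketch: distinguish a pinch from an expand gesture using two
# simultaneous contacts, and derive the merge/split axis from the
# dominant motion direction.

def classify_two_contact_gesture(c1_start, c1_end, c2_start, c2_end):
    """Each argument is an (x, y) point. Returns (gesture, axis)."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    start_gap, end_gap = dist(c1_start, c2_start), dist(c1_end, c2_end)
    gesture = "pinch" if end_gap < start_gap else "expand"
    dx = abs(c1_end[0] - c1_start[0]) + abs(c2_end[0] - c2_start[0])
    dy = abs(c1_end[1] - c1_start[1]) + abs(c2_end[1] - c2_start[1])
    axis = "horizontal" if dx >= dy else "vertical"
    return gesture, axis

# Two contacts moving apart horizontally -> ("expand", "horizontal"),
# telling the segmenter to split one unit into two along the writing direction.
print(classify_two_contact_gesture((100, 50), (60, 50), (140, 50), (180, 50)))
```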
In some embodiments, the user device identifies (1614) two adjacent recognition units from the plurality of handwritten strokes. The user device displays (1616) an initial recognition result (e.g., results 1512, 1514, 1516, and 1518 in fig. 15C) in the candidate display area that includes the respective characters recognized from the two adjacent recognition units, for example as shown in fig. 15C. In some embodiments, when displaying the first recognition result (e.g., result 1524, 1526, or 1528 in fig. 15E) in response to the pinch gesture, the user device replaces (1618) the initial recognition result with the first recognition result in the candidate display area. In some embodiments, the user device receives (1620) the pinch gesture input while displaying the initial recognition result in the candidate display area, as shown in fig. 15D. In some embodiments, in response to the pinch gesture input, the user device re-renders (1622) the plurality of handwritten strokes to decrease the distance between the two adjacent recognition units in the handwriting input area, e.g., as shown in fig. 15E.
In some embodiments, the user device identifies (1624) a single recognition unit from the plurality of handwritten strokes. The user device displays (1626) an initial recognition result (e.g., result 1538 or 1540 of fig. 15G) in the candidate display area that includes the character recognized from the single recognition unit. In some embodiments, when displaying the second recognition result (e.g., result 1546 or 1548 in fig. 15I) in response to the expand gesture, the user device replaces (1628) the initial recognition result (e.g., result 1538 or 1540) with the second recognition result (e.g., result 1546 or 1548) in the candidate display area, e.g., as shown in figs. 15H-15I. In some embodiments, the user device receives (1630) the expand gesture input while displaying the initial recognition result in the candidate display area, as shown in fig. 15H. In some embodiments, in response to the expand gesture input, the user device re-renders (1632) the plurality of handwritten strokes to increase the distance between a first subset of the handwritten strokes assigned to the first recognition unit and a second subset of the handwritten strokes assigned to the second recognition unit in the handwriting input area, as shown in figs. 15H and 15I.
In some embodiments, immediately after providing the strokes and realizing that they may be too dispersed to be segmented correctly by the standard segmentation process, the user optionally provides a pinch gesture to inform the user device to treat the multiple strokes as a single recognition unit. The user device may distinguish the pinch gesture from a normal stroke based on the simultaneous presence of two contacts in the pinch gesture. Similarly, in some embodiments, immediately after providing the strokes and realizing that they may be too densely packed to be segmented correctly by the standard segmentation process, the user optionally provides an expand gesture to inform the user device to treat the multiple strokes as two independent recognition units. The user device may distinguish the expand gesture from a normal stroke based on the simultaneous presence of two contacts in the expand gesture.
In some embodiments, the direction of motion of the pinch or expand gesture is optionally used to provide additional guidance on how to merge or divide the strokes under the gesture. For example, if multiple lines of handwriting input are enabled for the handwriting input area, a pinch gesture in which the two contacts move in a vertical direction may inform the handwriting input module to merge two recognition units recognized in two adjacent lines into a single recognition unit (e.g., as an upper component and a lower component). Similarly, an expand gesture in which the two contacts move in a vertical direction may inform the handwriting input module to divide a single recognition unit into two recognition units in two adjacent lines. In some embodiments, pinch and expand gestures may also provide segmentation guidance for sub-portions of a character input, such as merging two sub-portions into different parts of a compound character (e.g., its upper, lower, left, or right components) or dividing a single compound character into its constituent components. This is particularly helpful for recognizing complex compound Chinese characters, because users tend to lose the correct proportions and balance when handwriting such characters. Adjusting the proportions and balance of the handwriting input with pinch and expand gestures after the input is completed may thus help the user enter the correct character without making several attempts to achieve the correct proportions and balance.
As described herein, the handwriting input module allows a user to enter multi-character handwriting input and permits out-of-order strokes within characters, between characters, and even between phrases, sentences, and/or lines in the handwriting input area. In some embodiments, the handwriting input module further provides character-by-character deletion in the handwriting input area, wherein the order of character deletion is opposite to the writing direction and is independent of when the strokes of each character were provided in the handwriting input area. In some embodiments, the deletion of each recognition unit (e.g., character or radical) in the handwriting input area is optionally performed on a stroke-by-stroke basis, with deletion proceeding in reverse chronological order of the strokes provided within the recognition unit. Figs. 17A-17H illustrate exemplary user interfaces for responding to a delete input from a user and providing character-by-character deletion in a multi-character handwriting input.
As shown in FIG. 17A, a user has provided a plurality of handwritten strokes 1702 in handwriting input area 804 of handwriting input interface 802. Based on the currently accumulated strokes 1702, the user device presents three recognition results (e.g., results 1704, 1706, and 1708) in candidate display area 806. As shown in FIG. 17B, the user provides an additional plurality of strokes 1710 in handwriting input area 804. The user device recognizes three new output characters and replaces the three previous recognition results 1704, 1706, and 1708 with three new recognition results 1712, 1714, and 1716. In some embodiments, as shown in FIG. 17B, even though the user device has recognized two separate recognition units (e.g., strokes 1702 and strokes 1710) from the current handwriting input, the cluster of strokes 1710 does not correspond well to any known character in the vocabulary of the handwriting recognition module. Thus, the candidate characters (e.g., "acre," "murder") identified for the recognition unit comprising strokes 1710 each have a recognition confidence below a predetermined threshold. In some embodiments, the user device presents a partial recognition result (e.g., result 1712) that includes only the candidate character for the first recognition unit (e.g., "day") and does not include any candidate character for the second recognition unit in candidate display area 806. In some embodiments, the user device also displays complete recognition results (e.g., results 1714 and 1716) including candidate characters for both recognition units, regardless of whether their recognition confidence exceeds the predetermined threshold. The partial recognition result informs the user which part of the handwriting input needs to be corrected. Furthermore, the user may choose to first enter the correctly recognized portion of the handwriting input and then rewrite the incorrectly recognized portion.
Fig. 17C shows the user continuing to provide additional handwritten strokes 1718 to the left of strokes 1710. Based on the relative position and distance of strokes 1718, the user device determines that the newly added strokes belong to the same recognition unit as the cluster of handwritten strokes 1702. Based on the revised recognition unit, a new character (e.g., "electricity") is recognized for the first recognition unit, and a new set of recognition results 1720, 1722, and 1724 is generated. As before, the first recognition result 1720 is a partial recognition result, because none of the candidate characters recognized for strokes 1710 meets the predetermined confidence threshold.
FIG. 17D shows that the user is now entering a number of new strokes 1726 between stroke 1702 and stroke 1710. The user device assigns newly entered stroke 1726 to the same recognition unit as stroke 1710. The user has now finished entering all handwritten strokes for two Chinese characters (e.g., "computer"), and the correct recognition result 1728 is displayed in the candidate display area 806.
FIG. 17E shows that the user has entered the initial portion of a delete input, for example by making a light contact 1730 on delete button 1732. If the user maintains contact with delete button 1732, the user can delete the current handwriting input character by character (or recognition unit by recognition unit), rather than deleting the entire handwriting input at once.
In some embodiments, when the user's finger first touches delete button 1732 on the touch-sensitive screen, the last recognition unit (e.g., the recognition unit for the character "brain") in the default writing direction (e.g., left to right) is visually highlighted (e.g., with a highlighted border 1734, a highlighted background, etc.) relative to one or more other recognition units concurrently displayed within handwriting input area 804, as shown in fig. 17E.
In some embodiments, when the user device detects that the user has maintained contact 1730 on delete button 1732 for more than a threshold duration, the user device removes the highlighted recognition unit (e.g., in block 1734) from handwriting input area 804, as shown in fig. 17F. Further, the user device also revises the recognition results displayed in candidate display area 806 to delete any output characters generated based on the deleted recognition unit, as shown in fig. 17F.
Fig. 17F also shows that, if the user continues to maintain contact 1730 on delete button 1732 after the last recognition unit (e.g., the recognition unit for the character "brain") has been deleted from handwriting input area 804, the recognition unit adjacent to the deleted recognition unit (e.g., the recognition unit for the character "electricity") becomes the next recognition unit to be deleted. As shown in fig. 17F, the remaining recognition unit becomes visually highlighted (e.g., in block 1736) and is ready to be deleted. In some embodiments, visually highlighting a recognition unit provides a preview of the recognition unit that would be deleted if the user continued to maintain contact with the delete button. If the user breaks contact with the delete button before the threshold duration is reached, the visual highlighting is removed from the recognition unit and the recognition unit is not deleted. Those skilled in the art will recognize that the contact duration is reset after each deletion of a recognition unit. Further, in some embodiments, the threshold duration is optionally adjusted using the contact intensity (e.g., the pressure with which the user applies contact 1730 to the touch-sensitive screen) to confirm the user's intent to delete the currently highlighted recognition unit. Figs. 17F and 17G show that the user has broken contact 1730 on delete button 1732 before the threshold duration was reached, leaving the recognition unit for the character "electricity" in handwriting input area 804. When the user selects (e.g., as indicated by contact 1740) the first recognition result (e.g., result 1738) for that recognition unit, the text of the first recognition result 1738 is entered into text entry area 808, as shown in figs. 17G-17H.
Figs. 18A-18B are flow diagrams of an exemplary process 1800 in which a user device provides character-by-character deletion in a multi-character handwriting input. In some embodiments, deletion of the handwriting input is performed before the characters recognized from the handwriting input have been confirmed and entered into the text entry area of the user interface. In some embodiments, deletion of characters in the handwriting input proceeds in the reverse spatial order of the recognition units recognized from the handwriting input and is independent of the temporal order in which the recognition units were formed. Figs. 17A-17H illustrate the exemplary process 1800 according to some embodiments.
As shown in fig. 18A, in exemplary process 1800, a user device receives (1802) handwriting input from a user that includes a plurality of handwritten strokes provided in a handwriting input area (e.g., area 804 of fig. 17D) of a handwriting input interface. The user device identifies (1804) a plurality of recognition units from the plurality of handwritten strokes, each recognition unit including a respective subset of the plurality of handwritten strokes. For example, as shown in FIG. 17D, the first recognition unit includes strokes 1702 and 1718, and the second recognition unit includes strokes 1710 and 1726. The user device generates (1806) a multi-character recognition result (e.g., result 1728 in FIG. 17D) including the respective characters recognized from the plurality of recognition units. In some embodiments, the user device displays the multi-character recognition result (e.g., result 1728 of FIG. 17D) in a candidate display area of the handwriting input interface. In some embodiments, while displaying the multi-character recognition result in the candidate display area, the user device receives (1810) a deletion input from the user (e.g., contact 1730 on delete button 1732), as shown in fig. 17E. In some embodiments, in response to receiving the deletion input, the user device removes (1812) an end character (e.g., the character "brain" appearing at the end of the spatial sequence "computer") from the multi-character recognition result (e.g., result 1728) displayed in the candidate display area (e.g., candidate display area 806), e.g., as shown in figs. 17E-17F.
In some embodiments, the user device renders (1814) the plurality of handwritten strokes in real time in the handwriting input area of the handwriting input interface as they are provided by the user, for example as shown in figs. 17A-17D. In some embodiments, in response to receiving the deletion input, the user device removes (1816) from the handwriting input area (e.g., handwriting input area 804 in fig. 17E) the respective subset of the plurality of handwritten strokes corresponding to the end recognition unit in the spatial sequence formed by the plurality of recognition units in the handwriting input area (e.g., the recognition unit containing strokes 1726 and 1710). The end recognition unit corresponds to the end character (e.g., the character "brain") in the multi-character recognition result (e.g., result 1728 in fig. 17E).
In some embodiments, the end recognition unit does not include (1818) a temporally last handwritten stroke of the plurality of handwritten strokes provided by the user. For example, if the user provided stroke 1718 after they provided strokes 1726 and 1710, the last recognition unit that included strokes 1726 and 1710 would still be deleted first.
In some embodiments, in response to receiving the initial portion of the deletion input, the user device visually distinguishes (1820) the end recognition unit from other recognition units recognized in the handwriting input area, e.g., as shown in fig. 17E. In some embodiments, the initial portion of the deletion input is (1822) an initial contact detected on a deletion button in the handwriting input interface, and the deletion input is detected when the initial contact is sustained for more than a predetermined threshold amount of time.
In some embodiments, the end recognition unit corresponds to a handwritten Chinese character. In some embodiments, the handwriting input is written in a cursive writing style. In some embodiments, the handwriting input corresponds to a plurality of Chinese characters written in a cursive writing style. In some embodiments, at least one of the handwritten strokes is divided between two adjacent recognition units of the plurality of recognition units. For example, a user may sometimes use a long stroke that extends across multiple characters, and in such cases a segmentation module of the handwriting input module optionally divides the long stroke between several recognition units. When deletion is performed character by character (or recognition unit by recognition unit), only the segment of the long stroke within the corresponding recognition unit is deleted at a time.
In some embodiments, the deletion input is (1824) a contact maintained on a deletion button provided in the handwriting input interface, and removing the respective subset of the plurality of handwritten strokes further includes removing the subset of handwritten strokes in the end recognition unit from the handwriting input area on a stroke-by-stroke basis in an order opposite of a chronological order in which the subset of handwritten strokes has been provided by the user.
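The deletion order described in this process (end of the spatial sequence first, optionally stroke by stroke in reverse chronological order within the end unit) can be sketched as follows; the data layout is an illustrative assumption.

```python
# Sketch of the deletion order described above: recognition units are
# removed from the end of the spatial sequence (not in input order), and
# within the end unit, strokes may be removed newest-first.

def delete_once(units, stroke_by_stroke=False):
    """units: list of recognition units in spatial (writing) order; each
    unit is a list of stroke ids in the chronological order provided.
    Mutates and returns units after one deletion step."""
    if not units:
        return units
    end_unit = units[-1]                  # end of the spatial sequence
    if stroke_by_stroke and len(end_unit) > 1:
        end_unit.pop()                    # newest stroke in the end unit
    else:
        units.pop()                       # whole end recognition unit
    return units

units = [["s1", "s4"], ["s2", "s3"]]      # s4 written last, but in unit 0
print(delete_once([u[:] for u in units])) # -> [['s1', 's4']]: spatial order wins
```

The final line mirrors the point made above: the end recognition unit is deleted first even when it does not contain the temporally last stroke.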
In some embodiments, the user device generates (1826) a partial recognition result comprising a subset of respective characters recognized from the plurality of recognition units, wherein each character in the subset of respective characters satisfies a predetermined confidence threshold, for example as shown in fig. 17B and 17C. In some embodiments, the user device displays (1828) a portion of the recognition results (e.g., results 1712 in FIG. 17B and results 1720 in FIG. 17C) concurrently with the multi-character recognition results (e.g., results 1714 and 1722) in the candidate display area of the handwriting input interface.
In some embodiments, the partial recognition result does not include at least the last character of the multi-character recognition result. In some embodiments, the partial recognition result does not include at least the initial character in the multi-character recognition result. In some embodiments, the partial recognition result does not include at least an intermediate character in the multi-character recognition result.
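A sketch of how a partial recognition result might be assembled from per-unit confidences, keeping only the characters that clear the threshold; the threshold value and data format are assumptions.

```python
# Sketch: build a partial recognition result containing only the
# recognition units whose best candidate clears a confidence threshold,
# so the user can see which portion of the input needs rewriting.
CONFIDENCE_THRESHOLD = 0.6   # illustrative value

def partial_result(unit_best_candidates):
    """unit_best_candidates: list of (char, confidence) per recognition
    unit, in spatial order. Returns the partial result string."""
    return "".join(
        char for char, confidence in unit_best_candidates
        if confidence >= CONFIDENCE_THRESHOLD
    )

print(partial_result([("电", 0.9), ("月", 0.3)]))  # -> '电'
```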
In some embodiments, the smallest unit of deletion is the radical: one radical of the handwriting input is deleted at a time whenever a radical happens to be the last recognition unit remaining in the handwriting input area.
As described herein, in some embodiments, a user device provides a horizontal writing mode and a vertical writing mode. In some embodiments, the user device allows the user to enter text in one or both of a left-to-right writing direction and a right-to-left writing direction in the horizontal writing mode. In some embodiments, the user device allows the user to enter text in one or both of a top-to-bottom writing direction and a bottom-to-top writing direction in the vertical writing mode. In some embodiments, the user device provides various affordances (e.g., writing-mode or writing-direction buttons) on the user interface to invoke the corresponding writing mode and/or writing direction for the current handwriting input. In some embodiments, the text input direction in the text input area is by default the same as the handwriting input direction in the handwriting input area. In some embodiments, the user device allows the user to manually set the input direction in the text input area and the writing direction in the handwriting input area. In some embodiments, the text display direction in the candidate display area is by default the same as the handwriting input direction in the handwriting input area. In some embodiments, the user device allows the user to manually set the text display direction in the candidate display area regardless of the handwriting input direction in the handwriting input area. In some embodiments, the user device associates a writing mode and/or writing direction of the handwriting input interface with a corresponding device orientation, and a change in device orientation automatically triggers a change in the writing mode and/or writing direction. In some embodiments, a change in writing direction automatically causes the top-ranked recognition result to be entered into the text entry area.
Figs. 19A-19F illustrate exemplary user interfaces of a user device that provides both a horizontal input mode and a vertical input mode.
Fig. 19A shows the user equipment in the horizontal input mode. In some embodiments, a horizontal input mode is provided when the user device is in a landscape orientation, as shown in fig. 19A. In some embodiments, a horizontal input mode is optionally associated with and provided when the device is operated in a portrait orientation. The association between device orientation and writing mode may be different in different applications.
In the horizontal input mode, the user may provide handwritten characters in a horizontal writing direction (e.g., a default writing direction from left to right, or a default writing direction from right to left). In the horizontal input mode, the user device segments the handwritten input into one or more recognition units in a horizontal writing direction.
In some embodiments, the user device only allows a single line of input in the handwriting input area. In some embodiments, as shown in FIG. 19A, the user device allows multi-line input (e.g., two-line input) in the handwriting input area. In FIG. 19A, a user has provided a plurality of strokes across several lines in handwriting input area 804. Based on the order in which the user provided the plurality of handwritten strokes and the relative positions and distances between them, the user device determines that the user has entered two lines of characters. After dividing the handwriting input into two separate lines, the device determines one or more recognition units within each line.
As shown in fig. 19A, the user device recognizes a corresponding character for each recognition unit identified in the current handwriting input 1902 and generates several recognition results 1904 and 1906. As further shown in fig. 19A, in some embodiments, if the recognition confidence for the output character (e.g., the letter "I") of a particular recognition unit (e.g., the recognition unit formed by the initial strokes) is low, the user device optionally generates a partial recognition result (e.g., result 1906) that shows only the output characters with sufficient recognition confidence. In some embodiments, the partial recognition result 1906 makes the user aware that the first strokes may be corrected, or independently deleted and rewritten, so that the recognition model can produce the correct recognition result. In this particular example, the first recognition unit does not have to be edited, as the first recognition result 1904 does show the desired output for the first recognition unit.
In this example, as shown in figs. 19A-19B, the user rotates the device to a portrait orientation (e.g., as shown in fig. 19B). In response to the change in device orientation, the handwriting input interface changes from the horizontal input mode to the vertical input mode, as shown in fig. 19B. In the vertical input mode, the layout of handwriting input area 804, candidate display area 806, and text input area 808 may differ from that of the horizontal input mode. The particular layouts of the horizontal and vertical input modes may be varied to accommodate different device shapes and application requirements. In some embodiments, when the device orientation is rotated and the input mode changes, the user device automatically enters the top-ranked result (e.g., result 1904) into text entry area 808 as text input 1910. The orientation and position of cursor 1912 also reflect the change in input mode and writing direction.
In some embodiments, a change in input mode is optionally triggered by the user touching a particular input mode selection affordance 1908. In some embodiments, the input mode selection affordance is a graphical user interface element that also shows a current writing mode, a current writing direction, and/or a current paragraph direction. In some embodiments, the input mode selection affordance may cycle through all available input modes and writing directions provided by handwriting input interface 802. As shown in FIG. 19A, affordance 1908 shows that the current input mode is a horizontal input mode, with the writing direction from left to right and the paragraph direction from top to bottom. In FIG. 19B, the affordance 1908 shows that the current input mode is a vertical input mode, where the writing direction is from top to bottom, and the paragraph direction is from right to left. Other combinations of writing directions and paragraph directions are also possible according to various embodiments.
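The mode-selection affordance described above, which cycles through combinations of input mode, writing direction, and paragraph direction, might be sketched as follows; the particular list of combinations is illustrative.

```python
# Sketch of an input-mode affordance (cf. affordance 1908) cycling
# through the available mode/direction combinations on each tap.
MODES = [
    {"mode": "horizontal", "writing": "left-to-right", "paragraph": "top-to-bottom"},
    {"mode": "horizontal", "writing": "right-to-left", "paragraph": "top-to-bottom"},
    {"mode": "vertical",   "writing": "top-to-bottom", "paragraph": "right-to-left"},
    {"mode": "vertical",   "writing": "top-to-bottom", "paragraph": "left-to-right"},
]

class InputModeAffordance:
    def __init__(self):
        self.index = 0

    @property
    def current(self):
        return MODES[self.index]

    def tap(self):
        """Each tap advances to the next mode/direction combination."""
        self.index = (self.index + 1) % len(MODES)
        return self.current

affordance = InputModeAffordance()
print(affordance.current["mode"])   # -> 'horizontal'
print(affordance.tap()["writing"])  # -> 'right-to-left'
```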
As shown in FIG. 19C, the user has entered a plurality of new strokes 1914 (e.g., handwritten strokes for the two Chinese characters of "Chunxiao") in handwriting input area 804 in the vertical input mode. The handwriting input is written in the vertical writing direction. The user device divides the handwriting input into two recognition units along the vertical direction and displays two recognition results 1916 and 1918, each including two recognized characters arranged in the vertical direction.
Figs. 19C-19D illustrate that, when the user selects a displayed recognition result (e.g., result 1916), the selected recognition result is entered into text entry area 808 in the vertical direction.
Figs. 19E-19F show an additional line of handwriting input 1920 entered by the user in the vertical writing direction. The lines progress from right to left, according to the paragraph direction of traditional Chinese writing. In some embodiments, candidate display area 806 also shows the recognition results (e.g., results 1922 and 1924) in the same writing direction and paragraph direction as the handwriting input area. In some embodiments, other writing directions and paragraph directions may be provided by default, depending on the primary language associated with the user device or the languages of the soft keyboards installed on the user device (e.g., Arabic, Chinese, Japanese, English, etc.).
Figs. 19E-19F also illustrate that, when the user has selected a recognition result (e.g., result 1922), the text of the selected recognition result is entered into text entry area 808. As shown in fig. 19F, the current text input in text input area 808 thus includes text written in the horizontal mode with a left-to-right writing direction and text written in the vertical mode with a top-to-bottom writing direction. The paragraph direction for the horizontal text is top to bottom, while the paragraph direction for the vertical text is right to left.
In some embodiments, the user device allows the user to independently establish a preferred writing direction and paragraph direction for each of handwriting input area 804, candidate display area 806, and text input area 808. In some embodiments, the user device further allows each such preference to be associated with a particular device orientation.
Figs. 20A-20C are flow diagrams of an exemplary process 2000 for changing the text input direction and the handwriting input direction of a user interface. Figs. 19A-19F illustrate process 2000 according to some embodiments.
In some embodiments, a user device determines (2002) an orientation of the device. The orientation of the device and changes in the orientation of the device may be detected by an accelerometer and/or other orientation sensing elements in the user device. In some embodiments, the user device provides (2004) a handwriting input interface on the device in a horizontal input mode in accordance with the device being in a first orientation. A respective line of handwriting input entered in the horizontal input mode is divided into one or more respective recognition units along the horizontal writing direction. In some embodiments, the device provides (2006) a handwriting input interface on the device in the vertical input mode in accordance with the device being in the second orientation. A respective line of handwriting input entered in the vertical input mode is divided into one or more respective recognition units along the vertical writing direction.
In some embodiments, when operating in the horizontal input mode (2008): the device detects (2010) a change in the orientation of the device from the first orientation to the second orientation. In some embodiments, in response to the change in device orientation, the device switches (2012) from the horizontal input mode to the vertical input mode. This is shown, for example, in figs. 19A-19B. In some embodiments, when operating in the vertical input mode (2014): the user device detects (2016) a change in the orientation of the device from the second orientation to the first orientation. In some embodiments, the user device switches (2018) from the vertical input mode to the horizontal input mode in response to the change in device orientation. In some embodiments, the association between device orientation and input mode may be the reverse of that described above.
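A sketch tying these steps together: an orientation change switches the input mode, auto-enters the top-ranked result, and clears the candidates, as described above; the class and field names are hypothetical.

```python
# Sketch of the orientation-driven mode switch described in this process.
class HandwritingInputController:
    def __init__(self):
        self.input_mode = "horizontal"     # landscape default, per above
        self.results = []                  # ranked recognition results
        self.text_area = []

    def on_orientation_change(self, orientation):
        new_mode = "horizontal" if orientation == "landscape" else "vertical"
        if new_mode == self.input_mode:
            return
        if self.results:
            self.text_area.append(self.results[0])   # auto-enter top result
        self.results.clear()                         # clear the candidates
        self.input_mode = new_mode                   # and switch modes

controller = HandwritingInputController()
controller.results = ["top result", "second result"]
controller.on_orientation_change("portrait")
print(controller.input_mode, controller.text_area)  # vertical ['top result']
```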
In some embodiments, when operating in the horizontal input mode (2020): the user device receives (2022) a first multi-word handwriting input from a user. In response to the first multi-word handwriting input, the user device presents (2024) the first multi-word recognition result in a candidate display area of the handwriting input interface according to a horizontal writing direction. This is shown, for example, in fig. 19A. In some embodiments, when operating in the vertical input mode (2026): the user device receives (2028) a second multi-word handwriting input from the user. In response to the second multi-word handwriting input, the user device presents (2030) a second multi-word recognition result in the candidate display area according to the vertical writing direction. This is shown, for example, in fig. 19C and 19E.
In some embodiments, the user device receives (2032) a first user input for selecting the first multi-word recognition result, for example as shown in figs. 19A-19B, where the selection is made implicitly with an input for changing the input direction (e.g., rotating the device or selecting affordance 1908). The user device receives (2034) a second user input for selecting the second multi-word recognition result, e.g., as shown in fig. 19C or fig. 19E. The user device concurrently displays (2036) the respective texts of the first and second multi-word recognition results in a text entry area of the handwriting input interface, wherein the respective text of the first multi-word recognition result is displayed according to the horizontal writing direction and the respective text of the second multi-word recognition result is displayed according to the vertical writing direction. This is shown, for example, in text entry area 808 of fig. 19F.
In some embodiments, the handwriting input area accepts multiple lines of handwriting input in a horizontal writing direction and has a default top-to-bottom paragraph direction. In some embodiments, the horizontal writing direction is from left to right. In some embodiments, the horizontal writing direction is from right to left. In some embodiments, the handwriting input area accepts multiple lines of handwriting input in a vertical writing direction and has a default left-to-right paragraph direction. In some embodiments, the handwriting input area accepts multiple lines of handwriting input in a vertical writing direction and has a default right-to-left paragraph direction. In some embodiments, the vertical writing direction is from top to bottom. In some embodiments, the first orientation is a landscape orientation by default, and the second orientation is a portrait orientation by default. In some embodiments, the user device provides a corresponding affordance in the handwriting input interface for manually switching between the horizontal input mode and the vertical input mode regardless of device orientation. In some embodiments, the user device provides a corresponding affordance in the handwriting input interface for manually switching between two selectable writing directions. In some embodiments, the user device provides a corresponding affordance in the handwriting input interface for manually switching between two selectable paragraph directions. In some embodiments, the affordance is a toggle button that cycles through each possible combination of input direction and paragraph direction when invoked one or more times in succession.
In some embodiments, the user device receives (2038) handwritten input from the user. The handwriting input includes a plurality of handwritten strokes provided in a handwriting input area of a handwriting input interface. In response to the handwriting input, the user device displays (2040) one or more recognition results in a candidate display area of the handwriting input interface. While displaying the one or more recognition results in the candidate display area, the user device detects (2042) a user input for switching from the current handwriting input mode to an alternative handwriting input mode. In response to user input (2044): the user device switches (2046) from the current handwriting input mode to an alternative handwriting input mode. In some embodiments, the user device clears (2048) the handwriting input from the handwriting input area. In some embodiments, the user device automatically inputs (2050) a top-ranked recognition result of the one or more recognition results displayed in the candidate display area into a text entry area of the handwriting input interface. This is shown, for example, in fig. 19A-19B, where the current handwriting input mode is a horizontal input mode and the alternative handwriting input mode is a vertical input mode. In some embodiments, the current handwriting input mode is a vertical input mode and the alternative handwriting input mode is a horizontal input mode. In some embodiments, the current handwriting input mode and the alternative handwriting input mode are modes that provide any two different handwriting input directions or paragraph directions. In some embodiments, the user input is (2052) rotating the device from a current orientation to a different orientation. In some embodiments, the user input is invoking an affordance to manually switch from a current handwriting input mode to an alternate handwriting input mode.
As described herein, the handwriting input module allows a user to enter handwritten strokes and/or characters in any temporal order. Thus, deleting individual handwritten characters from a multi-character handwriting input, and rewriting the same or different handwritten characters at the locations of the deleted characters, is advantageous because it allows a user to correct a long handwriting input without deleting the entire input.
FIGS. 21A-21H illustrate an exemplary user interface for visually highlighting and/or deleting recognition units recognized in a plurality of handwritten strokes currently accumulated in a handwriting input area. When the user device allows multi-character or even multi-line handwriting input, it is particularly useful to allow the user to individually select, view, and delete any one of the plurality of recognition units recognized in the input. By allowing the user to delete a particular recognition unit at the beginning or in the middle of the handwriting input, the user can make corrections to a long input without having to delete all the recognition units that follow the undesired recognition unit.
As shown in FIGS. 21A-21C, a user has provided a plurality of handwritten strokes (e.g., strokes 2102, 2104, and 2106) in handwriting input area 804 of handwriting input user interface 802. As the user continues to provide additional strokes in handwriting input area 804, the user device updates the recognition units recognized from the handwriting input currently accumulated in the handwriting input area, and revises the recognition results according to the output characters recognized from the updated recognition units. As shown in FIG. 21C, the user device has recognized two recognition units from the current handwriting input and presented three recognition results (e.g., 2108, 2110, and 2112), each including two Chinese characters.
In this example, after the user has written two handwritten characters, the user realizes that the first character was not written correctly, and as a result the user device has not recognized and presented the desired recognition result in the candidate display area.
In some embodiments, when the user provides a tap gesture (e.g., a contact followed by an immediate lift-off at the same location) on the touch-sensitive display, the user device interprets the tap gesture as a request to visually highlight the individual recognition units currently recognized in the handwriting input area. In some embodiments, another predetermined gesture (e.g., a multi-finger swipe gesture over the handwriting input area) causes the user device to highlight the individual recognition units in handwriting input area 804. Tap gestures are sometimes preferred because they are relatively easy to distinguish from handwritten strokes, which typically involve a longer duration of contact and movement of the contact within handwriting input area 804. Multi-touch gestures are sometimes preferred because they are relatively easy to distinguish from handwritten strokes, which typically involve a single contact within handwriting input area 804. In some embodiments, the user device provides an affordance 2112 in the user interface that may be invoked by the user (e.g., via contact 2114) to cause visual highlighting of the respective recognition units (e.g., as shown by boxes 2108 and 2110). In some embodiments, such an affordance is preferred when there is sufficient screen space to accommodate it. In some embodiments, the affordance may be invoked by the user multiple times in succession, which causes the user device to visually highlight the recognition units recognized from different segmentation chains in the segmentation lattice, and to turn off the highlighting when all the segmentation chains have been shown.
As shown in FIG. 21D, when the user provides the necessary gesture to highlight the individual recognition units in handwriting input area 804, the user device also displays a corresponding deletion affordance (e.g., small delete buttons 2116 and 2118) over each highlighted recognition unit. FIGS. 21E-21F illustrate that when the user touches (e.g., via contact 2120) the deletion affordance for a respective recognition unit (e.g., delete button 2116 for the first recognition unit, in box 2108), the respective recognition unit is removed from handwriting input area 804. In this particular example, the deleted recognition unit is neither the recognition unit entered last in time nor the recognition unit that is spatially last in the writing direction. In other words, the user may delete any recognition unit, regardless of when or where it was provided in the handwriting input area. FIG. 21F shows that, in response to the deletion of the first recognition unit from the handwriting input area, the user device further updates the recognition results displayed in candidate display area 806. As shown in FIG. 21F, the user device also removes, from the recognition results, the candidate characters corresponding to the deleted recognition unit. Accordingly, a new recognition result 2120 is displayed in candidate display area 806.
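For illustration, a minimal sketch of this independent deletion follows; the function and parameter names are hypothetical, and segmentation and recognition are assumed to be provided elsewhere:

    def delete_recognition_unit(units, index, recognize_fn):
        """Delete the recognition unit at `index` -- whether it is first,
        middle, or last -- and regenerate the candidate results from the
        strokes that remain in the handwriting input area.

        `units` is a list of recognition units, each a list of strokes;
        `recognize_fn` maps a flat stroke list to ranked recognition results.
        """
        del units[index]                                # any unit may be deleted
        remaining_strokes = [s for unit in units for s in unit]
        return recognize_fn(remaining_strokes)          # updated candidates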
As shown in FIGS. 21G-21H, after the first recognition unit has been removed from handwriting input area 804, the user has provided a plurality of new handwritten strokes 2122 in the area previously occupied by the deleted recognition unit. The user device re-segments the handwriting input currently accumulated in handwriting input area 804. Based on the recognition units recognized from the handwriting input, the user device regenerates the recognition results (e.g., results 2124 and 2126) in candidate display area 806. FIGS. 21G-21H also illustrate that when the user selects one of the recognition results (e.g., result 2124, via contact 2128), the text of the selected recognition result is entered into text entry area 808.
FIGS. 22A-22B are flow diagrams of an exemplary process 2200 in which the individual recognition units recognized in the current handwriting input are visually presented and may be independently deleted, regardless of the temporal order in which the recognition units were formed. FIGS. 21A-21H illustrate the process 2200 according to some embodiments.
In exemplary process 2200, a user device receives (2202) a handwriting input from a user. The handwriting input includes a plurality of handwritten strokes provided on a touch-sensitive surface coupled to the device. In some embodiments, the user device renders (2204) the plurality of handwritten strokes in a handwriting input area (e.g., handwriting input area 804) of a handwriting input interface. In some embodiments, the user device segments (2206) the plurality of handwritten strokes into two or more recognition units, each recognition unit including a respective subset of the plurality of handwritten strokes.
In some embodiments, the user device receives (2208) an edit request from the user. In some embodiments, the edit request is (2210) a contact detected over a predetermined affordance (e.g., affordance 2112 in FIG. 21D) provided in the handwriting input interface. In some embodiments, the edit request is (2212) a tap gesture detected over a predetermined area of the handwriting input interface. In some embodiments, the predetermined area is within the handwriting input area of the handwriting input interface. In some embodiments, the predetermined area is outside of the handwriting input area of the handwriting input interface. In some embodiments, another predetermined gesture provided outside of the handwriting input area (e.g., a cross gesture, a horizontal swipe gesture, a vertical swipe gesture, or a slanted swipe gesture) may be used as the edit request. A gesture provided outside of the handwriting input area can easily be distinguished from handwritten strokes precisely because it is provided outside of the handwriting input area.
In some embodiments, in response to the edit request, the user device visually distinguishes (2214) the two or more recognition units in the handwriting input area, e.g., using boxes 2108 and 2110 in FIG. 21D. In some embodiments, visually distinguishing the two or more recognition units further includes (2216) highlighting the respective boundaries between the two or more recognition units in the handwriting input area. In various embodiments, different ways of visually distinguishing the recognition units recognized in the current handwriting input may be used.
In some embodiments, the user device provides (2218) means for independently deleting each of the two or more recognition units from the handwriting input area. In some embodiments, the means for independently deleting each of the two or more recognition units is a respective delete button displayed adjacent to each recognition unit, e.g., delete buttons 2116 and 2118 in FIG. 21D. In some embodiments, the means for independently deleting each of the two or more recognition units is means for detecting a predetermined delete gesture input over each recognition unit. In some embodiments, the user device does not visually display a respective deletion affordance over each highlighted recognition unit; instead, the user is allowed to use a delete gesture to delete the recognition unit beneath the delete gesture. In some embodiments, the user device does not accept additional handwritten strokes in the handwriting input area while it displays the recognition units in the visually highlighted manner. Instead, a predetermined gesture, or any gesture, detected over a visually highlighted recognition unit causes the user device to remove that recognition unit from the handwriting input area and to modify the recognition results displayed in the candidate display area accordingly. In some embodiments, the tap gesture causes the user device to visually highlight the individual recognition units recognized in the handwriting input area, and the user may then use the delete buttons to delete individual recognition units independently, rather than only in the reverse writing direction.
In some embodiments, the user device receives (2224), through the provided means, a deletion input from the user for independently deleting a first recognition unit of the two or more recognition units from the handwriting input area, e.g., as shown in FIG. 21E. In response to the deletion input, the user device removes (2226) the corresponding subset of handwritten strokes in the first recognition unit from the handwriting input area, e.g., as shown in FIG. 21F. In some embodiments, the first recognition unit is the spatially initial recognition unit of the two or more recognition units. In some embodiments, the first recognition unit is a spatially intermediate recognition unit of the two or more recognition units, e.g., as shown in FIGS. 21E-21F. In some embodiments, the first recognition unit is the spatially last recognition unit of the two or more recognition units.
In some embodiments, the user device generates (2228) a segmentation lattice from the plurality of handwritten strokes, the segmentation lattice including a plurality of alternative segmentation chains that each represent a respective set of recognition units recognized from the plurality of handwritten strokes. For example, FIG. 21G shows recognition results 2124 and 2126, where recognition result 2124 is generated from one segmentation chain having two recognition units and recognition result 2126 is generated from another segmentation chain having three recognition units. In some embodiments, the user device receives (2230) two or more consecutive edit requests from the user. For example, the two or more consecutive edit requests can be several consecutive taps on affordance 2112 in FIG. 21G. In some embodiments, in response to each of the two or more consecutive edit requests, the user device visually distinguishes (2232) the respective set of recognition units of a different one of the plurality of alternative segmentation chains in the handwriting input area. For example, in response to a first tap gesture, two recognition units are highlighted in handwriting input area 804 (e.g., one each for the characters "cap" and "child"), and in response to a second tap gesture, three recognition units are highlighted (e.g., one each for the characters "towel," "top," and "child"). In some embodiments, in response to a third tap gesture, the visual highlighting is optionally removed from all the recognition units, and the handwriting input area returns to a normal state, ready to accept additional strokes. In some embodiments, the user device provides (2234) means for independently deleting each recognition unit of the respective set of recognition units currently represented in the handwriting input area. In some embodiments, the means is a respective delete button for each highlighted recognition unit. In some embodiments, the means detects a predetermined delete gesture over each highlighted recognition unit and invokes a function to delete the highlighted recognition unit beneath the predetermined delete gesture.
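The cycling over alternative segmentation chains described above might be sketched as follows; the data layout is a simplifying assumption for illustration, not the internal representation of the embodiments:

    class SegmentationLattice:
        """Holds alternative segmentation chains for the current strokes.

        Each chain is one way of grouping the accumulated strokes into
        recognition units, e.g., a two-unit chain vs. a three-unit chain.
        """
        def __init__(self, chains):
            self.chains = chains       # list of chains; each chain is a list of units
            self.highlighted = -1      # -1 means nothing is highlighted

        def on_edit_request(self):
            """Each consecutive edit request highlights the next chain;
            after the last chain has been shown, highlighting turns off."""
            self.highlighted += 1
            if self.highlighted == len(self.chains):
                self.highlighted = -1  # return to the normal, un-highlighted state
                return None
            return self.chains[self.highlighted]

    # Example: two alternative chains over the same five strokes.
    lattice = SegmentationLattice(chains=[
        [["s1", "s2", "s3"], ["s4", "s5"]],           # two recognition units
        [["s1"], ["s2", "s3"], ["s4", "s5"]],         # three recognition units
    ])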
As described herein, in some embodiments, a user device provides a continuous input mode in the handwriting input area. Because the area of the handwriting input area is limited on portable user devices, it is sometimes desirable to provide a way to cache handwriting input provided by the user and to allow the user to reuse the screen space without committing the previously provided handwriting input. In some embodiments, the user device provides a scrolling handwriting input area, wherein the input area is gradually shifted by some amount (e.g., one recognition unit at a time) when the user writes sufficiently close to the end of the handwriting input area. In some embodiments, it is advantageous to reuse previously used areas of the input area without dynamically shifting the recognition units, since shifting existing recognition units in the handwriting input area may disturb the user's writing process and may interfere with the correct segmentation of the recognition units. In some embodiments, when the user reuses an area occupied by handwriting input that has not yet been entered into the text entry area, the top-ranked recognition result for that handwriting input is automatically entered into the text entry area, so that the user can continue to provide new handwriting input without explicitly selecting the top-ranked recognition result.
In some conventional systems, a user is allowed to write over existing handwriting input that is still displayed in the handwriting input area. In such systems, temporal information is used to determine whether a new stroke is part of an earlier recognition unit or part of a new recognition unit. Such systems, being dependent on temporal information, place stringent requirements on the speed and rhythm with which the user provides handwritten input, requirements that many users have difficulty meeting. Furthermore, the visual rendering of overlapping handwriting inputs can be a confusing jumble that is difficult for the user to decipher. The writing process can therefore be frustrating and confusing, resulting in a poor user experience.
As described herein, a fade-out process is used to indicate when the user can reuse the area occupied by a previously written recognition unit and continue writing in the handwriting input area. In some embodiments, the fade-out process gradually reduces the visibility of each recognition unit that has been in the handwriting input area for a threshold amount of time, such that the existing text does not visually compete with new strokes as they are written over it. In some embodiments, writing over a faded recognition unit automatically causes the top-ranked recognition result for that recognition unit to be entered into the text entry area, without requiring the user to stop writing and explicitly provide a selection input for the top-ranked recognition result. The implicit suggestion and automatic confirmation of the top-ranked recognition result improve the input efficiency and speed of the handwriting input interface, and reduce the cognitive burden on the user, keeping the user's current train of thought flowing. In some embodiments, writing over a faded recognition unit does not result in automatic selection of the top-ranked recognition result. Instead, the faded recognition unit may be cached in a handwriting input stack and combined with the new handwriting input as the current handwriting input. The user may see recognition results generated based on all the handwriting inputs accumulated in the handwriting input stack before making a selection.
FIGS. 23A-23L illustrate exemplary user interfaces and processes in which recognition units provided in different areas of the handwriting input area gradually fade out, for example, after a predetermined amount of time, and in which the user is allowed to provide new handwritten strokes in a particular area after the fading has occurred in that area.
As shown in FIG. 23A, a user has provided a plurality of handwritten strokes 2302 (e.g., three handwritten strokes for the capital letter "I") in handwriting input area 804. The user device recognizes handwritten strokes 2302 as one recognition unit. In some embodiments, the handwriting input currently shown in handwriting input area 804 is cached in a first layer of a handwriting input stack of the user device. A number of recognition results generated based on the recognized recognition unit are provided in candidate display area 806.
FIG. 23B shows that, as the user continues to write one or more strokes 2304 to the right of strokes 2302, the handwritten strokes 2302 in the first recognition unit begin to fade gradually in handwriting input area 804. In some embodiments, an animation is displayed to simulate the gradual fading or dissipation of the visual rendering of the first recognition unit. For example, the animation may produce a visual effect of ink evaporating from a whiteboard. In some embodiments, the fading of a recognition unit is not uniform throughout the recognition unit. In some embodiments, the fading of the recognition unit increases over time, until eventually the recognition unit is not visible at all in the handwriting input area. However, even though the recognition unit is no longer visible in handwriting input area 804, in some embodiments the invisible recognition unit remains at the top of the handwriting input stack, and recognition results generated from that recognition unit continue to be displayed in the candidate display area. In some embodiments, the faded recognition unit is not completely removed from view until a new handwriting input is written over it.
In some embodiments, the user device allows a new handwriting input to be provided over the area occupied by the fading recognition unit as soon as the fading animation starts. In some embodiments, the user device allows a new handwriting input to be provided over the area occupied by the faded recognition unit only after the fading has progressed to a certain stage (e.g., the lightest level, or the point at which the recognition unit is completely invisible in that area).
FIG. 23C shows that the first recognition unit (i.e., strokes 2302) has completed its fading process (e.g., the ink color has stabilized at a very pale level or has become invisible). The user device has recognized additional recognition units (e.g., recognition units for the handwritten letters "a" and "m") from the additional handwritten strokes provided by the user, and has presented updated recognition results in candidate display area 806.
FIGS. 23D-23F illustrate that, over time, the user has provided a plurality of additional handwritten strokes (e.g., 2304 and 2306) in handwriting input area 804. Meanwhile, the previously recognized recognition units gradually fade out of handwriting input area 804. In some embodiments, after a recognition unit has been recognized, a predetermined amount of time elapses before the recognition unit begins its fading process. In some embodiments, the fading process for each recognition unit does not begin until the user has begun inputting the next recognition unit downstream of it. As shown in FIGS. 23B-23F, when the handwriting input is provided in a cursive style, a single stroke (e.g., stroke 2304 or stroke 2306) may span multiple recognition units in the handwriting input area (e.g., a recognition unit for each handwritten letter in the word "am" or "back").
FIG. 23G illustrates that, even after a recognition unit has begun its fading process, the user can return it to the un-faded state via a predetermined recovery input, such as a tap gesture on delete button 2310 (e.g., represented by contact 2308). When a recognition unit is restored, its appearance returns to the normal visibility level. In some embodiments, the recovery of faded recognition units is performed character by character, in the direction opposite to the writing direction in handwriting input area 804. In some embodiments, the recovery of faded recognition units is performed word by word in handwriting input area 804. As shown in FIG. 23G, the recognition units for the word "back" have been restored from the completely faded state to the completely un-faded state. In some embodiments, the clock used to initiate the fading process is reset for each recognition unit when that recognition unit is restored to the un-faded state.
FIG. 23H shows that continued contact on the delete button causes the last recognition unit in the default writing direction (e.g., the recognition unit for the letter "k" in the word "back") to be deleted from handwriting input area 804. As the deletion input is maintained, further recognition units (e.g., those for the letters "c," "a," and "b" in the word "back") are deleted one by one in the reverse writing direction. In some embodiments, deletion of the recognition units is performed word by word, and all the letters of the handwritten word "back" are removed from handwriting input area 804 simultaneously. FIG. 23H also shows that the previously faded recognition unit "m" is restored as a result of contact 2308 being held on delete button 2310 after the recognition unit for the letter "b" in the handwritten word "back" has been deleted.
FIG. 23I shows that, if the deletion input ceases before the restored recognition unit "m" in the handwritten word "am" is deleted, the restored recognition unit gradually fades out again. In some embodiments, the state of each recognition unit (e.g., a state selected from a set of one or more faded states and an un-faded state) is maintained and updated in the handwriting input stack.
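These per-unit states might be tracked as in the following sketch; the timing constants, state fields, and class name are illustrative assumptions only:

    import time

    FADE_DELAY_S = 2.0     # assumed delay before fading starts
    FADE_SPAN_S = 2.0      # assumed time to fade from full to minimum visibility
    MIN_VISIBILITY = 0.1   # assumed final visibility; could also be 0.0

    class RecognitionUnitFade:
        """Tracks the faded/un-faded state of one recognition unit."""
        def __init__(self):
            self.completed_at = None   # set when the unit is fully written
            self.visibility = 1.0      # 1.0 = un-faded

        def complete(self):
            self.completed_at = time.monotonic()

        def tick(self):
            """Advance the fade; called periodically by the UI loop."""
            if self.completed_at is None:
                return
            elapsed = time.monotonic() - self.completed_at
            if elapsed > FADE_DELAY_S:
                progress = (elapsed - FADE_DELAY_S) / FADE_SPAN_S
                self.visibility = max(MIN_VISIBILITY, 1.0 - progress)

        def restore(self):
            """Recovery input: return to the un-faded state, resetting the clock."""
            self.visibility = 1.0
            self.completed_at = time.monotonic()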
FIG. 23J illustrates that, when the user provides one or more strokes 2312 over the area occupied by a faded recognition unit (e.g., the recognition unit for the letter "I") in the handwriting input area, in some embodiments the text of the top-ranked recognition result (e.g., result 2314) for the handwriting input made before strokes 2312 is automatically entered into text entry area 808, as shown in FIGS. 23I-23J. As shown in FIG. 23J, the text "I am" is no longer shown as tentative, but has been committed in text entry area 808. In some embodiments, once text input has been made for fully faded or partially faded handwriting input, that handwriting input is removed from the handwriting input stack. The newly entered strokes (e.g., strokes 2312) become the current input in the handwriting input stack.
In some embodiments, when strokes 2312 are provided over the area occupied by a faded recognition unit (e.g., the recognition unit for the letter "I") in the handwriting input area, the text of the top-ranked recognition result (e.g., result 2314) for the handwriting input made prior to strokes 2312 is not automatically entered into text entry area 808. Instead, the current handwriting input (both faded and un-faded) in handwriting input area 804 is cleared and cached in the handwriting input stack. The new strokes 2312 are appended to the cached handwriting input in the handwriting input stack. The user device determines recognition results based on the entirety of the handwriting input currently accumulated in the handwriting input stack, and displays those recognition results in the candidate display area. In other words, even though only a portion of the currently accumulated handwriting input is shown in handwriting input area 804, the recognition results are generated based on the entire cached handwriting input (both the visible and the no-longer-visible portions) in the handwriting input stack.
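A minimal sketch of this cached-input variant, assuming a simple layered stack (all names hypothetical):

    class HandwritingInputStack:
        """Caches handwriting input that is no longer visible on screen."""
        def __init__(self, recognize_fn):
            self.layers = []               # cached, no-longer-visible stroke layers
            self.visible = []              # strokes currently rendered
            self.recognize = recognize_fn  # maps a stroke list to ranked results

        def write_over_faded_area(self, new_stroke):
            # Clear the pane, cache what was there, then show only the new stroke.
            if self.visible:
                self.layers.append(self.visible)
            self.visible = [new_stroke]

        def candidates(self):
            # Recognition sees the entire accumulated input, visible or not.
            all_strokes = [s for layer in self.layers for s in layer] + self.visible
            return self.recognize(all_strokes)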
FIG. 23K shows that the user has entered more strokes 2316 in handwriting input area 804, which fade out over time. FIG. 23L illustrates that a new stroke 2318, written over the faded strokes 2312 and 2316, causes the text of the top-ranked recognition result 2320 for the faded strokes 2312 and 2316 to be entered into text entry area 808.
In some embodiments, the user optionally provides handwritten input in multiple lines. In some embodiments, the same fading process may be used to clear the handwriting input area for new handwriting input when multiple lines of input are enabled.
FIGS. 24A-24B are flow diagrams of an exemplary process 2400 for providing a fade-out process in a handwriting input area of a handwriting input interface. FIGS. 23A-23L illustrate the process 2400 according to some embodiments.
In some embodiments, the device receives (2402) a first handwriting input from a user. The first handwriting input includes a plurality of handwritten strokes, and the plurality of handwritten strokes form a plurality of recognition units distributed along respective writing directions associated with a handwriting input area of the handwriting input interface. In some embodiments, as the user provides the handwritten stroke, the user device renders (2404) each of the plurality of handwritten strokes in the handwriting input area.
In some embodiments, the user device starts (2406) a respective fade-out process for each of the plurality of recognition units after fully rendering the recognition unit. In some embodiments, the rendering of the recognition unit in the first handwritten input fades out during a respective fading process. This is illustrated in fig. 23A-23F, according to some embodiments.
In some embodiments, the user device receives (2408) a second handwriting input from the user over an area of the handwriting input area occupied by a faded recognition unit of the plurality of recognition units, e.g., as shown in FIGS. 23I-23J and 23K-23L. In some embodiments, in response to receiving the second handwriting input (2410), the user device renders (2412) the second handwriting input in the handwriting input area and clears (2414) all faded recognition units from the handwriting input area. In some embodiments, all recognition units entered in the handwriting input area before the second handwriting input are cleared from the handwriting input area, regardless of whether those recognition units have begun their fading processes. This is shown, for example, in FIGS. 23I-23J and FIGS. 23K-23L.
In some embodiments, the user device generates (2416) one or more recognition results for the first handwritten input. In some embodiments, the user device displays (2418) the one or more recognition results in a candidate display area of the handwriting input interface. In some embodiments, in response to receiving the second handwriting input, the user device automatically enters (2420) the top-ranked recognition result displayed in the candidate display area into the text entry area of the handwriting input interface without user selection. This is shown, for example, in fig. 23I to 23J and fig. 23K to 23L.
In some embodiments, the user device stores (2422) an input stack comprising a first handwritten input and a second handwritten input. In some embodiments, the user device generates (2424) one or more multi-character recognition results that each include a respective spatial sequence of characters recognized from the concatenated form of the first and second handwritten inputs. In some embodiments, the user device displays (2426) one or more multi-character recognition results in the candidate display area of the handwriting input interface while the rendering of the second handwriting input has replaced the rendering of the first handwriting input in the handwriting input area.
In some embodiments, the respective fade-out process is started for each recognition unit after a predetermined period of time has elapsed after the user has completed the recognition unit.
In some embodiments, the fade-out process is initiated for each recognition unit when the user begins to enter a stroke for the next recognition unit after that recognition unit.
In some embodiments, the final state of the respective fading process for each recognition unit is a state with a predetermined minimum visibility for the recognition unit.
In some embodiments, the final state of the respective fading process for each recognition unit is a state with zero visibility for the recognition unit.
In some embodiments, the user device receives (2428) a predetermined recovery input from the user after the last recognition unit in the first handwriting input has faded. In response to receiving the predetermined recovery input, the user device restores (2430) the last recognition unit from the faded state to the un-faded state. This is shown, for example, in FIGS. 23F-23H. In some embodiments, the predetermined recovery input is an initial contact detected on a delete button provided in the handwriting input interface. In some embodiments, continued contact detected on the delete button deletes the last recognition unit from the handwriting input area and restores the second-to-last recognition unit from the faded state to the un-faded state. This is shown, for example, in FIGS. 23G-23H.
As described herein, the multi-script handwriting recognition model performs stroke-order-independent and stroke-direction-independent recognition of handwritten characters. In some embodiments, the recognition model is trained only on spatially-derived features contained in flat images of writing samples corresponding to the different characters in the handwriting recognition model's vocabulary. Since an image of a writing sample does not contain any temporal information related to the individual strokes contained in the image, the resulting recognition model is independent of stroke order and stroke direction.
As described above, handwriting recognition that is independent of stroke order and stroke direction provides a number of advantages over conventional recognition systems that rely on information related to the temporal generation of characters (e.g., the temporal order of strokes in a character). In real-time handwriting recognition scenarios, however, temporal information about individual strokes is available, and it is sometimes beneficial to utilize such information to improve the recognition accuracy of the handwriting recognition system. The following describes a technique for integrating temporally-derived stroke distribution information into the spatial feature extraction of a handwriting recognition model, wherein the use of the temporally-derived stroke distribution information does not destroy the stroke-order and/or stroke-direction independence of the handwriting recognition system. Based on the stroke distribution information associated with different characters, it is possible to distinguish between similar-looking characters produced with significantly different sets of strokes.
In some embodiments, the temporal information associated with individual strokes is lost when the handwriting input is converted into an input image (e.g., an input bitmap image) for a handwriting recognition model (e.g., a CNN). For example, the Chinese character meaning "nation" may be written using eight strokes (#1-#8 in FIG. 27). The order and direction of the strokes of that character provide certain unique characteristics associated with that character. One way to capture stroke-order and stroke-direction information without destroying the stroke-order and stroke-direction independence of the recognition system would be to explicitly enumerate, in the training samples, all possible permutations of stroke order and stroke direction. But even for characters of only modest complexity, this can exceed a billion possibilities, which makes it impracticable, if not impossible, in practice. As described herein, a stroke distribution profile is instead generated for each writing sample, abstracting away the temporal aspects of stroke generation (i.e., the temporal information). The stroke distribution profiles of the writing samples are used to train a set of temporally-derived features, which are then combined with the spatially-derived features (e.g., from the input bitmap image) to improve recognition accuracy without affecting the stroke-order and stroke-direction independence of the handwriting recognition system.
As described herein, the temporal information associated with a character is captured by computing a plurality of pixel distributions that characterize each handwritten stroke. Each handwritten stroke of a character produces a deterministic pattern (or profile) when projected in a given direction. While such a pattern may not be sufficient by itself to unambiguously identify a stroke, when combined with other similar patterns it may be sufficient to capture the specific characteristics inherent to that particular stroke. Integrating such stroke representations with spatially extracted features (e.g., the image-based feature extraction of a CNN) provides orthogonal information that can be used to disambiguate similar-looking characters in the vocabulary of the handwriting recognition model.
FIGS. 25A-25B are flow diagrams of an exemplary process 2500 for integrating temporally-derived features and spatially-derived features of a handwriting sample during training of a handwriting recognition model, where the resulting recognition model remains independent of stroke order and stroke direction. In some embodiments, the example process 2500 is performed on a server device that provides a trained recognition model to a user device (e.g., the portable device 100). In some embodiments, the server device includes one or more processors and memory containing instructions that, when executed by the one or more processors, perform process 2500.
In exemplary process 2500, the device independently trains (2502) a set of spatially-derived features and a set of temporally-derived features of a handwriting recognition model, wherein the set of spatially-derived features are trained for a corpus of training images each being an image of a handwriting sample for a respective character in a respective output character set, and the set of temporally-derived features are trained for stroke distribution profiles, each stroke distribution profile numerically characterizing a spatial distribution of a plurality of strokes in the handwriting sample for the respective character in the output character set.
In some embodiments, independently training the set of spatially-derived features further includes (2504) training a convolutional neural network having an input layer, an output layer, and a plurality of convolutional layers, the convolutional layers including a first convolutional layer, a last convolutional layer, and zero or more intermediate convolutional layers between the first convolutional layer and the last convolutional layer, and a hidden layer between the last convolutional layer and the output layer. An exemplary convolutional network 2602 is shown in FIG. 26. The exemplary convolutional network 2602 may be implemented in substantially the same manner as the convolutional network 602 shown in FIG. 6. Convolutional network 2602 includes an input layer 2606, an output layer 2608, a plurality of convolutional layers (including a first convolutional layer 2610a, zero or more intermediate convolutional layers, and a last convolutional layer 2610n), and a hidden layer 2614 between the last convolutional layer and output layer 2608. Convolutional network 2602 also includes kernel layers 2616 and sub-sampling layers 2612, according to the arrangement shown in FIG. 6. Training of the convolutional network is based on images of the writing samples in training corpus 2604. The spatially-derived features are obtained, and the respective weights associated with the different features are determined, by minimizing the recognition error for the training samples in the training corpus. Once trained, the same features and weights are used to recognize new writing samples that are not present in the training corpus.
In some embodiments, independently training the set of temporally-derived features further includes (2506) providing a plurality of stroke distribution profiles to a statistical model to determine a plurality of temporally-derived parameters, and respective weights for the plurality of temporally-derived parameters, for classifying respective characters in the output character set. In some embodiments, as shown in FIG. 26, a stroke distribution profile 2620 is derived from each writing sample in training corpus 2622. Training corpus 2622 optionally includes the same writing samples as corpus 2604, but also includes the temporal information associated with the stroke generation in each writing sample. The stroke distribution profiles 2620 are provided to a statistical modeling process 2624, during which the temporally-derived features are extracted, and the respective weights for the different features are determined, by minimizing recognition or classification errors based on a statistical modeling method (e.g., CNN, K-nearest neighbors, etc.). As shown in FIG. 26, the set of temporally-derived features and corresponding weights is converted into a set of feature vectors (e.g., feature vector 2626 or feature vector 2628) and injected into a corresponding layer of convolutional neural network 2602. The resulting network thus comprises spatially-derived parameters and temporally-derived parameters that are orthogonal to each other and that together contribute to the recognition of a character.
In some embodiments, the device combines (2508) the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model. In some embodiments, combining the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model includes (2510) injecting a plurality of spatially-derived parameters and a plurality of temporally-derived parameters into a convolutional layer or a hidden layer of a convolutional neural network. In some embodiments, the plurality of temporally-derived parameters and the corresponding weights for the plurality of temporally-derived parameters are injected into the last convolutional layer of the convolutional neural network for handwriting recognition (e.g., last convolutional layer 2610n in FIG. 26). In some embodiments, the plurality of temporally-derived parameters and the respective weights for the plurality of temporally-derived parameters are injected into the hidden layer (e.g., hidden layer 2614 in FIG. 26) of the convolutional neural network for handwriting recognition.
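For illustration only, the following PyTorch sketch shows one plausible way to concatenate a stroke-distribution feature vector into the hidden (fully connected) layer of a convolutional recognizer. The layer sizes, the 64x64 input, and the 50-dimensional profile are assumptions for this example and do not reproduce the network of FIG. 26:

    import torch
    import torch.nn as nn

    class HybridRecognizer(nn.Module):
        def __init__(self, n_classes, profile_dim=50):   # 50 = 5 features x 10 strokes
            super().__init__()
            self.conv = nn.Sequential(                   # spatially-derived features
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            # Hidden layer sees CNN features plus the injected temporal features.
            self.hidden = nn.Linear(32 * 16 * 16 + profile_dim, 256)
            self.out = nn.Linear(256, n_classes)

        def forward(self, image, stroke_profile):
            # image: (B, 1, 64, 64) bitmap; stroke_profile: (B, profile_dim).
            x = self.conv(image).flatten(1)
            x = torch.cat([x, stroke_profile], dim=1)    # inject temporal features
            return self.out(torch.relu(self.hidden(x)))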
In some embodiments, the device provides (2512) real-time handwriting recognition for the user's handwriting input using a handwriting recognition model.
In some embodiments, the device generates (2514) a corpus of stroke distribution profiles from a plurality of writing samples. In some embodiments, each writing sample of the plurality of writing samples corresponds to (2516) a character in the output character set and, when it was written, respective spatial information was independently preserved for each constituent stroke of the writing sample. In some embodiments, to generate the corpus of stroke distribution profiles, the device performs (2518) the following steps:
For each writing sample (2520) of the plurality of writing samples: the device identifies (2522) the constituent strokes in the writing sample; for each of the identified strokes, the device calculates (2524) a respective duty cycle in each of a plurality of predetermined directions, the duty cycle being the ratio between the projection span of said each stroke in said each direction and the maximum projection span of the writing sample in said each direction; and for each of the identified strokes, the device also calculates (2526) a respective saturation ratio for said each stroke, based on the ratio between the number of pixels within said each stroke and the total number of pixels within the writing sample. The device then generates (2528) a feature vector for the writing sample as the stroke distribution profile of the writing sample, the feature vector including the respective duty cycles and the respective saturation ratios of at least N strokes of the writing sample, where N is a predetermined natural number. In some embodiments, N is smaller than the maximum stroke count observed in any single writing sample within the plurality of writing samples.
In some embodiments, for each writing sample of the plurality of writing samples: the device sorts the respective duty cycles of the identified strokes in each of the predetermined directions in descending order; and only the N top-ranked duty cycles and saturation ratios of the writing sample are included in the feature vector of the writing sample.
In some embodiments, the plurality of predetermined directions includes a horizontal direction, a vertical direction, a positive 45 degree direction, and a negative 45 degree direction of the writing sample.
In some embodiments, to provide real-time handwriting recognition for a user's handwriting input using the handwriting recognition model, a device receives the user's handwriting input and, in response to receiving the handwriting input, provides a handwriting recognition output to the user substantially simultaneously with receipt of the handwriting input.
Using the character "nation" shown in fig. 27, an exemplary embodiment is described herein for exemplary purposes. In some embodiments, each input image of the handwritten character is optionally normalized to a square. The span of each individual handwritten stroke (e.g., strokes #1, # 2., #8) is measured when projected to the horizontal, vertical, +45 degree diagonal, and-45 degree diagonal of a square. The span of each stroke Si is recorded for the four projection directions as xspan (i), expand (i), cspan (i), and dspan (i), respectively. In addition, the maximum span observed across the entire image is also recorded. The maximum span of the character is recorded for the four projection directions as xspan, yspan, cspan, and dspan, respectively. For exemplary purposes, four projection directions are optionally contemplated herein, although in principle any arbitrary set of projections may be used in various embodiments. The maximum span (e.g., denoted xspan, yspan, cspan, and dspan) and span (e.g., denoted xspan (4), yspan (4), cspan (4), and dspan (4)) of one of the strokes in the character "nation" in the four projection directions are shown in fig. 27.
In some embodiments, once the above spans have been measured for all strokes 1 through S, where S is the number of individual handwritten strokes associated with the input image, a respective duty cycle in each projection direction is calculated. For example, the respective duty cycle in the x direction for stroke S_i is calculated as Rx(i) = xspan(i)/xspan. Similarly, the corresponding duty cycles along the other projection directions may be calculated as Ry(i) = yspan(i)/yspan, Rc(i) = cspan(i)/cspan, and Rd(i) = dspan(i)/dspan.
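As a concrete numeric illustration of these formulas (the span values below are invented for this example):

    # Invented example values, for illustration only: a mostly horizontal
    # stroke projected against the character's maximum spans.
    xspan_i, yspan_i = 60.0, 8.0    # projection spans of stroke i (pixels)
    xspan, yspan = 100.0, 100.0     # maximum projection spans of the character
    Rx_i = xspan_i / xspan          # 0.60: stroke i dominates horizontally
    Ry_i = yspan_i / yspan          # 0.08: little vertical extent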
In some embodiments, the duty cycles of all strokes in each direction are sorted independently in descending order, so that, for each projection direction, a respective ranking of all strokes in the input image is obtained in terms of their duty cycles in that direction. The ranking of the strokes in each projection direction reflects the relative importance of each stroke along the associated projection direction. This relative importance is independent of the order and direction in which the strokes were generated in the writing sample. Thus, this duty-cycle-based ranking is temporally-derived information that is independent of stroke order and stroke direction.
In some embodiments, each stroke is given a relative weight indicating the importance of the stroke relative to the entire character. In some embodiments, the weight is measured by the ratio of the number of pixels in each stroke to the total number of pixels in the character. This ratio is referred to as the saturation ratio associated with each stroke.
In some embodiments, a feature vector is created for each stroke based on the duty cycles and the saturation ratio of the stroke. For each character, a set of feature vectors is thus created that includes 5S features (i.e., four duty cycles and one saturation ratio for each of the S strokes). This set of features is referred to as the stroke distribution profile of the character.
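Putting the duty-cycle and saturation-ratio definitions together, a simplified computation might look like the following sketch. Strokes are assumed to be arrays of (x, y) pixel coordinates, and the final ranking/truncation step is one plausible reading of the top-N selection described below (the embodiments above sort duty cycles per direction independently):

    import numpy as np

    def stroke_distribution_profile(strokes, n_top=10):
        """Compute per-stroke duty cycles (4 directions) plus a saturation
        ratio, and assemble a fixed-length 5 * n_top feature vector."""
        all_pts = np.concatenate(strokes)            # every rendered pixel

        def spans(pts):
            # Projections onto horizontal, vertical, +45, and -45 degree axes.
            x, y = pts[:, 0], pts[:, 1]
            c, d = (x + y) / np.sqrt(2), (x - y) / np.sqrt(2)
            return np.array([p.max() - p.min() for p in (x, y, c, d)])

        max_span = np.maximum(spans(all_pts), 1e-9)  # xspan, yspan, cspan, dspan
        total_pixels = len(all_pts)
        features = []
        for stroke in strokes:
            duty = spans(stroke) / max_span          # Rx(i), Ry(i), Rc(i), Rd(i)
            saturation = len(stroke) / total_pixels
            features.append(np.append(duty, saturation))

        # Keep the n_top top-ranked strokes; ranked here by saturation as
        # one plausible ordering.
        features.sort(key=lambda f: -f[-1])
        features = features[:n_top]
        while len(features) < n_top:                 # zero-pad short characters
            features.append(np.zeros(5))
        return np.concatenate(features)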
In some embodiments, only a predetermined number of top-ranked strokes are used in constructing the stroke distribution profile for each character. In some embodiments, the predetermined number of strokes is ten. Based on the top ten strokes, 50 stroke-derived features may be generated for each character. In some embodiments, these features are injected into the last convolutional layer, or a subsequent hidden layer, of the convolutional neural network.
In some embodiments, during real-time recognition, an input image of a recognition unit is provided to a handwriting recognition model that has been trained using both spatially-derived features and temporally-derived features. The input image is processed by each layer of the handwriting recognition model shown in FIG. 26. When the processing of the input image reaches a layer requiring stroke distribution profile input (e.g., the last convolutional layer or the hidden layer), the stroke distribution profile of the recognition unit is injected into that layer. The input image and the stroke distribution profile continue to be processed until one or more output classifications (e.g., one or more candidate characters) are provided at output layer 2608. In some embodiments, a stroke distribution profile is calculated for every recognition unit and provided as input to the handwriting recognition model along with the input image of the recognition unit. In some embodiments, the input image of the recognition unit initially passes through the handwriting recognition model without the benefit of the temporally-trained features. When two or more similar-looking candidate characters are identified with close recognition confidence values, the stroke distribution profile of the recognition unit is then injected into the handwriting recognition model at the layer that has been trained with the temporally-derived features (e.g., the last convolutional layer or the hidden layer). As the input image and the stroke distribution profile of the recognition unit pass through the final layers of the handwriting recognition model, the two or more similar-looking candidate characters can be better distinguished owing to the differences in their stroke distribution profiles. Thus, temporally-derived information about how a recognition unit was formed from individual handwritten strokes is used to improve recognition accuracy without affecting the stroke-order and stroke-direction independence of the handwriting recognition system.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the exemplary discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (162)
1. A method of providing multi-script handwriting recognition, comprising:
at a device having memory and one or more processors:
training a multi-script handwriting recognition model based on spatially derived features of a multi-script training corpus, the multi-script training corpus including corresponding handwriting samples corresponding to characters of at least three non-overlapping scripts; and
providing real-time handwriting recognition for a user's handwriting input using the multi-script handwriting recognition model that has been trained against the spatially-derived features of the multi-script training corpus.
2. The method of claim 1, wherein the spatially derived features of the multi-script training corpus are stroke order independent and stroke direction independent.
3. The method of claim 1, wherein the training of the multi-script handwriting recognition model is independent of temporal information associated with respective strokes in the handwriting samples.
4. The method of claim 1, wherein the at least three non-overlapping scripts include Chinese characters, emoji characters, and Latin script.
5. The method of claim 1, wherein the at least three non-overlapping scripts comprise Chinese characters, Arabic script, and Latin script.
6. The method of claim 1, wherein the at least three non-overlapping scripts comprise non-overlapping scripts defined by the Unicode standard.
7. The method of claim 1, wherein training the multi-script handwriting recognition model further comprises:
providing the handwriting samples of the multi-script training corpus to a single convolutional neural network having a single input plane and a single output plane; and
using the convolutional neural network to determine the spatially-derived features of the handwriting samples and corresponding weights for the spatially-derived features for distinguishing characters of the at least three non-overlapping scripts represented in the multi-script training corpus.
8. The method of claim 1, wherein the multi-script handwriting recognition model has at least thirty thousand output categories representing at least thirty thousand characters across the at least three non-overlapping scripts.
9. The method of claim 1, wherein providing real-time handwriting recognition for a user's handwriting input further comprises:
providing the multi-script handwriting recognition model to a user device, wherein the user device receives a plurality of handwritten strokes from the user and performs handwriting recognition locally on one or more recognition units recognized from the plurality of handwritten strokes based on the received multi-script handwriting recognition model.
10. The method of claim 1, wherein providing real-time handwriting recognition for a user's handwriting input further comprises:
continuously revising one or more recognition results for the user's handwriting input in response to the user continuing to add or revise the handwriting input; and
in response to each revision of the one or more recognition results, displaying the respective revised one or more recognition results to the user in a candidate display area of the handwriting input user interface.
11. The method of claim 1, further comprising:
providing the multi-script handwriting recognition model to a plurality of devices that do not have an existing overlap in input languages, wherein the multi-script handwriting recognition model is used on each of the plurality of devices for handwriting recognition of a different input language associated with said each user device.
12. A method comprising any combination of the features of claims 1-11.
13. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform operations comprising:
Training a multi-script handwriting recognition model based on spatially derived features of a multi-script training corpus, the multi-script training corpus including corresponding handwriting samples corresponding to characters of at least three non-overlapping scripts; and
providing real-time handwriting recognition for a user's handwriting input using the multi-script handwriting recognition model that has been trained against the spatially-derived features of the multi-script training corpus.
14. A non-transitory computer-readable medium having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the methods of claims 1-11.
15. A system, comprising:
one or more processors; and
memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform operations comprising:
training a multi-script handwriting recognition model based on spatially derived features of a multi-script training corpus, the multi-script training corpus including corresponding handwriting samples corresponding to characters of at least three non-overlapping scripts; and
providing real-time handwriting recognition for a user's handwriting input using the multi-script handwriting recognition model that has been trained against the spatially-derived features of the multi-script training corpus.
16. A system, comprising:
one or more processors; and
memory having instructions stored thereon, which when executed by the one or more processors, cause the processors to perform any of the methods of claims 1-11.
17. A method of providing multi-script handwriting recognition, comprising:
at a user device having memory and one or more processors:
receiving a multi-script handwriting recognition model, the multi-script recognition model having been trained on spatially derived features of a multi-script training corpus, the multi-script training corpus including corresponding handwriting samples corresponding to characters of at least three non-overlapping scripts;
receiving a handwritten input from a user, the handwritten input including one or more handwritten strokes provided on a touch-sensitive surface coupled to the user device; and
in response to receiving the handwriting input, providing one or more handwriting recognition results to the user in real-time based on the multi-script handwriting recognition model that has been trained for the spatially-derived features of the multi-script training corpus.
18. The method of claim 17, wherein providing real-time handwriting recognition results to the user further comprises:
Segmenting the user's handwritten input into one or more recognition units, each recognition unit including one or more of the handwritten strokes provided by the user;
providing a respective image of each of the one or more recognition units as input to the multi-script handwriting recognition model; and
for at least one recognition unit of the one or more recognition units, obtaining, from the multi-script handwriting recognition model, at least a first output character from a first script and at least a second output character from a second script different from the first script.
19. The method of claim 18, wherein providing real-time handwriting recognition results to the user further comprises:
displaying both the first output character and the second output character in a candidate display area of a handwriting input user interface of the user device.
20. The method of claim 18, wherein providing real-time handwriting recognition results to the user further comprises:
selectively displaying one of the first output character and the second output character based on which of the first script or the second script is a script used by a soft keyboard currently installed on the user device.
21. The method of claim 17, wherein providing real-time handwriting recognition for a user's handwriting input further comprises:
continuously revising one or more recognition results for the user's handwriting input as the user continues to add to or revise the handwriting input; and
in response to each revision of the one or more recognition results, displaying the respective revised one or more recognition results to the user in a candidate display area of the handwriting input user interface.
22. The method of claim 17, wherein the at least three non-overlapping scripts include Chinese characters, emoji characters, and Latin script.
23. The method of claim 17, wherein the at least three non-overlapping scripts comprise Chinese characters, Arabic script, and Latin script.
24. The method of claim 17, wherein the multi-script handwriting recognition model is a single convolutional neural network having a single input plane and a single output plane, and includes spatially-derived features and corresponding weights for the spatially-derived features for distinguishing characters of the at least three non-overlapping scripts represented in the multi-script training corpus.
25. The method of claim 17, wherein the multi-script handwriting recognition model has at least thirty thousand output categories representing at least thirty thousand characters spanning at least three non-overlapping scripts.
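The single-network design of claims 24 and 25 is easiest to picture in code. Below is a minimal sketch, assuming PyTorch; the layer sizes, the 48×48 input, and the use of exactly 30,000 classes are illustrative choices for the example, not values fixed by the claims.

```python
import torch
import torch.nn as nn

class MultiScriptNet(nn.Module):
    """One network covers every script: a single grayscale input plane and a
    single output plane whose categories span all scripts in the corpus."""
    def __init__(self, num_classes: int = 30000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # single input plane
            nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # last conv layer
            nn.ReLU(), nn.MaxPool2d(2),
        )
        self.hidden = nn.Linear(64 * 12 * 12, 512)        # hidden layer
        self.output = nn.Linear(512, num_classes)         # single output plane

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.flatten(self.features(x), 1)            # x: (N, 1, 48, 48)
        return self.output(torch.relu(self.hidden(x)))

logits = MultiScriptNet()(torch.zeros(1, 1, 48, 48))      # shape (1, 30000)
```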
26. The method of claim 17, wherein the multi-script handwriting recognition model is configured to recognize characters based on respective input images of one or more recognition units recognized in the handwriting input, and wherein respective spatially-derived features for recognition are independent of respective stroke order, stroke direction, and continuity of strokes in the handwriting input.
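The independence properties of claim 26 follow from recognizing an image of the finished ink rather than the stroke sequence. A minimal rasterization sketch in plain Python is below; a production pipeline would also interpolate along stroke segments and normalize stroke width, which this sketch omits, and the 48-pixel grid is an assumed size.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]   # sampled (x, y) touch points of one stroke

def rasterize(strokes: List[Stroke], size: int = 48) -> List[List[int]]:
    """Render strokes into a size x size binary image, scaled to fit.
    The result depends only on where ink was laid down, so stroke order,
    direction, and how the ink was divided into strokes cannot affect it."""
    points = [p for stroke in strokes for p in stroke]
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    span = max(max(x for x, _ in points) - min_x,
               max(y for _, y in points) - min_y) or 1.0
    image = [[0] * size for _ in range(size)]
    for stroke in strokes:
        for x, y in stroke:
            col = int((x - min_x) / span * (size - 1))
            row = int((y - min_y) / span * (size - 1))
            image[row][col] = 1
    return image
```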
27. A method comprising any combination of the features of claims 17-26.
28. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform operations comprising:
receiving a multi-script handwriting recognition model, the multi-script recognition model having been trained on spatially derived features of a multi-script training corpus, the multi-script training corpus including corresponding handwriting samples corresponding to characters of at least three non-overlapping scripts;
receiving a handwritten input from a user, the handwritten input including one or more handwritten strokes provided on a touch-sensitive surface coupled to a user device; and
in response to receiving the handwritten input, providing one or more handwriting recognition results to the user in real-time based on the multi-script handwriting recognition model that has been trained on the spatially-derived features of the multi-script training corpus.
29. A non-transitory computer-readable medium having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the methods of claims 17-26.
30. A system, comprising:
one or more processors; and
memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform operations comprising:
receiving a multi-script handwriting recognition model, the multi-script recognition model having been trained on spatially-derived features of a multi-script training corpus, the multi-script training corpus including handwriting samples corresponding to characters of at least three non-overlapping scripts;
receiving a handwritten input from a user, the handwritten input including one or more handwritten strokes provided on a touch-sensitive surface coupled to a user device; and
in response to receiving the handwritten input, providing one or more handwriting recognition results to the user in real-time based on the multi-script handwriting recognition model that has been trained on the spatially-derived features of the multi-script training corpus.
31. A system, comprising:
one or more processors; and
memory having instructions stored thereon which, when executed by the one or more processors, cause the processors to perform any of the methods of claims 17-26.
32. A method of providing real-time handwriting recognition, comprising:
at a device having memory and one or more processors:
receiving a plurality of handwritten strokes from a user, the plurality of handwritten strokes corresponding to a handwritten character;
generating an input image based on the plurality of handwritten strokes;
providing the input image to a handwriting recognition model to perform real-time recognition of the handwritten character, wherein the handwriting recognition model provides stroke-order independent handwriting recognition; and
displaying the same first output character in real-time as the plurality of handwritten strokes is received, regardless of the respective order in which the plurality of handwritten strokes have been received from the user.
33. The method of claim 32, wherein the handwriting recognition model provides stroke-direction independent handwriting recognition, and wherein displaying the same first output character further comprises:
displaying the same first output character in response to receiving the plurality of handwritten strokes, regardless of a respective stroke direction of each of the plurality of handwritten strokes that has been provided by the user.
34. The method of claim 32, wherein the handwriting recognition model provides stroke count independent handwriting recognition, and wherein displaying the same first output character further comprises:
displaying the same first output character in response to receiving the plurality of handwritten strokes, regardless of how many handwritten strokes are used to form the continuous strokes in the input image.
35. The method of claim 32, wherein stroke order independent handwriting recognition is performed independently of time information associated with individual strokes within the handwritten character.
36. The method of claim 32, further comprising:
receiving a second plurality of handwritten strokes from the user, the second plurality of handwritten strokes corresponding to a second handwritten character;
generating a second input image based on the second plurality of handwritten strokes;
providing the second input image to the handwriting recognition model to perform real-time recognition of the second handwritten character; and
displaying, in real-time, a second output character corresponding to the second plurality of handwritten strokes as the second plurality of handwritten strokes is received, wherein the first output character and the second output character are simultaneously displayed in a spatial sequence regardless of a respective order in which the first plurality of handwritten strokes and the second plurality of handwritten strokes have been provided by the user.
37. The method of claim 36, wherein the spatial sequence of the first output character and the second output character corresponds to a spatial distribution of the first plurality of handwritten strokes and the second plurality of handwritten strokes along a default writing direction of a handwriting input interface of the user device.
38. The method of claim 36, wherein the first handwritten character is provided by the user as part of a first handwritten sentence and the second handwritten character is provided by the user as part of a second handwritten sentence, and wherein the first and second handwritten sentences are displayed simultaneously in a handwriting input area of the user device.
39. The method of claim 36, wherein the second plurality of handwritten strokes is received temporally after the first plurality of handwritten strokes, and the second output character precedes the first output character in a spatial sequence along a default writing direction of a handwriting input interface of the user device.
40. The method of claim 36, wherein the second plurality of handwritten strokes is spatially subsequent to the first plurality of handwritten strokes along a default writing direction of a handwriting input interface of the user device, and the second output character is subsequent to the first output character in a spatial sequence along the default writing direction, and wherein the method further comprises:
receiving a third handwritten stroke from the user to revise the handwritten character, the third handwritten stroke being temporally received after the first and second pluralities of handwritten strokes;
in response to receiving the third handwritten stroke, assigning the third handwritten stroke to the same recognition unit as the first plurality of handwritten strokes based on a relative proximity of the third handwritten stroke to the first plurality of handwritten strokes;
generating a revised input image based on the first plurality of handwritten strokes and the third handwritten stroke;
providing the revised input image to the handwriting recognition model to perform real-time recognition of the revised handwritten character; and
displaying a third output character corresponding to the revised input image in response to receiving the third handwritten stroke, wherein the third output character replaces the first output character and is displayed in the spatial sequence concurrently with the second output character in the default writing direction.
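The proximity rule in claim 40 can be sketched as a nearest-unit search. The centroid-distance criterion below is an assumption for illustration; the claim itself specifies only "relative proximity".

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]

def centroid(strokes: List[Stroke]) -> Tuple[float, float]:
    points = [p for stroke in strokes for p in stroke]
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))

def assign_stroke(units: List[List[Stroke]], new_stroke: Stroke) -> int:
    """Return the index of the recognition unit whose centroid is nearest
    to the new stroke's centroid; the stroke joins that unit."""
    sx, sy = centroid([new_stroke])
    def sq_dist(i: int) -> float:
        ux, uy = centroid(units[i])
        return (ux - sx) ** 2 + (uy - sy) ** 2
    return min(range(len(units)), key=sq_dist)
```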
41. The method of claim 40, further comprising:
receiving a deletion input from the user while the third output character and the second output character are simultaneously displayed as recognition results in a candidate display area of the handwriting input interface; and
in response to the deletion input, deleting the second output character from the recognition result while retaining the third output character in the recognition result.
42. The method of claim 41, further comprising:
rendering, in real-time, the first plurality of handwritten strokes, the second plurality of handwritten strokes, and the third handwritten stroke in the handwriting input area of the handwriting input interface as each of the handwritten strokes is provided by the user; and
in response to receiving the deletion input, deleting the respective renderings of the second plurality of handwritten strokes from the handwriting input area while maintaining the renderings of the first plurality of handwritten strokes and the third handwritten stroke in the handwriting input area.
43. The method of claim 32, wherein the handwritten character is a multi-stroke Chinese character.
44. The method of claim 32, wherein the plurality of handwritten strokes is provided in a cursive writing style.
45. The method of claim 32, wherein the plurality of handwritten strokes is provided in a cursive writing style, and the handwritten character is a multi-stroke Chinese character.
46. The method of claim 40, further comprising:
establishing respective predetermined constraints on a set of acceptable sizes for handwritten character input; and
segmenting the currently accumulated plurality of handwritten strokes into a plurality of recognition units based on the respective predetermined constraints, wherein a respective input image is generated from each of the recognition units, provided to the handwriting recognition model, and recognized as a corresponding output character.
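One way to picture the size-constrained segmentation of claim 46 is a left-to-right grouping pass. The fixed maximum unit width below stands in for the claim's "predetermined constraints on a set of acceptable sizes" and is an assumed parameter.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]

def segment(strokes: List[Stroke], max_width: float = 100.0) -> List[List[Stroke]]:
    """Group strokes into recognition units, starting a new unit whenever
    adding a stroke would make the current unit wider than max_width."""
    ordered = sorted(strokes, key=lambda s: min(x for x, _ in s))
    units: List[List[Stroke]] = []
    for stroke in ordered:
        right = max(x for x, _ in stroke)
        if units:
            unit_left = min(x for s in units[-1] for x, _ in s)
            if right - unit_left <= max_width:   # stroke still fits this unit
                units[-1].append(stroke)
                continue
        units.append([stroke])                   # start a new recognition unit
    return units
```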
47. The method of claim 46, further comprising:
receiving an additional handwritten stroke from the user after segmenting the currently accumulated plurality of handwritten strokes into the plurality of recognition units; and
assigning the additional handwritten stroke to a respective one of the plurality of recognition units based on a spatial position of the additional handwritten stroke relative to the plurality of recognition units.
48. A method comprising any combination of the features of claims 32-47.
49. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform operations comprising:
receiving a plurality of handwritten strokes from a user, the plurality of handwritten strokes corresponding to a handwritten character;
generating an input image based on the plurality of handwritten strokes;
providing the input image to a handwriting recognition model to perform real-time recognition of the handwritten character, wherein the handwriting recognition model provides stroke-order independent handwriting recognition; and
displaying the same first output character in real-time as the plurality of handwritten strokes is received, regardless of the respective order in which the plurality of handwritten strokes have been received from the user.
50. A non-transitory computer-readable medium having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the methods of claims 32-47.
51. A system, comprising:
one or more processors; and
memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform operations comprising:
receiving a plurality of handwritten strokes from a user, the plurality of handwritten strokes corresponding to a handwritten character;
generating an input image based on the plurality of handwritten strokes;
providing the input image to a handwriting recognition model to perform real-time recognition of the handwritten character, wherein the handwriting recognition model provides stroke-order independent handwriting recognition; and
displaying the same first output character in real-time as the plurality of handwritten strokes is received, regardless of the respective order in which the plurality of handwritten strokes have been received from the user.
52. A system, comprising:
one or more processors; and
memory having instructions stored thereon which, when executed by the one or more processors, cause the processors to perform any of the methods of claims 32-47.
53. A method of providing real-time handwriting recognition, comprising:
at a device having memory and one or more processors:
receiving a handwriting input from a user, the handwriting input comprising one or more handwritten strokes provided in a handwriting input area of a handwriting input interface;
identifying a plurality of output characters for the handwriting input based on a handwriting recognition model;
classifying the plurality of output characters into two or more categories based on predetermined classification criteria;
displaying a respective output character of a first category of the two or more categories in an initial view of a candidate display area of the handwriting input interface, wherein the initial view of the candidate display area is provided concurrently with an affordance for invoking an expanded view of the candidate display area;
receiving a user input for selecting the affordance for invoking the expanded view; and
in response to the user input, displaying, in the expanded view of the candidate display area, the respective output characters of the first category and respective output characters of at least a second category of the two or more categories not previously displayed in the initial view of the candidate display area.
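The two-tier candidate display of claim 53 amounts to partitioning the candidates by category. A minimal sketch, assuming a set-membership test as the classification criterion (the later claims use common/uncommon dictionaries for this role):

```python
from typing import List, Set, Tuple

def split_candidates(candidates: List[str],
                     common: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (initial-view candidates, expanded-view-only candidates)."""
    initial = [c for c in candidates if c in common]
    expanded_only = [c for c in candidates if c not in common]
    return initial, expanded_only

initial_view, extra = split_candidates(["你", "妳", "伱"], common={"你"})
# The initial view shows ["你"]; invoking the expand affordance reveals the rest.
```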
54. The method of claim 53, wherein the predetermined classification criteria determine whether a respective character is a common character or an uncommon character.
55. The method of claim 53, wherein the respective output characters of the first category are characters found in a dictionary of common characters and the respective output characters of the second category are characters found in a dictionary of uncommon characters.
56. The method of claim 55, wherein the dictionary of common characters and the dictionary of uncommon characters are dynamically adjusted based on a usage history associated with the device.
57. The method of claim 53, further comprising:
identifying a set of characters from the plurality of output characters that are visually similar to each other according to a predetermined similarity criterion;
selecting a representative character from the set of visually similar characters based on predetermined selection criteria; and
displaying the representative character in the initial view of the candidate display area in place of other characters in the set of visually similar characters.
58. The method of claim 57, further comprising:
receiving a predetermined expand input from the user, the predetermined expand input relating to the representative character displayed in the initial view of the candidate display area; and
concurrently displaying, in response to receiving the predetermined expand input, a magnified view of the representative character and respective magnified views of one or more other characters of the set of visually similar characters.
59. The method of claim 58, wherein the predetermined expand input comprises an expand gesture detected over the representative character displayed in the candidate display area.
60. The method of claim 58, wherein the predetermined expand input comprises a contact detected over the representative character displayed in the candidate display area and sustained for longer than a predetermined threshold amount of time.
61. The method of claim 57, wherein the predetermined selection criteria are based on relative frequency of use of the characters in the set of visually similar characters.
62. The method of claim 57, wherein the predetermined selection criteria are based on a preferred input language associated with the device.
63. A method comprising any combination of the features of claims 53-62.
64. A method of providing real-time handwriting recognition, comprising, at a device having one or more processors and memory:
receiving a handwriting input from a user, the handwriting input comprising a plurality of handwritten strokes provided in a handwriting input area of a handwriting input interface;
identifying a plurality of output characters from the handwriting input based on a handwriting recognition model, the output characters including at least a first emoji character and at least a first character from a script of a natural human language; and
displaying a first recognition result in a candidate display area of the handwriting input interface, the first recognition result including the first emoji character and the first character from the script of the natural human language.
65. The method of claim 64, further comprising:
identifying at least a first semantic unit from the handwriting input based on the handwriting recognition model, wherein the first semantic unit comprises respective characters, words, or phrases capable of conveying respective semantic meanings in respective human languages;
identifying a second emoji character associated with the first semantic unit identified from the handwriting input; and
displaying a second recognition result in the candidate display area of the handwriting input interface, wherein the second recognition result at least comprises the second emoji character identified from the first semantic unit.
66. The method of claim 65, wherein displaying the second recognition result further comprises:
displaying the second recognition result simultaneously with a third recognition result including at least the first semantic unit.
67. The method of claim 64, further comprising:
receiving a user input selecting the first recognition result displayed in the candidate display area; and
in response to the user input, entering text of the selected first recognition result in a text entry area of the handwriting input interface, wherein the text includes at least the first emoji character and the first character from the script of the natural human language.
68. The method of claim 64, wherein the handwriting recognition model has been trained on a multi-script training corpus comprising writing samples corresponding to characters of at least three non-overlapping scripts, and the three non-overlapping scripts comprise a set of emoji characters, Chinese characters, and Latin script.
69. The method of claim 64, further comprising:
identifying a second semantic unit corresponding to the first emoji character identified from the handwriting input;
displaying a fourth recognition result in the candidate display area of the handwriting input interface, wherein the fourth recognition result at least comprises the second semantic unit identified from the first emoji character.
70. The method of claim 69, wherein displaying the fourth recognition result further comprises:
displaying the fourth recognition result in the candidate display area simultaneously with the first recognition result.
71. A method comprising any combination of the features of claims 64-70.
72. A method of providing handwriting recognition, comprising:
at a device having memory and one or more processors:
receiving a handwritten input from a user, the handwritten input including a plurality of handwritten strokes provided on a touch-sensitive surface coupled to the device;
rendering the plurality of handwritten strokes in real-time in a handwriting input area of a handwriting input interface;
receiving one of a pinch gesture input and an expand gesture input over the plurality of handwritten strokes;
generating, when a pinch gesture input is received, a first recognition result based on the plurality of handwritten strokes by processing the plurality of handwritten strokes as a single recognition unit;
generating, when an expand gesture input is received, a second recognition result based on the plurality of handwritten strokes by processing the plurality of handwritten strokes as two independent recognition units pulled apart by the expand gesture input; and
when a respective one of the first recognition result and the second recognition result is generated, displaying the generated recognition result in a candidate display area of the handwriting input interface.
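The gesture semantics of claim 72 reduce to a choice between recognizing the strokes as one unit or as two units divided at the gesture location. In the sketch below, `recognize` is a hypothetical stand-in for the handwriting recognition model, and splitting on an x-coordinate is an assumed rule for a horizontal writing direction.

```python
from typing import Callable, List, Tuple

Stroke = List[Tuple[float, float]]

def apply_gesture(strokes: List[Stroke], gesture: str, split_x: float,
                  recognize: Callable[[List[Stroke]], str]) -> List[str]:
    if gesture == "pinch":        # merge: treat all strokes as one unit
        return [recognize(strokes)]
    if gesture == "expand":       # split: two units pulled apart at split_x
        left = [s for s in strokes if max(x for x, _ in s) <= split_x]
        right = [s for s in strokes if s not in left]
        return [recognize(left), recognize(right)]
    raise ValueError("unsupported gesture")
```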
73. The method of claim 72, wherein the pinch gesture input includes two contacts on the touch-sensitive surface that move toward each other in an area occupied by the plurality of handwritten strokes.
74. The method of claim 72, wherein the expand gesture input includes two contacts on the touch-sensitive surface that move apart from each other in an area occupied by the plurality of handwritten strokes.
75. The method of claim 72, further comprising:
identifying two adjacent recognition units from the plurality of handwritten strokes;
displaying an initial recognition result in the candidate display area, the initial recognition result comprising respective characters recognized from the two adjacent recognition units; and
receiving the pinch gesture input while displaying the initial recognition result in the candidate display area.
76. The method of claim 75, wherein displaying the first recognition result further comprises replacing the initial recognition result with the first recognition result in the candidate display area.
77. The method of claim 75, further comprising:
in response to the pinch gesture input, re-rendering the plurality of handwritten strokes to reduce a distance between the two adjacent recognition units in the handwriting input area.
78. The method of claim 72, further comprising:
identifying a single recognition unit from the plurality of handwritten strokes;
displaying an initial recognition result including characters recognized from the single recognition unit in the candidate display area; and
receiving the expand gesture input while displaying the initial recognition result in the candidate display area.
79. The method of claim 78, wherein displaying the second recognition result further comprises replacing the initial recognition result with the second recognition result in the candidate display area.
80. The method of claim 79, further comprising:
in response to the expand gesture input, re-rendering the plurality of handwritten strokes to increase a distance between a first subset of strokes assigned to a first recognition unit and a second subset of handwritten strokes assigned to a second recognition unit in the handwriting input area.
81. A method comprising any combination of the features of claims 72-80.
82. A method of providing handwriting recognition, comprising:
receiving a handwriting input from a user, the handwriting input comprising a plurality of handwritten strokes provided in a handwriting input area of a handwriting input interface;
identifying a plurality of recognition units from the plurality of handwritten strokes, each recognition unit including a respective subset of the plurality of handwritten strokes;
generating a multi-character recognition result including respective characters recognized from the plurality of recognition units;
displaying the multi-character recognition result in a candidate display area of the handwriting input interface;
receiving a deletion input from the user while the multi-character recognition result is displayed in the candidate display area; and
in response to receiving the deletion input, removing an end character from the multi-character recognition result displayed in the candidate display area.
83. The method of claim 82, further comprising:
rendering the plurality of handwritten strokes in the handwriting input area of the handwriting input interface while the plurality of handwritten strokes are provided by the user in real-time; and
in response to receiving the deletion input, removing the respective subset of the plurality of handwritten strokes from the handwriting input area, the respective subset of the plurality of handwritten strokes corresponding to an end recognition unit in a spatial sequence formed by the plurality of recognition units in the handwriting input area, wherein the end recognition unit corresponds to the end character in the multi-character recognition result.
84. The method of claim 83, wherein the end recognition unit does not include a temporally last handwritten stroke of the plurality of handwritten strokes provided by the user.
85. The method of claim 83, further comprising:
visually distinguishing the end recognition unit from other recognition units recognized in the handwriting input area in response to receiving an initial portion of the deletion input.
86. The method of claim 85, wherein the initial portion of the deletion input is an initial contact detected on a deletion button in the handwriting input interface, and the deletion input is detected when the initial contact is sustained for more than a predetermined threshold amount of time.
87. The method of claim 83, wherein the end recognition unit corresponds to a handwritten Chinese character.
88. The method of claim 83, wherein the handwritten input is written in a cursive writing style.
89. The method of claim 83, wherein the handwritten input corresponds to a plurality of Chinese characters written in a cursive writing style.
90. The method of claim 83, wherein at least one of the handwritten strokes is divided into two adjacent recognition units of the plurality of recognition units.
91. The method of claim 83, wherein the deletion input is a continuous contact on a deletion button provided in the handwriting input interface, and wherein removing the respective subset of the plurality of handwritten strokes further comprises:
removing the respective subset of handwritten strokes in the end recognition unit from the handwriting input area on a stroke-by-stroke basis, in reverse of the chronological order in which the subset of handwritten strokes was provided by the user.
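Claim 91's behavior can be sketched as a loop that pops strokes in reverse chronological order while the delete contact persists. The `contact_sustained` and `redraw` callbacks are hypothetical stand-ins for the device's event and rendering layers.

```python
from typing import Callable, List

def delete_while_held(unit_strokes: List[object],
                      contact_sustained: Callable[[], bool],
                      redraw: Callable[[List[object]], None]) -> None:
    """unit_strokes holds the end recognition unit's strokes in the order
    they were written; the most recent stroke is removed first."""
    while unit_strokes and contact_sustained():
        unit_strokes.pop()
        redraw(unit_strokes)   # re-render the handwriting input area
```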
92. The method of claim 82, further comprising:
generating a partial recognition result comprising a subset of the respective characters recognized from the plurality of recognition units, wherein each character in the subset of the respective characters satisfies a predetermined confidence threshold; and
displaying the partial recognition result in the candidate display area of the handwriting input interface simultaneously with the multi-character recognition result.
93. The method of claim 92, wherein the partial recognition result does not include at least the last character in the multi-character recognition result.
94. The method of claim 92, wherein the partial recognition result does not include at least an initial character in the multi-character recognition result.
95. The method of claim 92, wherein the partial recognition result does not include at least an intermediate character in the multi-character recognition result.
96. A method comprising any combination of the features of claims 82-95.
97. A method of providing real-time handwriting recognition, comprising:
at a device having memory and one or more processors:
determining an orientation of the device;
providing a handwriting input interface on the device in a horizontal input mode in accordance with the device being in a first orientation, wherein a respective line of handwriting input entered in the horizontal input mode is segmented into one or more respective recognition units along a horizontal writing direction; and
providing the handwriting input interface on the device in a vertical input mode in accordance with the device being in a second orientation, wherein a respective line of handwriting input entered in the vertical input mode is segmented into one or more respective recognition units along a vertical writing direction.
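The orientation rule of claims 97-99 reduces to a mapping from device orientation to segmentation direction. A minimal sketch, using the landscape-horizontal and portrait-vertical defaults that claim 108 describes; other mappings are equally possible.

```python
def input_mode(orientation: str) -> str:
    """Map device orientation to the writing direction used for segmentation."""
    return {"landscape": "horizontal", "portrait": "vertical"}[orientation]

assert input_mode("landscape") == "horizontal"   # claim 108's default pairing
assert input_mode("portrait") == "vertical"
```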
98. The method of claim 97, further comprising:
when operating in the horizontal input mode:
detecting a change in device orientation from the first orientation to the second orientation; and
switching from the horizontal input mode to the vertical input mode in response to the change in device orientation.
99. The method of claim 97, further comprising:
when operating in the vertical input mode:
detecting a change in device orientation from the second orientation to the first orientation; and
switching from the vertical input mode to the horizontal input mode in response to the change in device orientation.
100. The method of claim 97, further comprising:
when operating in the horizontal input mode:
receiving a first multi-word handwriting input from the user; and
in response to the first multi-word handwriting input, presenting a first multi-word recognition result in a candidate display area of the handwriting input interface according to the horizontal writing direction; and
when operating in the vertical input mode:
receiving a second multi-word handwriting input from the user; and
in response to the second multi-word handwriting input, presenting a second multi-word recognition result in the candidate display area according to the vertical writing direction.
101. The method of claim 100, further comprising:
receiving a first user input for selecting the first multi-word recognition result;
receiving a second user input for selecting the second multi-word recognition result; and
displaying respective texts of the first multi-word recognition result and the second multi-word recognition result simultaneously in a text input area of the handwriting input interface, wherein the respective text of the first multi-word recognition result is displayed according to the horizontal writing direction, and the respective text of the second multi-word recognition result is displayed according to the vertical writing direction.
102. The method of claim 97, wherein the handwriting input area accepts multiple lines of handwriting input in the horizontal writing direction and has a default top-to-bottom paragraph direction.
103. The method of claim 97, wherein the horizontal writing direction is from left to right.
104. The method of claim 97, wherein the horizontal writing direction is from right to left.
105. The method of claim 97, wherein the handwriting input area accepts multiple lines of handwriting input in the vertical writing direction and has a default left-to-right paragraph direction.
106. The method of claim 97, wherein the handwriting input area accepts multiple lines of handwriting input in the vertical writing direction and has a default right-to-left paragraph direction.
107. The method of claim 97, wherein the vertical writing direction is from top to bottom.
108. The method of claim 97, wherein the first orientation defaults to a landscape orientation and the second orientation defaults to a portrait orientation.
109. The method of claim 97, further comprising:
providing a respective affordance in the handwriting input interface for manually switching between the horizontal input mode and the vertical input mode regardless of the device orientation.
110. The method of claim 97, further comprising:
providing a corresponding affordance in the handwriting input interface for manually switching between two selectable writing directions.
111. The method of claim 97, further comprising:
providing a corresponding affordance in the handwriting input interface for manually switching between two selectable paragraph directions.
112. The method of claim 97, further comprising:
receiving a handwriting input from a user, the handwriting input comprising a plurality of handwritten strokes provided in the handwriting input area of the handwriting input interface;
displaying one or more recognition results in a candidate display area of the handwriting input interface in response to the handwriting input;
while displaying the one or more recognition results in the candidate display area, detecting a user input for switching from a current handwriting input mode to an alternate handwriting input mode;
in response to the user input:
switching from the current handwriting input mode to the alternate handwriting input mode;
clearing the handwriting input from the handwriting input area; and
automatically inputting a top-ranked recognition result of the one or more recognition results displayed in the candidate display area into a text input area of the handwriting input interface.
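The mode-switch side effects of claim 112 (commit the top-ranked candidate, clear the handwriting input area, then change modes) can be sketched as a small state object. The field names below are illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HandwritingSession:
    strokes: List[object] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)
    text: str = ""
    mode: str = "horizontal"

    def switch_mode(self, new_mode: str) -> None:
        if self.candidates:
            self.text += self.candidates[0]   # auto-commit top-ranked result
        self.strokes.clear()                  # clear the handwriting input area
        self.candidates.clear()
        self.mode = new_mode
```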
113. The method of claim 112, wherein the user input is rotating the device from a current orientation to a different orientation.
114. The method of claim 112, wherein the user input is invoking an affordance to manually switch from the current handwriting input mode to the alternate handwriting input mode.
115. A method comprising any combination of the features of claims 97-114.
116. A method of providing real-time handwriting recognition, comprising:
at a device having memory and one or more processors:
receiving a handwritten input from a user, the handwritten input including a plurality of handwritten strokes provided on a touch-sensitive surface coupled to the device;
rendering the plurality of handwritten strokes in a handwriting input area of a handwriting input interface;
segmenting the plurality of handwritten strokes into two or more recognition units, each recognition unit including a respective subset of the plurality of handwritten strokes;
receiving an edit request from the user;
visually distinguishing the two or more recognition units in the handwriting input area in response to the edit request; and
providing means for independently deleting each of the two or more recognition units from the handwriting input area.
117. The method of claim 116, wherein the means for independently deleting each of the two or more recognition units is a respective delete button displayed adjacent to each recognition unit.
118. The method of claim 116, wherein the means for independently deleting each of the two or more recognition units is a means for detecting a predetermined delete gesture input over each recognition unit.
119. The method of claim 116, wherein visually distinguishing the two or more recognition units further comprises highlighting respective boundaries between the two or more recognition units in the handwriting input area.
120. The method of claim 116, wherein the edit request is a contact detected over a predetermined affordance provided in the handwriting input interface.
121. The method of claim 116, wherein the edit request is a tap gesture detected over a predetermined area in the handwriting input interface.
122. The method of claim 121, wherein the predetermined area is within the handwriting input area of the handwriting input interface.
123. The method of claim 121, wherein the predetermined area is outside of the handwriting input area of the handwriting input interface.
124. The method of claim 116, further comprising:
receiving, from the user through the provided means, a deletion input for independently deleting a first recognition unit of the two or more recognition units from the handwriting input area; and
in response to the deletion input, removing the respective subset of handwritten strokes in the first recognition unit from the handwriting input area.
125. The method of claim 124, wherein the first recognition unit is a spatially initial recognition unit of the two or more recognition units.
126. The method of claim 124, wherein the first recognition unit is a spatially intermediate recognition unit of the two or more recognition units.
127. The method of claim 124, further comprising:
generating a segmentation grid from the plurality of handwritten strokes, the segmentation grid including a plurality of alternative segmentation chains that each represent a respective set of recognition units recognized from the plurality of handwritten strokes;
receiving two or more consecutive edit requests from the user;
in response to each of the two or more consecutive edit requests, visually distinguishing, in the handwriting input area, the respective set of recognition units of a different one of the plurality of alternative segmentation chains; and
providing means for independently deleting each recognition unit of the respective set of recognition units currently represented in the handwriting input area.
128. A method comprising any combination of the features of claims 116-127.
129. A method of providing real-time handwriting recognition, comprising:
at a device having memory and one or more processors:
receiving a first handwritten input from a user, the first handwritten input including a plurality of handwritten strokes, the plurality of handwritten strokes forming a plurality of recognition units distributed along a respective writing direction associated with a handwriting input area of a handwriting input interface;
rendering each of the plurality of handwritten strokes in the handwriting input area when the handwritten stroke is provided by the user;
starting, for each recognition unit of the plurality of recognition units, a respective fade-out process after the recognition unit is completely rendered, wherein the rendering of the recognition unit in the first handwritten input gradually fades during the respective fade-out process;
receiving, from the user, a second handwriting input over an area of the handwriting input area occupied by a faded-out recognition unit of the plurality of recognition units; and
in response to receiving the second handwriting input:
rendering the second handwriting input in the handwriting input area; and
removing all faded-out recognition units from the handwriting input area.
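The fade-out process of claims 129 and 132-135 can be modeled as a per-unit opacity schedule. The delay, duration, and visibility floor below are assumed constants; the claims leave them as predetermined values.

```python
import time
from dataclasses import dataclass

@dataclass
class FadingUnit:
    completed_at: float        # time at which the unit was fully rendered
    delay: float = 1.5         # seconds before fading starts (assumed)
    duration: float = 2.0      # seconds from full opacity to final state (assumed)
    floor: float = 0.2         # minimum visibility in the final state (claim 134)

    def opacity(self, now: float) -> float:
        elapsed = now - self.completed_at - self.delay
        if elapsed <= 0:
            return 1.0         # not yet fading
        return max(self.floor, 1.0 - elapsed / self.duration)

unit = FadingUnit(completed_at=time.monotonic())
print(unit.opacity(time.monotonic()))   # 1.0 immediately after completion
```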
130. The method of claim 129, further comprising:
generating one or more recognition results for the first handwritten input;
displaying the one or more recognition results in a candidate display area of the handwriting input interface; and
in response to receiving the second handwriting input, automatically inputting, without user selection, the top-ranked recognition result displayed in the candidate display area into a text input area of the handwriting input interface.
131. The method of claim 129, further comprising:
storing an input stack comprising the first handwritten input and the second handwritten input;
generating one or more multi-character recognition results that each include a respective spatial sequence of characters recognized from the concatenated form of the first and second handwritten inputs; and
displaying the one or more multi-character recognition results in a candidate display area of the handwriting input interface while the rendering of the second handwriting input has replaced the rendering of the first handwriting input in the handwriting input area.
132. The method of claim 129, wherein the respective fade-out process is initiated for each recognition unit when a predetermined period of time has elapsed after completion of the recognition unit by the user.
133. The method of claim 129, wherein the respective fading process is initiated for each recognition unit when the user has begun inputting the stroke for a next recognition unit after the recognition unit.
134. The method of claim 129, wherein a final state of the respective fade-out process for each recognition unit is a state of predetermined minimum visibility for the recognition unit.
135. The method of claim 129, wherein a final state of the respective fade-out process for each recognition unit is a state with zero visibility for the recognition unit.
136. The method of claim 129, further comprising:
receiving a predetermined recovery input from the user after a last recognition unit in the first handwritten input has faded out; and
in response to receiving the predetermined recovery input, restoring the last recognition unit from the faded-out state to an un-faded state.
137. The method of claim 136, wherein the predetermined recovery input is an initial contact detected on a delete button provided in the handwriting input interface.
138. The method of claim 136, wherein a sustained contact detected on the delete button deletes the last recognition unit from the handwriting input area and restores the penultimate recognition unit from the faded-out state to the un-faded state.
139. A method comprising any combination of the features of claims 129-138.
140. A method of providing handwriting recognition, comprising:
at a device having memory and one or more processors:
independently training a set of spatially-derived features and a set of temporally-derived features of a handwriting recognition model, wherein:
training the set of spatially-derived features against a corpus of training images, each image in the corpus of training images being an image of a handwritten sample for a respective character in an output character set; and
training the set of temporally-derived features against a corpus of stroke distribution profiles, each stroke distribution profile numerically characterizing a spatial distribution of a plurality of strokes in a handwriting sample for a respective character in the output character set;
combining the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model; and
providing real-time handwriting recognition for a user's handwriting input using the handwriting recognition model.
141. The method of claim 140, wherein independently training the set of spatially-derived features further comprises:
training a convolutional neural network having an input layer, an output layer, and a plurality of convolutional layers, including a first convolutional layer, a last convolutional layer, zero or more intermediate convolutional layers between the first convolutional layer and the last convolutional layer, and a hidden layer between the last convolutional layer and the output layer.
142. The method of claim 141, wherein independently training the set of temporally-derived features further comprises:
providing the stroke distribution profiles to a statistical model to determine a plurality of temporally-derived parameters and respective weights for the plurality of temporally-derived parameters for classifying the respective characters in the output character set.
143. The method of claim 142, wherein combining the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model comprises:
injecting the plurality of temporally-derived parameters and the respective weights for the plurality of temporally-derived parameters into one of the convolutional layers or the hidden layer of the convolutional neural network.
144. The method of claim 143, wherein the plurality of temporally-derived parameters and respective weights for the plurality of temporally-derived parameters are injected into the last convolutional layer of the convolutional neural network for handwriting recognition.
145. The method of claim 143, wherein the plurality of temporally-derived parameters and respective weights for the plurality of temporally-derived parameters are injected into the hidden layer of the convolutional neural network for handwriting recognition.
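The injection of claims 143-145 can be pictured as concatenating the temporally-derived feature vector into the input of a fully-connected layer alongside the flattened convolutional features. A minimal sketch, assuming PyTorch, with illustrative layer sizes:

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    """Spatial features come from convolutions over the input image; the
    temporally-derived profile is injected at the hidden layer."""
    def __init__(self, conv_feats: int = 9216, temporal_feats: int = 20,
                 num_classes: int = 1000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(4))
        self.hidden = nn.Linear(conv_feats + temporal_feats, 512)
        self.output = nn.Linear(512, num_classes)

    def forward(self, image: torch.Tensor, profile: torch.Tensor) -> torch.Tensor:
        spatial = torch.flatten(self.conv(image), 1)      # (N, 9216) for 48x48
        joined = torch.cat([spatial, profile], dim=1)     # inject temporal features
        return self.output(torch.relu(self.hidden(joined)))

logits = HybridNet()(torch.zeros(1, 1, 48, 48), torch.zeros(1, 20))  # (1, 1000)
```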
146. The method of claim 140, further comprising:
generating the corpus of stroke distribution profiles from a plurality of handwriting samples,
wherein each handwriting sample of the plurality of handwriting samples corresponds to a character in the output character set and retains, for each constituent stroke of the handwriting sample, respective spatial information as it was written, and
wherein generating the corpus of stroke distribution profiles further comprises:
for each handwriting sample of the plurality of handwriting samples:
identifying constituent strokes in the handwriting sample;
for each of the identified strokes of the handwriting sample, calculating a respective duty cycle in each of a plurality of predetermined directions, the respective duty cycle being a ratio between a projection span of the stroke in the respective direction and a maximum projection span of the handwriting sample;
for each of the identified strokes of the handwriting sample, calculating a respective saturation ratio based on a ratio between a respective number of pixels within the stroke and a total number of pixels within the handwriting sample; and
generating a feature vector for the handwriting sample as the stroke distribution profile of the handwriting sample, the feature vector comprising the respective duty cycles and the respective saturation ratios for at least N strokes of the handwriting sample, where N is a predetermined natural number.
147. The method of claim 146, wherein N is less than a maximum stroke count observed in any single handwriting sample within the plurality of handwriting samples.
148. The method of claim 147, further comprising: for each handwriting sample of the plurality of handwriting samples:
sorting the respective duty cycles of the identified strokes in each of the predetermined directions in descending order; and
including, in the feature vector of the handwriting sample, only the N top-ranked duty cycles and the corresponding saturation ratios.
149. The method of claim 146, wherein the plurality of predetermined directions includes a horizontal direction, a vertical direction, a positive 45-degree direction, and a negative 45-degree direction of the handwriting sample.
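Claims 146-149 specify the stroke distribution profile concretely enough to sketch. The version below uses sampled point coordinates as a proxy for pixels (an implementation assumption) and reads the "maximum projection span of the handwriting sample" as the sample's largest span over the four predetermined directions, which is one plausible interpretation.

```python
import math
from typing import List, Tuple

Stroke = List[Tuple[float, float]]
DIRECTIONS = [0.0, 90.0, 45.0, -45.0]   # the four directions of claim 149

def span(points: List[Tuple[float, float]], angle_deg: float) -> float:
    """Length of the projection of the points onto the given direction."""
    a = math.radians(angle_deg)
    proj = [x * math.cos(a) + y * math.sin(a) for x, y in points]
    return max(proj) - min(proj)

def stroke_profile(strokes: List[Stroke], n: int) -> List[float]:
    """Feature vector of per-direction duty cycles plus saturation ratios,
    each list sorted descending and truncated/padded to the N top-ranked
    values per claim 148."""
    all_points = [p for stroke in strokes for p in stroke]
    max_span = max(span(all_points, d) for d in DIRECTIONS) or 1.0
    features: List[float] = []
    for angle in DIRECTIONS:
        cycles = sorted((span(s, angle) / max_span for s in strokes),
                        reverse=True)
        features += (cycles + [0.0] * n)[:n]
    saturations = sorted((len(s) / len(all_points) for s in strokes),
                         reverse=True)
    features += (saturations + [0.0] * n)[:n]
    return features
```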
150. The method of claim 140, wherein using the handwriting recognition model to provide real-time handwriting recognition for a user's handwriting input further comprises:
receiving handwriting input of the user; and
in response to receiving the user's handwriting input, providing a handwriting recognition output to the user substantially simultaneously with receiving the handwriting input.
151. A method comprising any combination of the features as recited in claims 140-150.
152. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform operations comprising:
independently training a set of spatially-derived features and a set of temporally-derived features of a handwriting recognition model, wherein:
training the set of spatially-derived features against a corpus of training images, each image in the corpus of training images being an image of a handwritten sample for a respective character in an output character set; and
training the set of temporally-derived features against a corpus of stroke distribution profiles, each stroke distribution profile numerically characterizing a spatial distribution of a plurality of strokes in a handwriting sample for a respective character in the output character set;
combining the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model; and
providing real-time handwriting recognition for a user's handwriting input using the handwriting recognition model.
153. A non-transitory computer readable medium having stored thereon instructions which, when executed by one or more processors, cause the processors to perform any of the methods as described in claims 140-150.
154. A system, comprising:
one or more processors; and
memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform operations comprising:
independently training a set of spatially-derived features and a set of temporally-derived features of a handwriting recognition model, wherein:
training the set of spatially-derived features against a corpus of training images, each image in the corpus of training images being an image of a handwritten sample for a respective character in an output character set; and
training the set of temporally-derived features against a corpus of stroke distribution profiles, each stroke distribution profile numerically characterizing a spatial distribution of a plurality of strokes in a handwriting sample for a respective character in the output character set;
combining the set of spatially-derived features and the set of temporally-derived features in the handwriting recognition model; and
providing real-time handwriting recognition for a user's handwriting input using the handwriting recognition model.
155. A system, comprising:
one or more processors; and
memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform any of the methods of claims 140-150.
156. A method comprising any combination of the features of claims 1-150.
157. A non-transitory computer-readable medium having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the methods of claims 1-150.
158. A system, comprising:
one or more processors; and
memory having instructions stored thereon that, when executed by the one or more processors, cause the processors to perform any of the methods of claims 1-150.
159. An electronic device, comprising:
a display;
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-150.
160. A graphical user interface on an electronic device with a display, a memory, and one or more processors to execute one or more programs stored in the memory, the graphical user interface comprising user interfaces displayed in accordance with any of the methods of claims 1-150.
161. An electronic device, comprising:
a display; and
means for performing any of the methods of claims 1-150.
162. An information processing apparatus for use in an electronic device having a display, comprising:
means for performing any of the methods of claims 1-150.
Applications Claiming Priority (13)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361832921P | 2013-06-09 | 2013-06-09 | |
| US201361832934P | 2013-06-09 | 2013-06-09 | |
| US201361832908P | 2013-06-09 | 2013-06-09 | |
| US201361832942P | 2013-06-09 | 2013-06-09 | |
| US61/832,934 | 2013-06-09 | ||
| US61/832,921 | 2013-06-09 | ||
| US61/832,942 | 2013-06-09 | ||
| US61/832,908 | 2013-06-09 | ||
| US14/290,935 US9898187B2 (en) | 2013-06-09 | 2014-05-29 | Managing real-time handwriting recognition |
| US14/290,945 | 2014-05-29 | ||
| US14/290,945 US9465985B2 (en) | 2013-06-09 | 2014-05-29 | Managing real-time handwriting recognition |
| US14/290,935 | 2014-05-29 | ||
| PCT/US2014/040417 WO2014200736A1 (en) | 2013-06-09 | 2014-05-30 | Managing real - time handwriting recognition |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1220276A1 true HK1220276A1 (en) | 2017-04-28 |
| HK1220276B HK1220276B (en) | 2019-11-22 |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11112968B2 (en) | 2007-01-05 | 2021-09-07 | Apple Inc. | Method, system, and graphical user interface for providing word recommendations |
| US11416141B2 (en) | 2007-01-05 | 2022-08-16 | Apple Inc. | Method, system, and graphical user interface for providing word recommendations |
| US11016658B2 (en) | 2013-06-09 | 2021-05-25 | Apple Inc. | Managing real-time handwriting recognition |
| US11182069B2 (en) | 2013-06-09 | 2021-11-23 | Apple Inc. | Managing real-time handwriting recognition |
| US11816326B2 (en) | 2013-06-09 | 2023-11-14 | Apple Inc. | Managing real-time handwriting recognition |
| US10884617B2 (en) | 2016-06-12 | 2021-01-05 | Apple Inc. | Handwriting keyboard for screens |
| US11640237B2 (en) | 2016-06-12 | 2023-05-02 | Apple Inc. | Handwriting keyboard for screens |
| US11941243B2 (en) | 2016-06-12 | 2024-03-26 | Apple Inc. | Handwriting keyboard for screens |
| US12422979B2 (en) | 2016-06-12 | 2025-09-23 | Apple Inc. | Handwriting keyboard for screens |
| US11194467B2 (en) | 2019-06-01 | 2021-12-07 | Apple Inc. | Keyboard management user interfaces |
| US11620046B2 (en) | 2019-06-01 | 2023-04-04 | Apple Inc. | Keyboard management user interfaces |
| US11842044B2 (en) | 2019-06-01 | 2023-12-12 | Apple Inc. | Keyboard management user interfaces |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7361156B2 (en) | | Managing real-time handwriting recognition |
| US11816326B2 (en) | | Managing real-time handwriting recognition |
| US9934430B2 (en) | | Multi-script handwriting recognition using a universal recognizer |
| US20140361983A1 (en) | | Real-time stroke-order and stroke-direction independent handwriting recognition |
| US20140363082A1 (en) | | Integrating stroke-distribution information into spatial feature extraction for automatic handwriting recognition |
| HK1220276B (en) | | Managing real-time handwriting recognition |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PC | Patent ceased (i.e. patent has lapsed due to the failure to pay the renewal fee) | Effective date: 20230528 |