
WO2000034942A1 - Method and system for recognizing musical notations using a compass-direction user interface - Google Patents

Method and system for recognizing musical notations using a compass-direction user interface

Info

Publication number
WO2000034942A1
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
directions
starting
ending
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1999/029410
Other languages
French (fr)
Inventor
Marlin Eller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUNHAWK Corp
Original Assignee
SUNHAWK Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUNHAWK Corp filed Critical SUNHAWK Corp
Priority to AU20505/00A priority Critical patent/AU2050500A/en
Publication of WO2000034942A1 publication Critical patent/WO2000034942A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, for inputting data by handwriting, e.g. gesture or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/228 Character recognition characterised by the type of writing of three-dimensional handwriting, e.g. writing in the air
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • the present invention relates to user interfaces and, in particular, to a user interface that translates input gestures received from an input device into a corresponding musical notation.
  • Human readable music is typically represented as sheet music, which is the printed representation of the musical notes.
  • Sheet music uses staffs to represent, for example, a treble clef and a bass clef.
  • the placement of symbols on and between the horizontal lines of the staffs indicates the music that comprises the musical score. These symbols include full notes, half notes, quarter notes, full rests, half rests and quarter rests, etc.
  • Machine readable music can be represented in many different formats based on the machine that is to play the music. For example, the Musical Instrument Digital Interface (“MIDI”) standard specifies that aspects of music, such as pitch and volume, are encoded in 8-bit bytes of digital information.
  • MIDI Musical Instrument Digital Interface
  • Computers have been used to assist composers in generating sheet music for musical compositions.
  • composers can use an electronic pen and tablet in much the same way as they would use a pen and paper tablet to create sheet music.
  • the composer can more easily revise the musical composition using the electronic pen and tablet.
  • Some computer-based composition tools may even perform limited "recognition" of the musical symbols (e.g., recognizing an A-sharp quarter note).
  • Such computer-based composition tools have several drawbacks.
  • recognition of the notes has typically been less than satisfactory. To obtain satisfactory recognition, a composer would need to handwrite the notes very slowly and carefully, which tends to significantly slow the creative process of generating the musical composition.
  • Embodiments of the present invention provide a computer-based method and system for selecting data items using gestures.
  • Each data item has a direction associated with it.
  • the system receives a gesture from a user.
  • the system determines the direction associated with the received gesture.
  • the system identifies the data item associated with the determined direction and indicates that the identified data item is the selected data item.
  • two directions may be associated with each data item and gesture.
  • the directions may be the starting and ending direction of the gesture.
  • a musical editing system uses the gestures as an indication of editing commands to apply to a musical score.
  • each gesture may have a starting and ending direction.
  • the music editing system displays the musical notation to be edited.
  • the music editing system receives a gesture with a starting and ending direction that is drawn in relation to the displayed musical notation.
  • the music editing system then recognizes the gesture as one of the pre-defined gestures.
  • the music editing system determines the relation between the recognized gesture and the displayed musical notation.
  • the music editing system modifies the notation based on the recognized gesture and the determined relation.
  • Figure 1 illustrates possible gestures of a compass-direction user interface that differentiates eight directions.
  • Figure 2 illustrates an exemplary computing system for use in conjunction with the compass-direction user interface for editing musical scores.
  • Figure 3 is a block diagram of a musical editing system that employs a compass-direction user interface.
  • Figures 4A-4J illustrate the inputting of musical notations using the compass-direction user interface.
  • Figures 5A-5H illustrate an exemplary embodiment of the present invention for drawing another series of musical notations.
  • Figures 6A-6M provide an exemplary compass-direction user interface applied to a system for drawing electronic circuits.
  • Embodiments of the present invention provide a method and system for inputting and recognizing information that uses a compass-direction user interface to specify the information.
  • a compass-direction user interface allows a user to specify information by drawing one or more lines in various compass directions. For example, a line drawn in the south direction (i.e., from top to bottom) may represent one type of information, and a line drawn in the north-west direction may represent another type of information.
  • the compass-direction user interface also allows a user to specify information by drawing a line that changes direction.
  • a line drawn starting in the south direction that then abruptly changes direction to the north-west direction may represent one type of information
  • a line drawn starting in the south direction that then abruptly changes direction to the west direction may represent another type of information.
  • eight possible compass directions N, NE, E, SE, S, SW, W and NW
  • 64 different lines also referred to as gestures
  • many more types of information can be represented if the context of where the line is drawn is considered.
  • the compass-direction user interface is particularly well-suited to input of musical information.
  • the compass-direction user interface captures gestures from an electronic tablet and determines the corresponding musical notation represented by the gesture.
  • the compass-direction user interface provides a low error rate in recognizing the correct musical notations for a user's input while also mimicking the traditional input procedures of composers and music copyists in drawing music on paper using a pen.
  • the compass-direction user interface operates in an intuitive manner for those already skilled in the art of transcribing music, minimizing their training time in use of the invention.
  • Figure 1 illustrates possible gestures of a compass-direction user interface that differentiates eight directions.
  • the number of directions can be decreased or increased based on the number of gestures needed to represent the types of information. For example, if eight gestures are needed, then four compass directions (i.e., N, E, S, and W) would be sufficient. If 256 gestures are needed, then 16 compass directions (e.g., N, N-NE, NE, E-NE, and E) are necessary. As the number of gestures increases, however, it becomes more difficult for the user to accurately draw the intended gesture and thus the recognition of the intended gesture also becomes more difficult.
  • each row of Figure 1 represents the starting direction of the gesture, and each column represents the ending direction of the gesture.
  • the gesture at row N and column E starts in the north direction and ends in the east direction.
  • the gesture at row SE and column SW starts in the south-east direction and ends in the south-west direction.
  • Each gesture on the diagonal of the table represents a gesture that starts in a compass direction and ends in that same direction.
  • these gestures comprise a single stroke, which also can be considered as two strokes in the same direction.
  • a stroke can be considered to be a line that does not change direction.
  • the gestures that start in one compass direction and end in the opposite compass direction are written as two strokes: the first stroke in one direction and the second stroke in the opposite direction. The second stroke overlaps the first stroke and is represented as the dashed line in Figure 1.
  • the compass-direction user interface allows a user to specify up to 64 different gestures based on compass direction. Each of these gestures can be mapped to a different data item that can either represent an action to perform (e.g., "insert"), information selection (e.g., "a quarter note”) or both.
  • one embodiment of the compass-direction user interface allows a user to specify information by using a circular gesture of varying sizes. For example, a large circular gesture can be used to select an encircled item or indicate a whole note depending on the context, and a small circular gesture can be used to indicate the quarter portion of a three-quarter note if drawn close to a half note.
  • FIG. 2 illustrates an exemplary computing system for use in conjunction with the compass-direction user interface for editing musical scores.
  • An exemplary computing system comprises a personal computer 206 having a keyboard 203, a monitor 205, a display 204, a pen 201, and a tablet 202.
  • the computer may also include a persistent storage device 207 that stores the musical notations provided by the user.
  • the computer contains a central processing unit for executing instructions stored in memory.
  • the memory and persistent storage are considered to be types of computer-readable medium.
  • the pen and the tablet transmit to the computer gestures made by a user using the pen.
  • the display displays the pen's trace on the tablet, providing the user with visual feedback of the gestures.
  • the tablet may also overlay a display so that the user can see the gestures and the musical score as it is edited on the tablet.
  • a gesture recognizer associated with the compass-direction user interface identifies the input data as a particular gesture, and a music recognizer selects the appropriate musical notation by referencing a context associated with the identified gesture.
  • the gesture recognizer of the compass-based user interface may be implemented as part of the electronic tablet. That is, a computer that is part of the electronic tablet may implement the gesture recognizer. This "computer" may be implemented as discrete logic specially developed to implement the gesture recognizer.
  • the display displays the translated input data as a musical notation.
  • the computer may also include a means for uploading and downloading musical information across a network.
  • the music shown on the display may have been provided from a repository of digital music retrieved by the user over a network.
  • the user may edit retrieved musical scores by making appropriate inputs with the pen.
  • the compass-direction user interface allows users to input musical notations for recognition and interpretation by the computer according to a predetermined format.
  • Utilizing the pen (or another input device, such as a mouse or a finger on a touch-sensitive display), the user inputs gestures corresponding to a predetermined format for recognition by a gesture recognizer in the computer.
  • the predetermined gesture format utilized by the compass-direction user interface comprises line gestures and circular gestures, according to an embodiment of the invention. Each line gesture has two compass directions: a starting and ending direction. Some line gestures will appear to have one direction because the starting and ending directions are the same direction.
  • FIG. 3 is a block diagram of a musical editing system that employs a compass-direction user interface.
  • the system comprises a tablet interface 301, a gesture recognizer 302, a music recognizer 303, and a music editing component 304.
  • the tablet interface receives the x, y-coordinates of the pen strokes from the tablet and echoes the pen strokes onto the display to give the user visual feedback.
  • the x, y coordinates are provided to the gesture recognizer, which categorizes the strokes as one of the 66 gestures and passes the recognized gestures to the music recognizer.
  • the music recognizer identifies the edit represented by the gesture based on the position of the gesture relative to the currently displayed musical notation.
  • the music recognizer then sends the appropriate instruction to the music editing component to edit the music.
  • the music editing component can be a conventional component with the music recognizer being considered a user interface front end.
  • the music editing system may define some input gestures to indicate an action for the music recognizer to perform rather than a musical notation. For example, a gesture in the upward vertical direction (i.e., north-to-north) may indicate an "undo" action that instructs the music recognizer to undo the effect of the last gesture.
  • a gesture that creates a note head, followed immediately by a north-to-north gesture, instructs the music recognizer to delete the note head.
  • the gesture recognizer captures gestures from the moment of pen down (i.e., pen comes in contact with the tablet) until pen up (i.e., pen ceases contact with the tablet).
  • the gesture recognizer tracks movements of the pen.
  • a gesture is delimited by a pen up and a pen down.
  • the input gesture recognizer recognizes the gesture and provides a data structure as shown in Table 1 to the music recognizer.
  • the data structure includes a start direction ("StartD”) and an end direction (“EndD”). If the gesture involves only a single direction, then the ending direction is left empty or, alternatively, may be set to the same direction.
  • the data structure further includes a starting x-coordinate ("StartX”) and a starting y-coordinate (“StartY”). The starting coordinates correspond to a location where the pen first contacts the tablet.
  • the data structure also includes an ending x-coordinate (“EndX”) and an ending y-coordinate (“EndY”). The ending coordinates correspond to a location where the pen last contacts the tablet.
  • the data structure includes an x-coordinate ("MidX”) and a y-coordinate (“MidY”) corresponding to the location where the gesture changed direction.
  • the data structure further includes a bounding box ("BBox”) that defines the smallest rectangle (or octagon) into which the gesture will fit.
  • BBox bounding box
  • Some embodiments of the gesture recognizer may contain a timer that measures the time passing between gestures.
  • the gesture recognizer interprets gestures arriving within a predetermined period of time as being a single gesture, rather than two independent gestures. These embodiments may be particularly useful for individuals with disabilities that prevent them from keeping the pen in contact with the tablet long enough to execute two connected lines.
  • the music recognizer assigns some gestures to musical notations directly while using context to disambiguate other gestures. For example, a large circle may either be a whole note or indicate selection of an existing musical notation, depending on whether the user has drawn the circle in a blank area near a staff line or in an area containing other musical notations.
  • the following provides an example of how a user may input a musical notation, according to an embodiment of the invention. If a user wishes to draw a quarter note, the user first draws a particular gesture which the music recognizer will interpret as the note head. The user next draws a south-to-south gesture (i.e., a vertical line drawn downward) that attaches to the note head which the music recognizer interprets as a note stem. The music recognizer determines that the user has input a quarter note, combines these two gestures, and replaces them with a single quarter note on the display. 10
  • a south-to-south gesture i.e., a vertical line drawn downward
  • Figures 4A-4J illustrate the inputting of musical notations using the compass-direction user interface.
  • the music recognizer interprets the gestures according to the data shown in Table 2 as described below.
  • Figure 4A shows a southwest-to-southwest gesture 401 drawn in between two staff lines.
  • the music recognizer interprets that gesture as corresponding to a quarter note head 402.
  • the music recognizer replaces that gesture with the note head placed in between the staff lines.
  • Figure 4C shows the next gesture as a south-to-south gesture 403 that touches the note head.
  • the music recognizer identifies that gesture connected to the note head indicating a quarter note 404.
  • Figure 4D also shows that the user has input an east-to-east gesture 405 near the top of the stem of the quarter note.
  • the music recognizer identifies that gesture as changing the quarter note into an eighth note 406.
  • the completed eighth note and its location within the musical score now constitute a musical notation that may be stored in a digital music format in the persistent storage device, displayed on the display screen, and played by an instrumentation device that recognizes digital music.
  • Figure 4F shows quarter note 407.
  • the user next inputs a southwest-to-southwest gesture 408 at a higher point in the scale than the quarter note, as shown in Figure 4G.
  • the music recognizer interprets that gesture as a quarter note head 409, as shown in Figure 4H.
  • the user next inputs a south-to-south gesture 410 that connects to the note head 409.
  • the music recognizer interprets the gesture as creating a quarter note 411, as shown in Figure 4I.
  • the user next inputs an east-to-east gesture 412, which is drawn near the tops of the stems of notes 407 and 411.
  • the music recognizer identifies the context of that gesture (its proximity to the notes 407 and 411) and determines that that gesture corresponds to a beam 413 running between the note 407 and the note 411, as shown in Figure 4J.
  • Figures 5A-5H illustrate an exemplary embodiment of the present invention for drawing another series of musical notations.
  • the music recognizer also interprets the gestures according to the data shown in Table 2 as described below.
  • the user has input an east-to-east gesture 501 in the lower part of a space between two staff lines.
  • the music recognizer determines that that gesture is not located near other musical notations. Accordingly, the music recognizer determines that that gesture corresponds to a half rest 502, as shown in Figure 5B.
  • Figure 5C shows the quarter note 503 around which the user has drawn the circular gesture 504.
  • the music recognizer characterizes the circular gesture as a large circular gesture by determining that the bounding box of the circular gesture is larger than the predetermined size for the small circular gesture.
  • the music recognizer then examines the context information for large circular gestures.
  • the music recognizer determines that the quarter note 503 lies inside that circular gesture. Based upon this context data, the music recognizer selects the quarter note 503. Once selected, the display color associated with the note on the display may change, and the user may input changes to the note.
  • Figure 5D shows a south-to-south gesture 505 that ends near the quarter note head 506.
  • the music recognizer identifies this as a quarter note 507.
  • the user inputs a north-to-north gesture 508.
  • the music recognizer accordingly undoes the effect of the last gesture, which in this example is the south-to-south gesture 505.
  • the quarter note head 509 is left as shown in Figure 5F.
  • the user inputs a south-to-south gesture 510 starting near the left side of the note head 509, as shown in Figure 5G.
  • the music recognizer interprets that gesture as indicating the quarter note 511 having a downward stem, as shown in Figure 5H.
  • Table 2 provides the various gestures, their corresponding musical notations, and some of the relevant context information, according to an embodiment of the invention.
  • the music recognizer identifies an east-to-east gesture as a whole rest if it is drawn above the center of a staff space and as a half rest if it is drawn below the center of a staff space.
  • the music recognizer identifies a natural sign as a modification to a flat.
  • the music recognizer identifies a southeast-to-northeast gesture as a flat. If a user inputs a northeast-to-southeast gesture near the flat sign, then upon receiving the second gesture, the music recognizer will raise the flat to a natural.
  • the music recognizer identifies double flat and double sharp symbols in a similar manner.
  • the music recognizer identifies a chord by a stack of note heads followed by a single stem (e.g., a north-to-north gesture) that passes through all of them.
  • the music recognizer is forgiving of seconds in stem matching; that is, if two note heads are displaced from one another because they are only a second apart in pitch, the stem will not attach on the "correct" side of all heads.
  • the music recognizer identifies a beamed group of 4 sixteenth notes as follows.
  • the user inputs the note heads for the four notes, typically beginning with the note heads at the ends of the four notes. If necessary, the user indicates any accidentals for a given note just after providing its note head.
  • the user next inputs the stems for both of the end notes before connecting the stems to each other by drawing a beam between them.
  • the user then inputs the stems for the two middle notes. Finally, the user inputs another beam across all four of the notes. In this manner, the music recognizer will recognize a quadruplet, rather than just four notes independent of each other.
  • the music recognizer allows the user to enter musical notations in various orders. However, the particular order described above for the quadruplets mimics the procedure taught to music copyists.
  • the correct approach requires targets for the pen strokes. If the user draws the interior notes' stems first, and the beam second, the stems may miss the beam or cross over too far. Even if the user misses the intended target, the music recognizer will still perform the intended interpretation if the resulting input gesture is close to its target.
  • a single gesture typically creates a single musical symbol and may also lead to several linkages.
  • the music recognizer provides useful feedback by visually indicating the linkages between musical symbols. For example, if a stem is connected to a beam, the music recognizer may make a small red circle around the link point. The music recognizer may also indicate other musical symbol linkage in a similar way.
  • the music recognizer may also map multiple gestures to the same action.
  • the compass-direction user interface defines 66 recognizable gestures.
  • the music recognizer may use only 12 of the gestures. Therefore, the music recognizer provides a reasonable interpretation for most gestures having no formal definition in the compass-direction user interface. For example, a southeast-to-southwest gesture may have the same effect as a northeast-to-northwest gesture.
  • Embodiments of the compass-direction user interface may also recognize gestures other than those 66 described above. For example, a gesture having a hook may be recognized and used to delete a musical symbol. Also, the input gesture recognizer may recognize gestures made of more than two strokes (e.g., "W,” “N,” “M,” and “Z” shaped gestures). A human hand cannot typically input perfectly straight lines when using a pen. Accordingly, the gesture recognizer performs an initial interpretation of the data input provided by the pen. The gesture recognizer may be aided in its initial interpretation by predetermined thresholds for various characteristics of the gestures. One predetermined threshold may concern the size of a dot or small circular gesture.
  • if the bounding box of a gesture is smaller than a predetermined amount, the gesture recognizer interprets the gesture as a dot or small circular gesture. Conversely, the gesture recognizer interprets a gesture as a large circle if the bounding box is equal to or greater than a predetermined amount.
  • the gesture recognizer maintains prototypes for each of the 66 input gestures. When the gesture recognizer detects a gesture, then the gesture recognizer identifies the gesture by locating the prototype that is the closest to the gesture. The gesture recognizer may use any of the well-known nearest neighbor classifiers. In addition, the gesture recognizer may maintain more than just the 66 prototypes so that more than one representative example for each gesture is maintained. For example, the southeast-to-northeast gesture may have three prototypes: one with both strokes of equal length, one with the southeast stroke being shorter, and one with the northeast stroke being shorter.
  • Table 3 provides exemplary pseudocode for performing a nearest neighbor classification.
  • "U” is an unknown gesture
  • "P(i)” is a list of all the gesture prototypes available for classifying the unknown gesture U.
  • This nearest neighbor classification algorithm calculates the distance between the unknown gesture U and each prototype and selects the prototype with the shortest distance (a runnable reconstruction of this search appears after this list).
  • Best_Dist = Distance(U, P(1));
  • the gesture recognizer in one embodiment represents a gesture and gesture prototype by a vector of 14 numbers, which is referred to as a feature vector.
  • the metric or the distance function (e.g., Distance() on line 9), may be the Euclidean distance function, which is the square root of the sum of the squares of the differences between two feature vectors.
  • the 14 numbers of a feature vector are: the x,y-coordinates of the first point (2 numbers); the x,y-coordinates of the last point (2 numbers); the x,y-coordinates of the middle point (2 numbers); and the eight numbers of the bounding octagon (four for the bounding rectangle and four for the bounding diamond).
  • Table 4 contains pseudocode for this distance function. "U(i)” is an unknown gesture, "P(i)” is one of the prototypes, and "i" represents the 14 numbers of the feature vector.
  • the gesture recognizer normalizes feature vectors by translating each feature vector so that its center point is at the origin and by scaling the feature vector appropriately. For example, if the gesture is drawn very small on the upper right portion of the screen or very large on the lower portion of the screen, then the gesture recognizer first translates the coordinates of these gestures to center them around the origin. The gesture recognizer will then shrink or expand the gestures accordingly so that the gesture fills a box having a uniform size. After normalization, the gesture recognizer then compares the unknown gesture ("U") to the gesture prototypes ("P"). This normalization enables the gesture recognizer to identify gestures independently of where the gesture has been drawn and independently of the size of the gesture. The gesture recognizer does not scale circular gestures because large and small circular gestures need to be distinguished.
  • a bounding octagon is an example of the bounding box ("BBox") shown in Table 1 and comprises a bounding rectangle and a bounding diamond.
  • the bounding octagon represents an extension of the well-known bounding rectangle.
  • the bounding rectangle comprises four numbers calculated as: MinX = Min(X(I)); MaxX = Max(X(I)); MinY = Min(Y(I)); MaxY = Max(Y(I))
  • the bounding diamond comprises four numbers calculated as: MinS = Min(X(I) + Y(I)); MaxS = Max(X(I) + Y(I)); MinD = Min(X(I) - Y(I)); MaxD = Max(X(I) - Y(I))
  • the center point of the gesture determines the amount of translation in the X and Y directions that is applied to a gesture.
  • the center point is calculated as:
  • Xoffset = (Min(X(I)) + Max(X(I)))/2; Yoffset = (Min(Y(I)) + Max(Y(I)))/2
  • the center point is the center of the bounding rectangle, and is not the middle point (midX and midY) of the gesture.
  • the gesture recognizer applies a scale to the gesture that is the larger of the width or height of the bounding box ("BBox"), as given by the formula: Scale = Max(MaxX - MinX, MaxY - MinY)
  • if the scale value is smaller than a predetermined dot threshold, the gesture is probably the dot or the small circle gesture. This case may use a single threshold: if the scale value is smaller than the dot threshold, then the gesture recognizer may simply classify the unknown gesture as a dot and perform no more comparisons to any of the other prototypes. Moreover, when an unknown gesture is smaller than the dot threshold, the gesture recognizer does not scale the gesture, in order to avoid a possible division by 0, which would produce an error. As previously discussed, the OFlag is set to "true." (A code sketch of this feature pipeline appears after this list.)
  • the gesture recognizer uses for its middle point a point that maximizes the cross-product between the vector from the start point to the end point and the vector from the start point to that point. Geometrically, the point is perpendicularly the farthest from the line that contains the start point and the end point.
  • the middle point may be calculated as: d(i) = abs((EndX - StartX)*(Y(i) - StartY) - (EndY - StartY)*(X(i) - StartX))
  • the point i where d attains a maximum is the middle point (MidX, MidY).
  • the feature vector representing a stroke is defined in Table 5.
  • the gesture recognizer takes no action for unrecognized strokes.
  • the gesture recognizer makes a notation on a screen display to notify the user that an unrecognizable gesture has been received.
  • a training system is used to generate a database of prototypes.
  • the training system prompts the user with the name of a gesture (e.g., northeast-to-northwest) or displays a sample gesture and asks the user to enter an example of that gesture, which the training system then stores in the database.
  • the training system collects samples for a single gesture and averages the sampled gestures to generate one or more average prototypes for that gesture. When a sample is received, the training system generates the feature vector for the sample gesture and then identifies the closest matching prototype (a sketch of this averaging step appears after this list).
  • the training system performs a weighted averaging of the feature vectors of the sample gesture and of the prototype to generate an updated feature vector for the prototype.
  • a weighted average of 75% of the feature vector of the prototype and 25% of the feature vector of the sample may be appropriate in many instances.
  • if the sample gesture does not closely match any existing prototype, the gesture recognizer declares the sample gesture to be a new prototype of the prompted-for gesture and adds its feature vector to the database.
  • the compass-direction user interface provided by the present invention can be used to enter many types of information other than musical notations.
  • the compass-direction user interface may be used to input information relating to electronic circuit elements as well as characters from languages such as English, Arabic, Hebrew, Hangul, Chinese, and Japanese.
  • the compass-direction user interface may be particularly well suited to aid in the editing of text.
  • standard proofreaders' marks (see, e.g., Webster's II New College Dictionary, Houghton Mifflin Co., 1995, p. 1500) can be mapped onto compass-direction gestures for editing text.
  • a north-to-north gesture can be interpreted to change a letter to upper case.
  • a south-to-south gesture can be interpreted to change a letter to lower case.
  • an east-to-east gesture can be interpreted as a delete command.
  • FIGS. 6A through 6M provide an exemplary compass-direction user interface applied to a system for drawing electronic circuits.
  • Figure 6A shows an east-to-east gesture 601. The circuit recognizer determines that that gesture is not located near another circuit element and takes no action.
  • Figure 6B shows a south-to-south gesture 602 that crosses the east-to-east gesture. The circuit recognizer now determines that the user has selected an "AND" gate 603.
  • the user next inputs an east-to-east gesture 604 connected to the front of the AND gate.
  • the circuit recognizer interprets that gesture as output 605 of the AND gate.
  • the user next enters the east-to-east gesture 606 that connects to the upper left of the AND gate.
  • the circuit recognizer interprets that gesture as input 607 to the AND gate.
  • the user next draws an east-to-east gesture 608 that connects to the lower left of the AND gate.
  • the circuit recognizer interprets that gesture as a second input 609 to the AND gate.
  • Figures 6G-6M illustrate the creation of a two input OR gate in a manner analogous to the AND gate described above.
  • the circuit recognizer may also recognize a small circular gesture drawn near the output of a gate as converting that gate to its logical complement (e.g., an AND gate to a NAND gate).
  • the circuit recognizer could alternatively recognize an OR gate having two inputs and an output by a single gesture and recognize an AND gate having two inputs and an output by another gesture.
  • the circuit elements could be moved by gestures and combined to construct a circuit diagram.
  • a third dimension can be time, indicating the speed at which a stroke is written. That is, a gesture written quickly can be interpreted differently from a gesture written slowly.
  • a gesture can be represented by directions in three-dimensional space. Such gestures can be "written" by a user by moving the user's hand while using a virtual reality computer system. That is, the tilt of the gesture away from or towards the user can have different meanings.
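
The feature computations described in the list above translate almost directly into code. The following Python sketch is illustrative only: it assumes a stroke arrives as a list of (x, y) points, and it infers the layout of the 14-number feature vector from the quantities the text names (start point, end point, middle point, and the eight bounding-octagon numbers).

```python
def bounding_octagon(pts):
    """Bounding rectangle (4 numbers) plus bounding diamond (4 numbers)."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    s = [x + y for x, y in pts]   # 45-degree "sum" axis
    d = [x - y for x, y in pts]   # 45-degree "difference" axis
    return (min(xs), max(xs), min(ys), max(ys),
            min(s), max(s), min(d), max(d))

def middle_point(pts):
    """The point perpendicularly farthest from the start-end chord, found
    by maximizing the cross product described in the text."""
    (sx, sy), (ex, ey) = pts[0], pts[-1]
    return max(pts, key=lambda p: abs((ex - sx) * (p[1] - sy)
                                      - (ey - sy) * (p[0] - sx)))

def feature_vector(pts, dot_threshold=8):
    """14 numbers: start point, end point, middle point, and bounding
    octagon, computed after centering the stroke on the origin and
    scaling it to a uniform size. Strokes below the dot threshold are
    left unscaled, which avoids division by zero; a full implementation
    would also skip scaling for circular gestures so that large and
    small circles stay distinguishable."""
    min_x, max_x, min_y, max_y = bounding_octagon(pts)[:4]
    x_off = (min_x + max_x) / 2          # center of the bounding rectangle
    y_off = (min_y + max_y) / 2
    scale = max(max_x - min_x, max_y - min_y)
    if scale < dot_threshold:
        scale = 1
    norm = [((x - x_off) / scale, (y - y_off) / scale) for x, y in pts]
    (sx, sy), (ex, ey) = norm[0], norm[-1]
    mx, my = middle_point(norm)
    return [sx, sy, ex, ey, mx, my, *bounding_octagon(norm)]
```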
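
Tables 3 and 4 survive in this copy only as fragments, so the following is a reconstruction rather than a transcription: a runnable version of the nearest-neighbor search and the Euclidean distance function the text describes. The shape of the prototype database (a dict mapping gesture names to lists of feature vectors) is an assumption.

```python
import math

def distance(u, p):
    """Table 4's metric: Euclidean distance between two 14-number vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, p)))

def classify(u, prototypes):
    """Table 3's search: return the name of the prototype closest to the
    unknown gesture U. Several stored examples per gesture are allowed,
    since the text permits more than one prototype per gesture."""
    best_label, best_dist = None, float("inf")
    for label, examples in prototypes.items():
        for p in examples:
            d = distance(u, p)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```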
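
Finally, a sketch of the training system's averaging step. The 75/25 blend comes from the text; the distance threshold for declaring a new prototype is an assumed placeholder, since the patent specifies no numeric value.

```python
import math

NEW_PROTOTYPE_DIST = 0.5   # assumed; the patent specifies no numeric threshold

def distance(u, p):        # Euclidean distance, as in the sketch above
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, p)))

def train_sample(prototypes, gesture_name, sample_fv):
    """Fold one training sample into the database: blend it into the nearest
    existing prototype for its gesture (75% prototype, 25% sample), or store
    it as a new prototype when no existing one is close enough."""
    examples = prototypes.setdefault(gesture_name, [])
    if examples:
        nearest = min(examples, key=lambda p: distance(sample_fv, p))
        if distance(sample_fv, nearest) <= NEW_PROTOTYPE_DIST:
            nearest[:] = [0.75 * p + 0.25 * s
                          for p, s in zip(nearest, sample_fv)]
            return
    examples.append(list(sample_fv))   # first or sufficiently novel sample
```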

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A compass-direction user interface provides a method and system for entering musical notations in a computing system. A music recognizer (303) associated with the compass-direction user interface receives gestures for translation into a corresponding musical notation. Each gesture has a starting and an ending direction. A gesture recognizer (302) recognizes a gesture based on its starting and ending directions. The music recognizer (303) receives recognized gestures from the gesture recognizer (302), references the context in which the gestures are drawn, and selects an appropriate musical notation corresponding to the gesture and context. The compass-direction user interface (301) recognizes gestures based on combinations of compass directions.

Description

METHOD AND SYSTEM FOR RECOGNIZING MUSICAL NOTATIONS USING A COMPASS-DIRECTION USER INTERFACE
TECHNICAL FIELD
The present invention relates to user interfaces and, in particular, to a user interface that translates input gestures received from an input device into a corresponding musical notation.
BACKGROUND OF THE INVENTION
Recorded music has conventionally existed in human readable and machine readable forms. Human readable music is typically represented as sheet music, which is the printed representation of the musical notes. Sheet music uses staffs to represent, for example, a treble clef and a bass clef. The placement of symbols on and between the horizontal lines of the staffs indicates the music that comprises the musical score. These symbols include full notes, half notes, quarter notes, full rests, half rests and quarter rests, etc. Machine readable music can be represented in many different formats based on the machine that is to play the music. For example, the Musical Instrument Digital Interface ("MIDI") standard specifies that aspects of music, such as pitch and volume, are encoded in 8-bit bytes of digital information.
Computers have been used to assist composers in generating sheet music for musical compositions. For example, composers can use an electronic pen and tablet in much the same way as they would use a pen and paper tablet to create sheet music. However, the composer can more easily revise the musical composition using the electronic pen and tablet. Some computer-based composition tools may even perform limited "recognition" of the musical symbols (e.g., recognizing an A-sharp quarter note). Such computer-based composition tools, however, have several drawbacks. First, recognition of the notes has typically been less than satisfactory. To obtain satisfactory recognition, a composer would need to handwrite the notes very slowly and carefully, which tends to significantly slow the creative process of generating the musical composition. Second, even if the recognition was accurate, the composer still would need to draw each musical symbol as they would on paper. Such drawing is time-consuming.
Music editors are increasingly using computer-based editing tools to assist in the editing process. For example, a musical composition that is either input by a composer or input by scanning of sheet music may contain numerous recognition errors that need to be corrected. Unfortunately, the current computer-based editing tools do not allow for efficient editing for many of the same reasons that computer-based composition tools do not allow for efficient composition.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide a computer-based method and system for selecting data items using gestures. Each data item has a direction associated with it. The system receives a gesture from a user. The system then determines the direction associated with the received gesture. The system then identifies the data item associated with the determined direction and indicates that the identified data item is the selected data item. In an embodiment, two directions may be associated with each data item and gesture. For example, the directions may be the starting and ending direction of the gesture. By using two directions, the number of data items that can be selected increases, and the accuracy of the recognition of the gestures may also increase.
In another embodiment, a musical editing system uses the gestures as an indication of editing commands to apply to a musical score. In such an embodiment, each gesture may have a starting and ending direction. The music editing system displays the musical notation to be edited. The music editing system then receives a gesture with a starting and ending direction that is drawn in relation to the displayed musical notation. The music editing system then recognizes the gesture as one of the pre-defined gestures. The music editing system then determines the relation between the recognized gesture and the displayed musical notation. The music editing system then modifies the notation based on the recognized gesture and the determined relation.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates possible gestures of a compass-direction user interface that differentiates eight directions.
Figure 2 illustrates an exemplary computing system for use in conjunction with the compass-direction user interface for editing musical scores.
Figure 3 is a block diagram of a musical editing system that employs a compass-direction user interface.
Figures 4A-4J illustrate the inputting of musical notations using the compass-direction user interface. Figures 5A-5H illustrate an exemplary embodiment of the present invention for drawing another series of musical notations.
Figures 6A-6M provide an exemplary compass-direction user interface applied to a system for drawing electronic circuits.
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention provide a method and system for inputting and recognizing information that uses a compass-direction user interface to specify the information. A compass-direction user interface allows a user to specify information by drawing one or more lines in various compass directions. For example, a line drawn in the south direction (i.e., from top to bottom) may represent one type of information, and a line drawn in the north-west direction may represent another type of information. In addition, the compass-direction user interface also allows a user to specify information by drawing a line that changes direction. For example, a line drawn starting in the south direction that then abruptly changes direction to the north-west direction may represent one type of information, and a line drawn starting in the south direction that then abruptly changes direction to the west direction may represent another type of information. If eight possible compass directions (N, NE, E, SE, S, SW, W and NW) are recognized by the compass-direction user interface, then 64 different lines (also referred to as gestures) can be drawn to represent 64 different types of information. In addition, many more types of information can be represented if the context of where the line is drawn is considered.
The compass-direction user interface is particularly well-suited to the input of musical information. In one embodiment, a compass-direction user interface captures gestures from an electronic tablet and determines the corresponding musical notation represented by each gesture. The compass-direction user interface provides a low error rate in recognizing the correct musical notations for a user's input while also mimicking the traditional input procedures of composers and music copyists in drawing music on paper using a pen. Thus, the compass-direction user interface operates in an intuitive manner for those already skilled in the art of transcribing music, minimizing their training time in use of the invention.
Figure 1 illustrates possible gestures of a compass-direction user interface that differentiates eight directions. One skilled in the art would appreciate that the number of directions can be decreased or increased based on the number of gestures needed to represent the types of information. For example, if eight gestures are needed, then four compass directions (i.e., N, E, S, and W) would be sufficient. If 256 gestures are needed, then 16 compass directions (e.g., N, N-NE, NE, E-NE, and E) are necessary. As the number of gestures increases, however, it becomes more difficult for the user to accurately draw the intended gesture, and thus the recognition of the intended gesture also becomes more difficult. In one embodiment, the use of eight compass directions is an acceptable compromise between the number of gestures and the ease of drawing and recognizing the gestures. Each row of Figure 1 represents the starting direction of the gesture, and each column represents the ending direction of the gesture. For example, the gesture at row N and column E starts in the north direction and ends in the east direction, and the gesture at row SE and column SW starts in the south-east direction and ends in the south-west direction. Each gesture on the diagonal of the table (e.g., row N and column N, row NE and column NE) represents a gesture that starts in a compass direction and ends in that same direction. Thus, these gestures comprise a single stroke, which also can be considered as two strokes in the same direction. (A stroke can be considered to be a line that does not change direction.) Also, the gestures that start in one compass direction and end in the opposite compass direction (e.g., row N and column S, row NE and column SW, etc.) are written as two strokes: the first stroke in one direction and the second stroke in the opposite direction. The second stroke overlaps the first stroke and is represented as the dashed line in Figure 1.
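The patent does not spell out how raw pen movement is quantized into compass directions. The following Python sketch shows one plausible approach, offered as an illustration rather than the patented method: it buckets the angle of travel into the nearest 45-degree sector and reads the start and end directions from the two halves of the stroke.

```python
import math

# Eight compass points, counter-clockwise from east, in 45-degree steps.
COMPASS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def compass_direction(dx: float, dy: float) -> str:
    """Quantize a movement vector to the nearest of the eight directions.
    Tablet y-coordinates grow downward, so dy is negated to make north
    mean 'up' on the screen."""
    angle = math.atan2(-dy, dx)                # radians, CCW from east
    octant = round(angle / (math.pi / 4)) % 8  # nearest 45-degree sector
    return COMPASS[octant]

def start_end_directions(points):
    """Start and end directions from the net movement of each half of the
    stroke. A real recognizer would split at the detected corner (the
    'middle point' discussed later) rather than at the midpoint index."""
    mid = len(points) // 2
    (sx, sy), (mx, my), (ex, ey) = points[0], points[mid], points[-1]
    return (compass_direction(mx - sx, my - sy),
            compass_direction(ex - mx, ey - my))

# A stroke drawn straight down and then to the right reads as ("S", "E"):
# start_end_directions([(0, 0), (0, 5), (0, 10), (5, 10), (10, 10)])
```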
The compass-direction user interface allows a user to specify up to 64 different gestures based on compass direction. Each of these gestures can be mapped to a different data item that can either represent an action to perform (e.g., "insert"), information selection (e.g., "a quarter note") or both. In addition, one embodiment of the compass-direction user interface allows a user to specify information by using a circular gesture of varying sizes. For example, a large circular gesture can be used to select an encircled item or indicate a whole note depending on the context, and a small circular gesture can be used to indicate the quarter portion of a three-quarter note if drawn close to a half note.
Figure 2 illustrates an exemplary computing system for use in conjunction with the compass-direction user interface for editing musical scores. An exemplary computing system comprises a personal computer 206 having a keyboard 203, a monitor 205, a display 204, a pen 201, and a tablet 202. The computer may also include a persistent storage device 207 that stores the musical notations provided by the user. The computer contains a central processing unit for executing instructions stored in memory. The memory and persistent storage are considered to be types of computer-readable medium.
The pen and the tablet transmit to the computer gestures made by a user using the pen. The display displays the pen's trace on the tablet, providing the user with visual feedback of the gestures. The tablet may also overlay a display so that the user can see the gestures and the musical score as it is edited on the tablet. A gesture recognizer associated with the compass-direction user interface identifies the input data as a particular gesture, and a music recognizer selects the appropriate musical notation by referencing a context associated with the identified gesture. The gesture recognizer of the compass-based user interface may be implemented as part of the electronic tablet. That is, a computer that is part of the electronic tablet may implement the gesture recognizer. This "computer" may be implemented as discrete logic specially developed to implement the gesture recognizer. The display displays the translated input data as a musical notation.
As illustrated on the display, a user has already utilized the pen and the tablet to input a series of musical notations. Displaying the user's translated input on the display provides feedback to the user, allowing the user to see how the gesture recognizer and the music recognizer have interpreted the gestures. The computer may also include a means for uploading and downloading musical information across a network. For example, the music shown on the display may have been provided from a repository of digital music retrieved by the user over a network. The user may edit retrieved musical scores by making appropriate inputs with the pen.
The compass-direction user interface allows users to input musical notations for recognition and interpretation by the computer according to a predetermined format. Utilizing the pen (or other input device such as a mouse or finger in a touch sensitive display), the user inputs gestures corresponding to a predetermined format for recognition by a gesture recognizer in the computer. The predetermined gesture format utilized by the compass-direction user interface comprises line gestures and circular gestures, according to an embodiment of the invention. Each line gesture has two compass directions: a starting and ending direction. Some line gestures will appear to have one direction because the starting and ending directions are the same direction.
Figure 3 is a block diagram of a musical editing system that employs a compass-direction user interface. The system comprises a tablet interface 301, a gesture recognizer 302, a music recognizer 303, and a music editing component 304. The tablet interface receives the x, y-coordinates of the pen strokes from the tablet and echoes the pen strokes onto the display to give the user visual feedback. The x, y coordinates are provided to the gesture recognizer, which categorizes the strokes as one of the 66 gestures and passes the recognized gestures to the music recognizer. The music recognizer identifies the edit represented by the gesture based on the position of the gesture relative to the currently displayed musical notation. The music recognizer then sends the appropriate instruction to the music editing component to edit the music. The music editing component can be a conventional component with the music recognizer being considered a user interface front end. The music editing system may define some input gestures to indicate an action for the music recognizer to perform rather than a musical notation. For example, a gesture in the upward vertical direction (i.e., north-to-north) may indicate an "undo" action that instructs the music recognizer to undo the effect of the last gesture. Thus, a gesture that creates a note head, followed immediately by a north-to-north gesture, instructs the music recognizer to delete the note head.
The gesture recognizer captures gestures from the moment of pen down (i.e., pen comes in contact with the tablet) until pen up (i.e., pen ceases contact with the tablet). The gesture recognizer tracks movements of the pen. A gesture is delimited by a pen up and a pen down. The input gesture recognizer recognizes the gesture and provides a data structure as shown in Table 1 to the music recognizer.
TABLE 1
StartD: starting compass direction
EndD: ending compass direction (empty, or the same as StartD, for a single-direction gesture)
StartX, StartY: x,y-coordinates of the first pen contact
EndX, EndY: x,y-coordinates of the last pen contact
MidX, MidY: x,y-coordinates of the point where the gesture changed direction
BBox: smallest rectangle (or octagon) into which the gesture fits
OFlag: true if the gesture is a circle or oval
The data structure includes a start direction ("StartD") and an end direction ("EndD"). If the gesture involves only a single direction, then the ending direction is left empty or, alternatively, may be set to the same direction. The data structure further includes a starting x-coordinate ("StartX") and a starting y-coordinate ("StartY"). The starting coordinates correspond to a location where the pen first contacts the tablet. The data structure also includes an ending x-coordinate ("EndX") and an ending y-coordinate ("EndY"). The ending coordinates correspond to a location where the pen last contacts the tablet. If the gesture includes a different starting and ending direction, then the data structure includes an x-coordinate ("MidX") and a y-coordinate ("MidY") corresponding to the location where the gesture changed direction. The data structure further includes a bounding box ("BBox") that defines the smallest rectangle (or octagon) into which the gesture will fit. Finally, if the gesture corresponds to a circle or an oval, then the input gesture recognizer sets a flag ("OFlag") to be true.
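Transcribed into Python, the Table 1 record might look like the sketch below; the patent names the fields but not their representation, so the types here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Gesture:
    """The record the gesture recognizer hands to the music recognizer."""
    start_d: str                      # StartD: starting compass direction
    end_d: Optional[str]              # EndD: ending direction; None for a
                                      # single-direction gesture
    start_x: int                      # StartX/StartY: first pen contact
    start_y: int
    end_x: int                        # EndX/EndY: last pen contact
    end_y: int
    mid_x: Optional[int] = None       # MidX/MidY: where the gesture changed
    mid_y: Optional[int] = None       # direction, if it did
    bbox: Tuple[int, int, int, int] = (0, 0, 0, 0)   # smallest enclosing box
    o_flag: bool = False              # True for a circle or oval
```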
Some embodiments of the gesture recognizer may contain a timer that measures the time passing between gestures. The gesture recognizer interprets gestures arriving within a predetermined period of time as being a single gesture, rather than two independent gestures. These embodiments may be particularly useful for individuals with disabilities that prevent them from keeping the pen in contact with the tablet long enough to execute two connected lines. Utilizing the gesture data structure shown in Table 1, the music recognizer assigns some gestures to musical notations directly while using context to disambiguate other gestures. For example, a large circle may either be a whole note or indicate selection of an existing musical notation, depending on whether the user has drawn the circle in a blank area near a staff line or in an area containing other musical notations.
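A sketch of the pen-up timer described above; the 300 ms window is an assumed value standing in for the patent's "predetermined period of time".

```python
import time

MERGE_WINDOW_S = 0.3   # assumed; the patent does not give a value

class StrokeJoiner:
    """Treats two strokes as one gesture when the pen is lifted only briefly,
    helping users who cannot keep the pen down through a direction change."""

    def __init__(self):
        self._held = None          # stroke waiting for a possible partner
        self._pen_up_at = 0.0

    def on_stroke(self, points):
        """Called at pen-up with the stroke's points. Returns a finished
        gesture, or None if the stroke is held awaiting a quick follow-up."""
        now = time.monotonic()
        if self._held is not None and now - self._pen_up_at <= MERGE_WINDOW_S:
            gesture, self._held = self._held + points, None
            return gesture         # two strokes merged into one gesture
        self._held, self._pen_up_at = points, now
        return None

    def flush(self):
        """Called by a timer once the window expires with no second stroke."""
        gesture, self._held = self._held, None
        return gesture
```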
The following provides an example of how a user may input a musical notation, according to an embodiment of the invention. If a user wishes to draw a quarter note, the user first draws a particular gesture which the music recognizer will interpret as the note head. The user next draws a south-to-south gesture (i.e., a vertical line drawn downward) that attaches to the note head which the music recognizer interprets as a note stem. The music recognizer determines that the user has input a quarter note, combines these two gestures, and replaces them with a single quarter note on the display.
Figures 4A-4J illustrate the inputting of musical notations using the compass-direction user interface. In this embodiment, the music recognizer interprets the gestures according to the data shown in Table 2 as described below.
Figure 4A shows a southwest-to-southwest gesture 401 drawn in between two staff lines. As shown in Figure 4B, the music recognizer interprets that gesture as corresponding to a quarter note head 402. The music recognizer replaces that gesture with the note head placed in between the staff lines. Figure 4C shows the next gesture as a south-to-south gesture 403 that touches the note head. As shown in Figure 4D, the music recognizer identifies that gesture connected to the note head indicating a quarter note 404. Figure 4D also shows that the user has input an east-to-east gesture 405 near the top of the stem of the quarter note. As shown in Figure 4E, the music recognizer identifies that gesture as changing the quarter note into an eighth note 406. The completed eighth note and its location within the musical score now constitute a musical notation that may be stored in a digital music format in the persistent storage device, displayed on the display screen, and played by an instrumentation device that recognizes digital music.
Figure 4F shows quarter note 407. The user next inputs a southwest-to-southwest gesture 408 at a higher point in the scale than the quarter note, as shown in Figure 4G. The music recognizer interprets that gesture as a quarter note head 409, as shown in Figure 4H. The user next inputs a south-to-south gesture 410 that connects to the note head 409. The music recognizer interprets the gesture as creating a quarter note 411, as shown in Figure 4I. The user next inputs an east-to-east gesture 412, which is drawn near the tops of the stems of notes 407 and 411. The music recognizer identifies the context of that gesture (its proximity to the notes 407 and 411) and determines that that gesture corresponds to a beam 413 running between the note 407 and the note 411, as shown in Figure 4J.
Figures 5A-5H illustrate an exemplary embodiment of the present invention for drawing another series of musical notations. In this embodiment, the music recognizer also interprets the gestures according to the data shown in Table 2 as described below.
As shown in Figure 5A, the user has input an east-to-east gesture 501 in the lower part of a space between two staff lines. The music recognizer determines that that gesture is not located near other musical notations. Accordingly, the music recognizer determines that that gesture corresponds to a half rest, as shown in Figure 5B. Figure 5C shows the quarter note 503 around which the user has drawn the circular gesture 504. The music recognizer characterizes the circular gesture as a large circular gesture by determining that the bounding box of the circular gesture is larger than the predetermined size for the small circular gesture. The music recognizer then examines the context information for large circular gestures. The music recognizer determines that the quarter note 503 lies inside that circular gesture. Based upon this context data, the music recognizer selects the quarter note 503. Once selected, the display color associated with the note on the display may change, and the user may input changes to the note.
Figure 5D shows a south-to-south gesture 505 that ends near the quarter note head 506. As previously discussed, the music recognizer identifies this as a quarter note 507. In Figure 5E, the user inputs a north-to-north gesture 508. The music recognizer accordingly undoes the effect of the last gesture, which in this example is the south-to-south gesture 505. Thus, when undone, the quarter note head 509 is left as shown in Figure 5F. The user inputs a south-to-south gesture 510 starting near the left side of the note head 509, as shown in Figure 5G. The music recognizer interprets that gesture as indicating the quarter note 511 having a downward stem, as shown in Figure 5H.
Table 2 provides the various gestures, their corresponding musical notations, and some of the relevant context information, according to an embodiment of the invention. For example, the music recognizer identifies an east-to-east gesture as a whole rest if it is drawn above the center of a staff space and as a half rest if it is drawn below the center of a staff space.
TABLE 2
[Table 2 appears as an image in the original document.]
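To make the role of context concrete, the east-to-east rule from Table 2 might be coded as follows; the symbol names, the screen-coordinate convention (y grows downward), and the near-a-stem test are assumptions for this sketch.

    typedef enum { SYM_WHOLE_REST, SYM_HALF_REST, SYM_FLAG_OR_BEAM } Symbol;

    /* Hypothetical dispatch for an east-to-east gesture using its context. */
    Symbol classify_east_to_east(int y, int staff_space_center_y, int near_stem_top)
    {
        if (near_stem_top)             /* near a stem: flag or beam, as in Figures 4D-4J */
            return SYM_FLAG_OR_BEAM;
        if (y < staff_space_center_y)  /* above the center of the staff space */
            return SYM_WHOLE_REST;
        return SYM_HALF_REST;          /* below the center of the staff space */
    }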
The music recognizer identifies a natural sign as a modification to a flat. The music recognizer identifies a southeast-to-northeast gesture as a flat. If a user inputs a northeast-to-southeast gesture near the flat sign, then upon receiving the second gesture, the music recognizer will raise the flat to a natural. The music recognizer identifies double flat and double sharp symbols in a similar manner.
The music recognizer identifies a chord by a stack of note heads followed by a single stem (e.g., a north-to-north gesture) that passes through all of them. The music recognizer is forgiving of seconds in stem matching; that is, if two note heads are displaced from one another because they are only a second apart in pitch, the stem will not attach on the "correct" side of all heads.
The music recognizer identifies a beamed group of four sixteenth notes as follows. The user inputs the note heads for the four notes, typically beginning with the note heads at the ends of the group. If necessary, the user indicates any accidentals for a given note just after providing its note head. The user next inputs the stems for the two end notes and then connects those stems by drawing a beam between them. The user then inputs the stems for the two middle notes. Finally, the user inputs another beam across all four of the notes. In this manner, the music recognizer will recognize a quadruplet, rather than just four notes independent of each other.
The music recognizer allows the user to enter musical notations in various orders. However, the particular order described above for the quadruplets mimics the procedure taught to music copyists. When writing with a pen on paper, the correct approach requires targets for the pen strokes. If the user draws the interior notes' stems first, and the beam second, the stems may miss the beam or cross over too far. Even if the user misses the intended target, the music recognizer will still perform the intended interpretation if the resulting input gesture is close to its target. A single gesture typically creates a single musical symbol and may also lead to several linkages. The music recognizer provides useful feedback by visually indicating the linkages between musical symbols. For example, if a stem is connected to a beam, the music recognizer may make a small red circle around the link point. The music recognizer may also indicate other musical symbol linkage in a similar way.
The music recognizer may also map multiple gestures to the same action. The compass-direction user interface defines 66 recognizable gestures. However, the music recognizer may use only 12 of the gestures. Therefore, the music recognizer provides a reasonable interpretation for most gestures having no formal definition in the compass-direction user interface. For example, a southeast-to-southwest gesture may have the same effect as a northeast-to-northwest gesture. Embodiments of the compass-direction user interface may also recognize gestures other than the 66 described above. For example, a gesture having a hook may be recognized and used to delete a musical symbol. Also, the input gesture recognizer may recognize gestures made of more than two strokes (e.g., "W," "N," "M," and "Z" shaped gestures).
A human hand cannot typically input perfectly straight lines when using a pen. Accordingly, the gesture recognizer performs an initial interpretation of the data input provided by the pen. The gesture recognizer may be aided in its initial interpretation by predetermined thresholds for various characteristics of the gestures. One predetermined threshold may concern the size of a dot or small circular gesture. For example, if the bounding box ("BBox") for a gesture is less than a predetermined amount, then the gesture recognizer interprets the gesture as a dot or small circular gesture. Conversely, the gesture recognizer interprets a gesture as a large circle if the bounding box is equal to or greater than a predetermined amount.
In one embodiment, the gesture recognizer maintains prototypes for each of the 66 input gestures. When the gesture recognizer detects a gesture, the gesture recognizer identifies the gesture by locating the prototype that is closest to the gesture. The gesture recognizer may use any of the well-known nearest neighbor classifiers. In addition, the gesture recognizer may maintain more than just the 66 prototypes so that more than one representative example for each gesture is maintained. For example, the southeast-to-northeast gesture may have three prototypes: one with both strokes of equal length, one with the southeast stroke being shorter, and one with the northeast stroke being shorter.
Table 3 provides exemplary pseudocode for performing a nearest neighbor classification. "U" is an unknown gesture, and "P(i)" is a list of all the gesture prototypes available for classifying the unknown gesture U. This nearest neighbor classification algorithm calculates the distance between the unknown gesture U and each prototype and selects the prototype with the shortest distance.
TABLE 3
1.  Nearest Neighbor(U, P, Number_of_Prototypes)
2.  int Best_I, Classification_of_Unknown
3.  real Best_Dist, D
4.  {
5.      Best_I = 1;
6.      Best_Dist = Distance(U, P(1));
7.      For I = 2 to Number_of_Prototypes
8.      {
9.          D = Distance(U, P(I));
10.         If (D < Best_Dist) then
11.         {
12.             Best_Dist = D;
13.             Best_I = I;
14.         }
15.     }
16.     Classification_of_Unknown = Class(Best_I);
17. }
The gesture recognizer in one embodiment represents a gesture and a gesture prototype by a vector of 14 numbers, which is referred to as a feature vector. The metric, or distance function (e.g., Distance() on line 9 of Table 3), may be the Euclidean distance function, which is the square root of the sum of the squares of the differences between two feature vectors. The 14 numbers of a feature vector are:
The x,y-coordinates of the first point (2 numbers)
The x,y-coordinates of the last point (2 numbers)
The x,y-coordinates of a middle point (2 numbers)
The x,y-bounding octagon (8 numbers)
These numbers represent a portion of the data structure shown in Table 1.
Table 4 contains pseudocode for this distance function. "U(i)" is the unknown gesture's feature vector, "P(i)" is one of the prototypes, and "i" indexes the 14 numbers of the feature vector.
TABLE 4
1.  Distance(U, P)
2.  int I; real D;
3.  {
4.      D = 0;
5.      For I = 1 to 14
6.      {
7.          D = D + (U(I) - P(I))^2;
8.      }
9.      D = sqrt(D);
10. }
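For readers who want a compilable rendering of Tables 3 and 4, one possible C version follows; the array representation of prototypes and the function signatures are assumptions consistent with the 14-number feature vector described above.

    #include <math.h>

    #define FEATURES 14

    /* Euclidean distance between two feature vectors (Table 4). */
    static double feature_distance(const double u[FEATURES], const double p[FEATURES])
    {
        double d = 0.0;
        for (int i = 0; i < FEATURES; i++)
            d += (u[i] - p[i]) * (u[i] - p[i]);
        return sqrt(d);
    }

    /* Nearest-neighbor classification (Table 3): index of the closest prototype. */
    static int nearest_neighbor(const double u[FEATURES],
                                const double prototypes[][FEATURES],
                                int number_of_prototypes)
    {
        int best_i = 0;
        double best_dist = feature_distance(u, prototypes[0]);
        for (int i = 1; i < number_of_prototypes; i++) {
            double d = feature_distance(u, prototypes[i]);
            if (d < best_dist) {
                best_dist = d;
                best_i = i;
            }
        }
        return best_i;  /* the caller maps this index to its gesture class */
    }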
The gesture recognizer normalizes feature vectors by translating each feature vector so that its center point is at the origin and by scaling the feature vector appropriately. For example, if the gesture is drawn very small on the upper right portion of the screen or very large on the lower portion of the screen, then the gesture recognizer first translates the coordinates of these gestures to center them around the origin. The gesture recognizer will then shrink or expand the gestures accordingly so that the gesture fills a box having a uniform size. After normalization, the gesture recognizer then compares the unknown gesture ("U") to the gesture prototypes ("P"). This normalization enables the gesture recognizer to identify gestures independently of where the gesture has been drawn and independently of the size of the gesture. The gesture recognizer does not scale circular gestures because large and small circular gestures need to be distinguished.
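A sketch of this normalization under assumed conventions: coordinates are doubles, the uniform box side is 100 units, and a small scale or a set OFlag suppresses scaling, as the surrounding text requires. The constants and function name are illustrative only.

    #include <math.h>

    #define NORM_SIZE     100.0  /* hypothetical side of the uniform box   */
    #define DOT_THRESHOLD 4.0    /* hypothetical dot threshold (see below) */

    /* Center the gesture on the origin, then scale it to the uniform box.
       Circular gestures and dots are left unscaled so that large and small
       circles stay distinguishable and division by zero is avoided. */
    void normalize_gesture(double *x, double *y, int n, int oflag)
    {
        double minx = x[0], maxx = x[0], miny = y[0], maxy = y[0];
        for (int i = 1; i < n; i++) {
            if (x[i] < minx) minx = x[i];
            if (x[i] > maxx) maxx = x[i];
            if (y[i] < miny) miny = y[i];
            if (y[i] > maxy) maxy = y[i];
        }
        double xoff = (minx + maxx) / 2.0;  /* center of the bounding rectangle */
        double yoff = (miny + maxy) / 2.0;
        double scale = fmax(maxx - minx, maxy - miny);

        for (int i = 0; i < n; i++) {       /* translate to the origin */
            x[i] -= xoff;
            y[i] -= yoff;
        }
        if (oflag || scale < DOT_THRESHOLD) /* leave circles and dots as-is */
            return;
        for (int i = 0; i < n; i++) {       /* scale to the uniform box */
            x[i] *= NORM_SIZE / scale;
            y[i] *= NORM_SIZE / scale;
        }
    }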
A bounding octagon is an example of the bounding box ("BBox") shown in Table 1 and comprises a bounding rectangle and a bounding diamond. The bounding octagon represents an extension of the well-known bounding rectangle. The bounding rectangle comprises four numbers calculated as:
MinX = Min(X(I))
MaxX = Max(X(I))
MinY = Min(Y(I))
MaxY = Max(Y(I))
where X(I) represents a vector of x-coordinates of each point in the gesture and Y(I) represents a vector of y-coordinates of each point in the gesture. The bounding diamond comprises four numbers calculated as:
MinS = Min(X(I) + Y(I))
MaxS = Max(X(I) + Y(I))
MinD = Min(X(I) - Y(I))
MaxD = Max(X(I) - Y(I))
The combination of these two sets of four numbers provides the bounding octagon. The center point of the gesture determines the amount of translation in the X and Y directions that is applied to a gesture. The center point is calculated as:
Xoffset = (Min(X(I)) + Max(X(I)))/2
Yoffset = (Min(Y(I)) + Max(Y(I)))/2

The center point is the center of the bounding rectangle; it is not the middle point (MidX, MidY) of the gesture.
The gesture recognizer applies a scale to the gesture that is the larger of the width or height of the bounding box ("BBox") as given by the formula:
Scale = Max(MaxX - MinX, MaxY - MinY)
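Collecting the formulas above, a sketch that derives the eight bounding-octagon numbers in one pass; the struct name and the use of doubles are assumptions.

    typedef struct {
        double MinX, MaxX, MinY, MaxY;  /* bounding rectangle */
        double MinS, MaxS, MinD, MaxD;  /* bounding diamond (sums and differences) */
    } BoundingOctagon;

    /* One pass over the gesture's points yields all eight octagon numbers. */
    BoundingOctagon bounding_octagon(const double *x, const double *y, int n)
    {
        BoundingOctagon o = { x[0], x[0], y[0], y[0],
                              x[0] + y[0], x[0] + y[0],
                              x[0] - y[0], x[0] - y[0] };
        for (int i = 1; i < n; i++) {
            double s = x[i] + y[i];
            double d = x[i] - y[i];
            if (x[i] < o.MinX) o.MinX = x[i];
            if (x[i] > o.MaxX) o.MaxX = x[i];
            if (y[i] < o.MinY) o.MinY = y[i];
            if (y[i] > o.MaxY) o.MaxY = y[i];
            if (s < o.MinS) o.MinS = s;
            if (s > o.MaxS) o.MaxS = s;
            if (d < o.MinD) o.MinD = d;
            if (d > o.MaxD) o.MaxD = d;
        }
        return o;
    }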
If the scale value is small (e.g., approaching 0), the gesture is probably the dot or the small circle gesture. This particular case may have a single threshold: if the scale value is smaller than the dot threshold, then the gesture recognizer may simply classify the unknown gesture as a dot and perform no more comparisons against the other prototypes. Moreover, when an unknown gesture is smaller than the dot threshold, the gesture recognizer does not scale the gesture, in order to avoid a possible division by 0, which would produce an error. As previously discussed, the OFlag is set to "true."
The gesture recognizer uses for its middle point a point that maximizes the cross-product between the vector from the start point to the end point and the vector from the start point to that point. Geometrically, the point is perpendicularly the farthest from the line that contains the start point and the end point. The middle point may be calculated as:
dx = EndX - StartX
dy = EndY - StartY
select maximum d(i), where d(i) = Abs((X(i) - StartX) * dy - (Y(i) - StartY) * dx)
The point i where d attains its maximum is the middle point (MidX, MidY). The feature vector representing a stroke is defined in Table 5.
TABLE 5
[Table 5 appears as an image in the original document.]
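The middle-point selection described just before Table 5 can be sketched as follows; the function name and the use of long arithmetic (to keep the cross product from overflowing) are assumptions.

    #include <stdlib.h>  /* labs */

    /* Index of the point perpendicularly farthest from the chord joining the
       start and end points, found by maximizing the absolute cross product. */
    int middle_point_index(const long *x, const long *y, int n)
    {
        long dx = x[n - 1] - x[0];
        long dy = y[n - 1] - y[0];
        long best_d = -1;
        int best = 0;
        for (int i = 0; i < n; i++) {
            long d = labs((x[i] - x[0]) * dy - (y[i] - y[0]) * dx);
            if (d > best_d) {
                best_d = d;
                best = i;
            }
        }
        return best;  /* (x[best], y[best]) is (MidX, MidY) */
    }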
According to one embodiment of the invention, the gesture recognizer takes no action for unrecognized strokes. According to another embodiment of the invention, the gesture recognizer makes a notation on a screen display to notify the user that an unrecognizable gesture has been received.
In one embodiment, a training system is used to generate a database of prototypes. The training system prompts the user with the name of a gesture (e.g., northeast-to-northwest) or displays a sample gesture and asks the user to enter an example of that gesture, which the training system then stores in the database. In another embodiment, the training system collects samples for a single gesture and averages the sampled gestures to generate one or more average prototypes for that gesture. When a sample is received, the training system generates the feature vector for the sample gesture and then identifies the closest matching prototype. If the closest match is the prompted-for gesture, then the training system performs a weighted averaging of the feature vectors of the sample gesture and of the prototype to generate an updated feature vector for the prototype. A weighted average of 75% of the feature vector of the prototype and 25% of the feature vector of the sample gesture may be appropriate in many instances. If the closest match is not the prompted-for gesture, the training system adds the feature vector of the sample gesture to the database as a new prototype for the prompted-for gesture.
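The weighted averaging described above amounts to a single blend per feature; a sketch, with the 75/25 split taken from the text and the array handling assumed:

    #define FEATURES 14

    /* Blend a matching sample into its prototype: 75% prototype, 25% sample. */
    void update_prototype(double proto[FEATURES], const double sample[FEATURES])
    {
        for (int i = 0; i < FEATURES; i++)
            proto[i] = 0.75 * proto[i] + 0.25 * sample[i];
    }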
The compass-direction user interface provided by the present invention can be used to enter many types of information other than musical notations. For example, the compass-direction user interface may be used to input information relating to electronic circuit elements as well as characters from languages such as English, Arabic, Hebrew, Hangul, Chinese, and Japanese. The compass-direction user interface may be particularly well suited to aid in the editing of text. Standard proofreaders' marks (e.g., Webster's II New College Dictionary, Houghton Mifflin Co., 1995, p. 1500) can be replaced with compass-based gestures.
For example, a north-to-north gesture can be interpreted to change a letter to uppercase, and a south-to-south gesture can be interpreted to change a letter to lowercase. Similarly, an east-to-east gesture can be interpreted as a delete command.
The compass-direction user interface could be used in conjunction with a system for drawing electronic circuits. Figures 6A through 6M illustrate an exemplary application of the compass-direction user interface to such a system. Figure 6A shows an east-to-east gesture 601. The circuit recognizer determines that that gesture is not located near another circuit element and takes no action. Figure 6B shows a south-to-south gesture 602 that crosses the east-to-east gesture. The circuit recognizer now determines that the user has selected an "AND" gate 603.
As shown in Figure 6C, the user next inputs an east-to-east gesture 604 connected to the front of the AND gate. As shown in Figure 6D, the circuit recognizer interprets that gesture as output 605 of the AND gate. The user next enters the east-to-east gesture 606 that connects to the upper left of the AND gate. As shown in Figure 6E, the circuit recognizer interprets that gesture as input 607 to the AND gate. The user next draws an east-to-east gesture 608 that connects to the lower left of the AND gate. As shown in Figure 6F, the circuit recognizer interprets that gesture as a second input 609 to the AND gate.
Figures 6G-6M illustrate the creation of a two input OR gate in a manner analogous to the AND gate described above. The circuit recognizer may also recognize a small circular gesture drawn near the output of a gate as converting that gate to its logical complement (e.g., an AND gate to a NAND gate).
The circuit recognizer could alternatively recognize an OR gate having two inputs and an output by a single gesture and recognize an AND gate having two inputs and an output by another gesture. The circuit elements could be moved by gestures and combined to construct a circuit diagram.
Although specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as will be recognized by those skilled in the relevant art. The teachings of the invention provided herein may be applied to other user interfaces, not necessarily the exemplary user interface described above. Various exemplary computing systems, and accordingly various other system configurations, may be employed under the invention. The embodiments of the invention disclosed herein have been discussed with regard to a personal computing environment. However, the invention finds equal applicability in other computing systems, such as large centralized computing systems, portable computerized systems, and even hand-held computing devices. The invention also finds applicability in network telecommunication devices, such as a network of small hand-held devices that both send and receive information.
These and other changes can be made to the invention in light of the above detailed description. The principles of the present invention can be used to signal gestures in higher dimensions, rather than in the two dimensions of a compass. For example, a third dimension can be time, indicating the speed at which a stroke is written. That is, a gesture written quickly can be interpreted differently from a gesture written slowly. Also, for example, a gesture can be represented by directions in three-dimensional space. Such gestures can be "written" by a user by moving the user's hand while using a virtual reality computer system. That is, the tilt of the gesture away from or toward the user can have different meanings. In general, in the following claims, the terms used should not be construed to limit the invention to the embodiments disclosed in the specification and the claims, but should be construed to include all user interfaces that operate under the claims. Accordingly, the invention is not limited by the disclosure, but instead its scope is to be determined by the following claims.

Claims

1. A method for selecting one of a plurality of data items via a gesture, the method comprising: providing a direction that is associated with each of the data items; receiving a gesture; determining a direction associated with the received gesture; and indicating that the data item associated with the determined direction is the selected data item.
2. The method of claim 1 wherein the direction is indicated by an angular value.
3. The method of claim 1 wherein the direction is indicated by a compass-based direction.
4. The method of claim 1 wherein the directions provided are separated by multiples of 90 degrees.
5. The method of claim 1 wherein the directions provided are separated by multiples of 45 degrees.
6. The method of claim 1 wherein two directions are associated with each data item and two directions of the received gesture indicate the data item to select.
7. The method of claim 6 wherein the two directions are a starting and an ending direction.
8. The method of claim 7 wherein each starting and ending direction is one of four compass directions.
9. The method of claim 7 wherein each starting and ending direction is one of eight compass directions.
10. The method of claim 7 wherein a gesture comprises one stroke and the starting and ending directions of each gesture are the same.
11. The method of claim 7 wherein a gesture comprises two strokes and the starting and ending directions are different.
12. The method of claim 11 wherein the two strokes are at angles that are multiples of 90 degrees.
13. The method of claim 11 wherein the two strokes are at angles that are multiples of 45 degrees.
14. The method of claim 1 wherein the direction is in three-dimensional space.
15. The method of claim 1 wherein a provided direction that is associated with a data item includes timing information so that a gesture drawn with different timing results in different data items being indicated.
16. The method of claim 1 wherein the data items represent proofreaders' marks.
17. A method in a computer system for editing a musical score, the method comprising: providing a plurality of pre-defined gestures, each provided gesture having a starting and an ending direction; displaying musical notation; receiving a gesture with a starting and an ending direction that is drawn in relation to the displayed musical notation; recognizing the received gesture as one of the provided pre-defined gestures; determining the relation between the recognized gesture and the displayed musical notation; and modifying the musical notation based on the recognized gesture and the determined relation.
18. The method of claim 17 wherein each starting and ending direction is one of four compass directions.
19. The method of claim 17 wherein each starting and ending direction is one of eight compass directions.
20. The method of claim 17 wherein a gesture comprises one stroke and the starting and ending directions of each gesture are the same.
21. The method of claim 17 wherein a gesture comprises two strokes and the starting and ending directions are different.
22. The method of claim 21 wherein the two strokes are at angles that are multiples of 90 degrees.
23. The method of claim 21 wherein the two strokes are at angles that are multiples of 45 degrees.
24. The method of claim 17 wherein the determined relation is the gesture being near a displayed musical symbol.
25. A computer-readable medium containing instructions for causing a computer system to select one of a plurality of data items via a gesture, by: receiving a gesture; determining a direction associated with the received gesture; and indicating a data item that is associated with the determined direction as the selected data item.
26. The computer-readable medium of claim 25 wherein the direction is indicated by an angular value.
27. The computer-readable medium of claim 25 wherein the direction is indicated by a compass-based direction.
28. The computer-readable medium of claim 25 wherein the directions provided are separated by multiples of 90 degrees.
29. The computer-readable medium of claim 25 wherein the directions provided are separated by multiples of 45 degrees.
30. The computer-readable medium of claim 25 wherein two directions are associated with each data item and two directions of the received gesture indicate the data item to select.
31. The computer-readable medium of claim 30 wherein the two directions are a starting and an ending direction.
32. The computer-readable medium of claim 31 wherein each starting and ending direction is one of four compass directions.
33. The computer-readable medium of claim 31 wherein each starting and ending direction is one of eight compass directions.
34. The computer-readable medium of claim 31 wherein a gesture comprises one stroke and the starting and ending directions of each gesture are the same.
35. The computer-readable medium of claim 31 wherein a gesture comprises two strokes and the starting and ending directions are different.
36. The computer-readable medium of claim 35 wherein the two strokes are at angles that are multiples of 90 degrees.
37. The computer-readable medium of claim 35 wherein the two strokes are at angles that are multiples of 45 degrees.
38. The computer-readable medium of claim 25 wherein the direction is in three-dimensional space.
39. The computer-readable medium of claim 25 wherein a direction that is associated with a data item includes timing information so that a gesture drawn with different timing results in different data items being indicated.
40. The computer-readable medium of claim 25 wherein the data items represent proofreaders' marks.
41. The computer-readable medium of claim 25 wherein the data item represents an action to perform.
42. The computer-readable medium of claim 25 wherein the data item represents an item of information.
43. A computer system for editing a musical score, comprising: a collection of pre-defined gestures, each pre-defined gesture having a starting and an ending direction; a display component that displays musical notation; a receiving component that receives a gesture with a starting and an ending direction that is drawn in relation to the displayed musical notation; a recognizing component that recognizes the received gesture as one of the pre-defined gestures; and a musical score update component that modifies the musical notation based on the recognized gesture.
44. The system of claim 43 wherein each starting and ending direction is one of four compass directions.
45. The system of claim 43 wherein each starting and ending direction is one of eight compass directions.
46. The system of claim 43 wherein a gesture comprises one stroke and the starting and ending directions of each gesture are the same.
47. The system of claim 43 wherein a gesture comprises two strokes and the starting and ending directions are different.
48. The system of claim 47 wherein the two strokes are at angles that are multiples of 90 degrees.
49. The system of claim 47 wherein the two strokes are at angles that are multiples of 45 degrees.
50. The system of claim 43 including a relation determining component that determines a relation between the recognized gesture and the displayed musical notation, wherein the modifying of the musical notation is based on the determined relation, and wherein the determined relation is the gesture being near a displayed musical symbol.
PCT/US1999/029410 1998-12-11 1999-12-10 Method and system for recognizing musical notations using a compass-direction user interface Ceased WO2000034942A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU20505/00A AU2050500A (en) 1998-12-11 1999-12-10 Method and system for recognizing musical notations using a compass-direction user interface

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20958598A 1998-12-11 1998-12-11
US09/209,585 1998-12-11

Publications (1)

Publication Number Publication Date
WO2000034942A1 true WO2000034942A1 (en) 2000-06-15

Family

ID=22779372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/029410 Ceased WO2000034942A1 (en) 1998-12-11 1999-12-10 Method and system for recognizing musical notations using a compass-direction user interface

Country Status (2)

Country Link
AU (1) AU2050500A (en)
WO (1) WO2000034942A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4980840A (en) * 1987-09-23 1990-12-25 Beijing Stone New Technology Research Institute Computerized editing and composing system
US5153829A (en) * 1987-11-11 1992-10-06 Canon Kabushiki Kaisha Multifunction musical information processing apparatus
US5512707A (en) * 1993-01-06 1996-04-30 Yamaha Corporation Control panel having a graphical user interface for setting control panel data with stylus
EP0632427A2 (en) * 1993-06-30 1995-01-04 Casio Computer Co., Ltd. Method and apparatus for inputting musical data
US5663514A (en) * 1995-05-02 1997-09-02 Yamaha Corporation Apparatus and method for controlling performance dynamics and tempo in response to player's gesture

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1246048A1 (en) * 2001-03-26 2002-10-02 SAP Aktiengesellschaft Method and computer system for executing functions for objects based on the movement of an input device
US7460011B1 (en) 2004-06-16 2008-12-02 Rally Point Inc. Communicating direction information
US10324615B2 (en) 2004-10-20 2019-06-18 Nintendo Co., Ltd. Computing device and browser for same
US10996842B2 (en) 2004-10-20 2021-05-04 Nintendo Co., Ltd. Computing device and browser for same
US8169410B2 (en) 2004-10-20 2012-05-01 Nintendo Co., Ltd. Gesture inputs for a portable display device
US9052816B2 (en) 2004-10-20 2015-06-09 Nintendo Co., Ltd. Computing device and browser for same
US11763068B2 (en) 2004-10-20 2023-09-19 Nintendo Co., Ltd. Computing device and browser for same
US7750893B2 (en) 2005-04-06 2010-07-06 Nintendo Co., Ltd. Storage medium storing input position processing program, and input position processing device
EP1710670A3 (en) * 2005-04-06 2007-10-24 Nintendo Co., Limited Storage medium storing input position processing program, and input position processing device
WO2011059404A3 (en) * 2009-11-12 2011-07-21 Nanyang Polytechnic Method and system for interactive gesture-based control
CN102467327A (en) * 2010-11-10 2012-05-23 上海无戒空间信息技术有限公司 Method for generating and editing gesture object and operation method of audio data
US10148903B2 (en) 2012-04-05 2018-12-04 Nokia Technologies Oy Flexible spatial audio capture apparatus
US10419712B2 (en) 2012-04-05 2019-09-17 Nokia Technologies Oy Flexible spatial audio capture apparatus
US9135927B2 (en) 2012-04-30 2015-09-15 Nokia Technologies Oy Methods and apparatus for audio processing
CN103106403A (en) * 2013-01-08 2013-05-15 Shenyang Ligong University Note element division method based on image processing and note knowledge
CN103106403B (en) * 2013-01-08 2016-08-03 Shenyang Ligong University Note primitive division method based on image processing and musical note knowledge
KR20220103102A (en) * 2019-11-29 2022-07-21 MyScript Gesture stroke recognition in touch-based user interface input
EP4130966A1 (en) * 2019-11-29 2023-02-08 MyScript Gesture stroke recognition in touch-based user interface input
KR102677200B1 (en) 2019-11-29 2024-06-20 MyScript Gesture stroke recognition in touch-based user interface input

Also Published As

Publication number Publication date
AU2050500A (en) 2000-06-26


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase