WO2009007512A1 - A gesture-controlled music synthesis system - Google Patents
A gesture-controlled music synthesis system Download PDFInfo
- Publication number
- WO2009007512A1 (PCT/FI2008/050421)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- player
- musical
- computer program
- pose
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63H—TOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
- A63H5/00—Musical or noise- producing devices for additional toy effects other than acoustical
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63H—TOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
- A63H33/00—Other toys
- A63H33/30—Imitations of miscellaneous apparatus not otherwise provided for, e.g. telephones, weighing-machines, cash-registers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/46—Volume control
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/10—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
- A63F2300/1087—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
- A63F2300/1093—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera using visible light
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/80—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
- A63F2300/8047—Music games
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/201—User input interfaces for electrophonic musical instruments for movement interpretation, i.e. capturing and recognizing a gesture or a specific kind of movement, e.g. to control a musical instrument
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/441—Image sensing, i.e. capturing images or optical patterns for musical purposes or musical control purposes
- G10H2220/455—Camera input, e.g. analyzing pictures from a video camera and using the analysis results as control data
Definitions
- the present invention is related to gesture control, user interfaces and music synthesis.
- the present invention discloses a method for creating synthesized music in response to a player's gestures.
- the method is characterized in that a current player state is first recognized. Differences are then calculated between the recognized player state and at least one pre-stored player state, wherein the pre-stored player states are each associated with a musical passage. Finally, playback volumes of the associated musical passages are adjusted according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume.
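The claimed mapping — recognize a state, compute its difference to each pre-stored state, and turn small differences into high volumes — can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the feature-vector representation of a player state and the 1/(1 + d) volume curve are assumptions; the claim only requires that volume decreases as the difference grows.

```python
import math

# Player states are modelled here as feature vectors (e.g. pose features,
# playing speed, hand distance); each pre-stored state's musical passage
# gets a playback volume that is high when the recognized state is close
# to it and low when it is far.

def state_difference(state_a, state_b):
    """Euclidean distance between two player-state feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(state_a, state_b)))

def playback_volumes(current_state, stored_states):
    """One volume in (0, 1] per pre-stored state: 1.0 at zero difference,
    decaying monotonically as the difference grows. The 1/(1 + d) curve
    is an assumed choice."""
    return [1.0 / (1.0 + state_difference(current_state, s))
            for s in stored_states]
```

For example, with the current state at the origin, a stored state at distance 5 gets volume 1/6 while an identical stored state gets full volume.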
- the player state comprises a pose.
- the player state comprises playing speed.
- the playing speed is determined in relation to the frequency of consecutive trigger gestures performed by the player.
- such trigger gestures are recognized that mimic the strumming of the strings of a guitar.
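The playing-speed control described above, determined from the frequency of consecutive trigger gestures, can be estimated from gesture timestamps. A minimal sketch; the two-second sliding window is an assumed tuning parameter, not from the patent.

```python
def playing_speed(trigger_times, window=2.0):
    """Estimate playing speed as the frequency (gestures per second) of
    the trigger gestures within the last `window` seconds.
    `trigger_times` are timestamps in ascending order."""
    if len(trigger_times) < 2:
        return 0.0
    cutoff = trigger_times[-1] - window
    recent = [t for t in trigger_times if t >= cutoff]
    if len(recent) < 2:
        return 0.0
    # N gestures span N - 1 intervals.
    return (len(recent) - 1) / (recent[-1] - recent[0])
```

A speed of 2.0 would correspond here to strumming twice per second; a downstream classifier could then bin this continuous value into the discrete slow/medium/fast speeds used for lick selection.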
- the player state comprises the distance between the hands of the player.
- the method further comprises the step of producing a plurality of pre-stored player states and associated musical passages so that the player states with a relatively high distance between the hands of the player are associated with musical passages that have notes with relatively low pitches.
- the method further comprises producing the musical passages so that each passage has a tempo that is an integer multiple of a base tempo, and playing the musical passages in sync with each other.
- the method further comprises the step of playing back a musical accompaniment track that has the same musical key as the musical passages and a tempo that is an integer multiple of the base tempo.
- the playback of an associated musical passage is stopped if its playback volume falls below a threshold value, and a stopped musical passage whose playback volume rises above the threshold is restarted at the playback position where it would be had the musical passage not been stopped.
- the lengths of musical passages are defined as integer multiples of a base length, and the musical passages are looped consecutively and continuously.
- a threshold speed for the playing is defined, and playback of a single non-synchronized sound is started with each trigger gesture when playing at a slower speed than the threshold speed.
- the playback of single non-synchronized sounds is started when predefined special poses and gestures are recognized.
- a pose is recognized among a predetermined discrete set of poses, and only the musical passages, which are associated with the pre-stored player state comprising the same discrete pose as the recognized player state, are played audibly.
- playing speed is recognized among a predetermined discrete set of playing speed values, and only the musical passages, which are associated with the pre-stored player state comprising the same discrete playing speed as the recognized player state, are played audibly.
- instruction images representing predefined poses are displayed for giving movement instructions to the player.
- the instruction images are highlighted as a function of the distance between the pose of the displayed image and the pose of the player.
- visual selection objects are provided for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
- a level of matching is calculated between the instruction images and the player's current visual representation, and the desired pose is chosen as the best match among the instruction images.
- the player's hands and feet are included in said matching calculation.
- a limited subset of instruction images is provided at a time, showing the best matches according to the player's current visual representation.
- the method steps described above are implemented in a form of a computer program.
- the computer program comprises program code configured to control a data-processing device to perform the previously disclosed method steps where applicable.
- the computer program is embodied on a computer readable medium.
- the parameters and additional features mentioned regarding different embodiments of the method also apply to the program code of the computer program and to the system according to the invention.
- a system for creating synthesized music in response to a player's gestures comprises a motion tracking tool and data processing means configured to recognize a current player state.
- the system further comprises a memory configured to store a plurality of musical passages which each comprise an association to pre-stored player states.
- the system comprises a calculating means configured to calculate differences between the recognized player state and at least one pre-stored player state stored in the memory.
- the system also comprises a control unit configured to adjust playback volumes of the associated musical passages according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume.
- the system comprises a sound production device configured to produce the synthesized music according to the control unit output.
- a further embodiment of the invention is a system comprising a computing device controlled by said computer program.
- the system further comprises location indicators worn by the player in order to define the locations of the player's hands and possibly feet, and means for capturing the location data with the motion tracking tool.
- the system further comprises a camera in the motion tracking tool for taking a plurality of pictures of the player, and means for using the picture data for the player state recognition.
- the system further comprises a screen and graphics rendering means configured to show the picture of the player composited inside three-dimensional computer graphics as a billboard texture that stays facing the virtual camera as the virtual camera moves.
- a screen and graphics rendering means are configured to display instruction images representing predefined poses for giving movement instructions to the player.
- visual selection objects are provided for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
- the calculating means is configured to calculate a level of matching between the instruction images and the player's current visual representation, and the control unit is configured to choose the desired pose as the best match among the instruction images.
- the device is a useful virtual musical instrument which provides a new manner of transforming the player's gestures and postures into a realistic audio experience.
- the invention is useful e.g. for creating a virtual air guitar playing system and the corresponding guitar sound creation.
- Fig. 1 is a simple block diagram of the architecture according to an embodiment of the invention.
- Fig. 2 is a flow chart of the procedure with the needed apparatus in an embodiment of the invention.
- Fig. 3 is a flow chart of a more detailed embodiment of the procedure according to the invention.
- Fig. 4 is a flow chart of another more detailed embodiment of the procedure according to the invention.
- Fig. 5 is an example of computer graphics rendered by a software based embodiment of the invention.
- Fig. 6 is a second example of computer graphics rendered by a software based embodiment of the invention.
- Fig. 7 is a third example of computer graphics rendered by a software based embodiment of the invention.
- the present invention discloses a method for synthesizing music by playing a so-called air guitar.
- the system according to the invention is able to capture these gestures and convert them to different licks, that is, musical passages such as a real guitar would produce.
- the musical passages may be stored in computer memory or on a computer readable medium, for example, as MIDI or as sound waveform data.
- the present invention allows the user to control a virtual musical instrument without any physical user interface of an actual musical instrument.
- the physical user interface is replaced with a virtual full-body interface, i.e. the created sound is controlled with postures assumed by the user's body, here referred to as poses, and movements of the body, here referred to as gestures.
- the user may use three controls to select the sound being played. These controls may be for instance the pose, playing speed defined by the frequency of performed trigger gestures, and distance between the left and right hands of the player.
- player's state is observed with a suitable apparatus.
- the player state may comprise gesture data of the player and the posture of the player.
- Gesture data comprises at least the aforementioned distance between the hands and the playing speed.
- Each combination of the three controls is mapped to a specific musical passage, called a lick, which can be stored in a database, for instance.
- the database with the lick data can be stored in a memory, which can form a part of a computer or it can be a separate memory unit. For example, standing in a certain pose, playing at slow speed (where the frequency of the player's trigger gestures is low), and holding the hands at a far mutual distance corresponds to one lick. Keeping the other parameters the same but playing at a high speed changes the lick to a different one, as does moving the hands closer to each other. In one embodiment of the invention, changing the pose opens up a new collection of licks for each combination of playing speed and the mutual distance of hands.
- each pose can be seen to contain a subset of licks, where each lick is activated by changing playing speed and the hand distance.
- the reason for this categorization is that changing poses is subjectively a major change, while the hand distance and playing speed are minor changes.
- the control parameters may be categorized into discrete parameter sets, but the user's movements and gestures are continuous by nature.
- licks are selected based on the combination of control parameters which the user is nearest to. For example, if the user's position is between two defined poses, the lick is selected based on which pose the user is closer to.
- selection is performed which means that the volumes of the other licks are decreased either completely or partially.
- the volumes may be normalized so that the overall loudness perceived by the player stays constant even though the relative volumes of the licks change.
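The normalization described above might look like the following sketch. A simple sum-to-one normalization is assumed; the patent does not specify the loudness model, and a real system might normalize power rather than summed amplitude.

```python
def normalize_volumes(volumes):
    """Scale lick volumes so that they sum to 1.0, keeping the overall
    loudness perceived by the player constant even though the relative
    volumes of the licks change."""
    total = sum(volumes)
    if total <= 0.0:
        return list(volumes)
    return [v / total for v in volumes]
```

So licks with raw volumes 2.0, 1.0, 1.0 would play at 0.5, 0.25, 0.25 of full volume, and the mix stays at constant overall level as the player moves between states.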
- the pose may be described as a plurality of features, such as the locations of the player's hands and feet, body joint angles, or image features computed from an image acquired by a camera or from data provided by some other motion tracking tool, such as the control devices of the Nintendo Wii game console.
- the pose may be determined as one of a predetermined discrete set of reference poses using pattern recognition and classification techniques, such as selecting the predetermined pose with a feature vector closest to the feature vector computed based on the motion tracking data.
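The classification described above — selecting the predetermined pose whose feature vector is closest to the one computed from the motion tracking data — is nearest-neighbour classification. A minimal sketch; the pose names come from the examples below, but the feature vectors are illustrative.

```python
def classify_pose(features, reference_poses):
    """Pick the predetermined pose whose reference feature vector is
    closest (squared Euclidean distance) to the observed features.
    `reference_poses` maps pose name -> reference feature vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(reference_poses,
               key=lambda name: sq_dist(features, reference_poses[name]))
```

Because the nearest reference always wins, a player standing between two poses is assigned the closer one, which matches the selection rule stated earlier for continuous movement between discrete states.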
- the predetermined poses may include, for example, "low grinding", "basic playing", or "intense solo".
- Each predetermined pose may have an associated plurality of input image reference areas, and the pose may be determined by computing the overlap of the player's body parts and the reference areas.
- the poses represent the emotional and expressive qualities of the licks (e.g., "relaxed", "anxious", "heroic"), whereas hand distance and playing speed are more directly mapped to musical parameters, such as the amount and pitch of notes in the lick.
- Learning to express emotion through playing a real instrument requires considerably more training and fine-motor skill than simply producing the correct notes.
- the benefit of the present invention is that the player can produce professional sounding music by expressing emotion with full body and controlling the essential musical parameters with simple gestures.
- frantic solo licks may be played by dropping on one's knees and arching one's back intensely, whereas confident, low, and crunchy licks may be played by assuming a stable low stance with legs spread widely apart but keeping the upper body relaxed.
- the player states may be categorized into discrete sets, but the user's movement is continuous by nature.
- licks are selected based on player state that the user is the nearest to. For example, if the user is between two poses, the lick is selected based on which pose the user is the closest to.
- selection means that the volumes of the other licks are decreased either completely or partially.
- the playing speed parameter is achieved by detecting the movements of one hand in a top-down motion similar to strumming the strings of a real guitar.
- the motion tracking tool 11 may be, for example, an ordinary digital camera that is capable of providing images at a certain desired resolution and rate.
- the motion tracking tool 11 may comprise computing means configured to perform data analysis, such as detecting a pose or filtering the signals.
- the motion tracking tool 11 may also comprise a number of devices held by or attached to the user 10 that are capable of sensing their location and/or orientation in space.
- these devices may be gloves which are worn by the virtual guitar player.
- the gloves can be connected to the computing unit 12 with wires or wirelessly via radio transmission, or the gloves can simply act as passive markers that can be recognized from a picture by computer vision software executed by the computing unit.
- the computing unit 12 may be, for example, an ordinary computer having sufficient computing power to provide the result at desired quality level. Furthermore, the computing unit 12 includes common means, such as a processor and a memory, in order to execute a computer program or a computer-implemented method according to the present invention. Furthermore, the computing device includes storage capacity for storing the recorded music pieces that are played back in the usage of the present invention.
- the sound production device 13 may be, for example, an ordinary loudspeaker capable of generating sound waves or a device that is capable of recording audio signals into digital or analog format on a storage device.
- Figure 2 discloses a flow chart disclosing different functional units in one embodiment of the invention.
- synthesized musical passages are chosen based on the poses and gestures of the user.
- the motion tracking tool 21 monitors the user 20 and transmits data concerning the user's body and movements to a player state analysis system 22.
- the player state analysis system 22 analyses the data and determines the posture of the user and/or gestures which the user is performing.
- the player state analysis system 22 outputs data that corresponds to the control parameters. These parameters can include for example a pose, speed or rate of playing, the distance between the player's hands, the angle between the player's hands, and/or the locations of the player's hands and/or feet.
- This data is transmitted to a mixing unit 25, which may be simply a multiplexer, but also, for example, a volume controller.
- the mixing unit 25 reads the licks 24 (in this example, N pieces of choosable licks) from a lick database 23 and modifies their playback states and playback volumes so that the lick corresponding to the control parameters is being played back.
- the mixed audio is sent to the sound production device 26 for audio output.
- Figure 3 discloses a flow chart of a more detailed embodiment according to the invention.
- the motion tracking tool 31 monitors the player 30, and transmits data to a pose analysis unit 32 and a musical parameter analysis unit 33.
- the pose analysis unit 32 outputs pose data, such as a pose identifier.
- the musical parameter analysis unit 33 outputs musical gesture data, such as the hand distance and playing speed.
- the group selector 36 selects a lick group from the database 35 based on the pose data.
- the group selector 36 outputs the individual licks that belong to the selected group.
- the lick mixer 37 selects a lick based on the musical gesture data.
- the lick mixer 37 outputs the selected lick for audio output 34.
- Figure 4 discloses a flow chart of yet another embodiment of the invention where the selection of licks proceeds hierarchically.
- the motion tracking tool 41 monitors the player 40, and transmits data to the pose analysis unit 42, playing speed analysis unit 43 and hand distance analysis unit 44.
- the pose based selector 47 selects and outputs the licks from the lick database 46 that correspond to the detected pose.
- the speed based selector 48 selects and outputs the licks that correspond to the detected playing speed.
- the hand distance based selector 49 selects and outputs the lick that corresponds to the detected hand distance.
- Units 47, 48 and 49 may also be placed to any other mutual order than the one shown in Figure 4.
- the selected lick is directed to the audio output device 45.
- the content of the pre-recorded licks can correspond to the combinations of pose, speed and distance, for example. This means that each combination can be mapped to a certain lick and any chosen lick corresponds to one specific combination of the playing gesture parameters.
- For each playing speed there are licks containing passages, where the recorded instrument is played at different speeds. These speeds may be, for example, musical 4th, 8th and 16th notes, corresponding to slow, medium and fast playing speeds, respectively. More accurately, the passages contain the perception of being played with 4th, 8th and 16th notes where the actual sounds may contain syncopations or deviations in timing to make them musically rich and interesting while still maintaining the perception of slow, medium or fast playing speed.
- for each hand distance, the same or similar lick exists with a certain pitch: a broad grip (a large distance between the hands) results in a lower pitch, whereas a narrow grip results in a higher pitch of the same lick.
- Playing a certain lick at, for example, medium speed and then moving the hands from a near to a far distance changes the lick from high to low pitch.
- the licks within one subset of poses are typically fairly similar to each other compared to the licks with a different pose.
- an accompaniment track can be played throughout the performance.
- the user's perception is that they are playing the lead instrument in an orchestra or band.
- the method of selecting discrete licks based on continuous movements is prone to fluctuations.
- ways of mitigating these fluctuations can be incorporated into the system.
- the system knows the user's 'distance' to each defined pose at all times.
- the same theory applies to hand distance and playing speed as well.
- the second way of the mitigation process is performing a cross-fade between two licks.
- all licks in the current subset of the pose are being played back, but only one of them is at full volume while the other licks are at zero volume.
- the playing volume of the two licks related to that control parameter's extreme values changes as a function of the parameter change. For example, moving from a near hand distance towards a far hand distance, the volume of the lick corresponding to the near distance is reduced, and the volume of the lick corresponding to the far distance is increased correspondingly. Since the user cannot be assumed to be proficient with any instrument, the system must ensure that the result sounds musical. This is accomplished by adhering to certain musical rules both in the recording of the musical content as well as in the real-time control.
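The cross-fade described above, between the licks at a control parameter's extreme values, can be sketched as a linear fade. The linear curve is an assumption; an equal-power fade would serve the same purpose.

```python
def crossfade(value, near, far):
    """Volumes (near_lick, far_lick) for a control parameter such as hand
    distance: at `near` the near-distance lick plays at full volume, at
    `far` the far-distance lick does, and in between the two are linearly
    cross-faded."""
    t = (value - near) / (far - near)
    t = min(max(t, 0.0), 1.0)  # clamp outside the extreme values
    return 1.0 - t, t
```

Halfway between the extremes both licks play at half volume; beyond either extreme only the corresponding lick is audible, which also suppresses fluctuation when the player lingers near one end of the range.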
- each lick in the database is recorded with a tempo (beats per minute) that is an integer multiple of a base tempo.
- a lick with 8th notes is played twice as fast, containing approximately 240 notes, and a lick with 16th notes is four times as fast (480).
- the length of each lick is an integer multiple of a base length, e.g. a musical measure, so that memory and disk capacity can be saved by looping the licks (restarting playback once the lick has reached its end) so that a steady tempo is maintained.
- all licks have also been recorded in the same musical key and overall style so that they fit together well.
- the licks must be synchronized in time. Usually, this is accomplished by playing back all licks at the same time and controlling the volumes of each lick. However, for the sake of efficient calculation, licks to be played at zero volume may be stopped entirely and restarted, when the user picks up one of them. In this case, the playback must be started at a position that is synchronized to the global time.
- for example, if the global playback position within the looped lick length is 1.317 seconds, the new lick must start playing from the position of 1.317 seconds, not the beginning, and not the position at which it was previously stopped. This ensures that each lick is always synchronized to a global tempo, thus ensuring that they fit into the accompaniment track.
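The synchronized restart described above amounts to taking the global playback time modulo the looped lick's length. A sketch; the 2.0-second lick length below is an assumed value chosen to reproduce the 1.317-second example.

```python
def restart_position(global_time, lick_length):
    """Position at which a stopped, looped lick must resume so that it is
    where it would have been had it never been stopped: the global
    playback time modulo the lick's length."""
    return global_time % lick_length

# With an assumed 2.0-second lick, resuming at global time 9.317 s
# starts the lick about 1.317 s in, matching the example above.
```

Because every lick's length is an integer multiple of the base length, the same modulo operation keeps all licks aligned with each other and with the accompaniment track.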
- the user may also trigger single, non- synchronized sounds called long notes by playing at a speed even slower than the slowest licks.
- each playing gesture is interpreted as a trigger for a long note.
- the recorded content of long notes may be, for example, individual notes or chords that are let ring for a long time.
- the long notes may also be miscellaneous, non-musical sounds such as explosions or similar kinds of special sounds.
- triggering a long note immediately mutes any lick being played and begins the playback of a long note from the beginning.
- the note will continue playing until the user plays another note or chooses to mute the currently playing note with a special gesture.
- the user may mute the note by lifting the right hand back up above the centerline, as if on top of an imaginary guitar's strings.
- An exception to this might be a case where the user moves the right hand horizontally away to the side of the imaginary guitar and only then lifts the hand back up, in which case the note would continue to be played.
- the pose and hand distance affect which long note is chosen. Typically, there may be 2-6 different hand distances which produce a different note, each of which is perceptually higher or lower in pitch corresponding to the distance.
- the user may also trigger single, non-synchronized sounds called special effects with predefined trigger gestures or poses. For example, assuming a certain pose may trigger the sound of an explosion, or moving the right hand in a circular motion may trigger the sound of hitting a drum.
- An embodiment of the invention may be a gaming system that comprises a screen configured to display the picture of the user captured by the motion tracking tool.
- the background may be removed from the picture using computer vision software, and the picture may be rendered as a billboard texture inside 3d graphics.
- the 3d graphics are displayed from the point of view of a virtual camera that can move, and the billboard texture is rotated so that it always faces the camera to maintain the illusion and not reveal the 2d nature of the texture.
- the present invention may also comprise instruction methods and visualizations that solve the problem of communicating to the player what player states the system is able to recognize.
- a plurality of instruction images 51, 52, 53, one for each recognized player pose, are visualized on screen in addition to the user 50.
- the feedback provided by the system to the player about the pose analysis is important. The feedback helps the player determine how to move so that the system can detect a desired pose.
- the instruction images may be modified depending on their difference from the current pose of the player. For example, the instruction image depicting a pose closest to the player may be made larger than the other instruction images.
- all instruction images may be scaled according to the distances between the poses corresponding to the instruction images and the player's current pose. Additionally, the instruction image depicting a pose closest to the player may be highlighted, e.g., using a glow effect.
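The scaling and highlighting described above can be driven directly by the pose distances. An illustrative sketch; the 1/(1 + d) scale curve is an assumption, and the highlighted index would be rendered with, e.g., a glow effect.

```python
def instruction_image_display(pose_distances):
    """Given the distance from the player's current pose to each
    instruction image's pose, return (scales, highlighted_index):
    each image's scale shrinks with distance, and the closest image
    is the one to highlight."""
    scales = [1.0 / (1.0 + d) for d in pose_distances]
    highlighted = min(range(len(pose_distances)),
                      key=lambda i: pose_distances[i])
    return scales, highlighted
```

An image whose pose exactly matches the player's is drawn at full size, while images for distant poses shrink, giving the player continuous feedback on which pose the system will select.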
- the pose analysis may comprise the player' s visual representation 60 interacting on screen with visual elements, e.g., by touching at least one of a plurality of visual selection objects 61, 62, 63 to define the current pose.
- the user experience may become similar to the player playing a virtual guitar and selecting a lick group by touching virtual foot pedals on screen, and then modifying the volumes of the licks within the group with other player state information, such as the distance between hands and the playing speed.
- the pose analysis may comprise displaying a plurality of instruction images 71, 72, 73, 74, each corresponding to a pre-defined pose, and determining the recognized pose by measuring how well the player's current visual representation 70 on screen matches the instruction images.
- the instruction image 74 the instruction image 74
- the match may be measured by computing the distance between the on screen locations of the player' s body parts and the corresponding on screen locations of the body parts of the instruction images.
- the body parts used in the distance computation may comprise hands and feet.
- some of the instruction images may be hidden part of the time based on, e.g., the player's location on screen or on a virtual stage 75 so that the player may move around to search for the instruction images.
- the instruction images may move on screen so that they stay close to the player, and to minimize visual clutter, only a subset of the instruction images are visible at a given moment. The visible subset may be determined by computing which instruction images are matching the player's current pose best.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
A method, software, and system for gesture-controlled music synthesis are disclosed. The user's state is recognized and compared against pre-stored states, each associated with a musical passage. Based on state data, such as pose, frequency of trigger gestures (e.g., strumming the strings of an air guitar), and distance between the user's hands, suitable passages are chosen for playback. The invention lets the user generate high-quality music with no musical training. Typically, the invention is used for creating the illusion of the user playing lead guitar in a rock band, especially in embodiments that comprise displaying a picture of the user on a virtual stage in front of a virtual audience. The user may control simple musical parameters, such as pitch and the number of notes, using simple gestures, and use full-body poses for controlling emotional aspects of the musical output.
Description
A GESTURE-CONTROLLED MUSIC SYNTHESIS SYSTEM

FIELD OF THE INVENTION
The present invention is related to gesture control, user interfaces and music synthesis.
BACKGROUND OF THE INVENTION
Traditional musical instruments, such as the guitar, violin and piano, are very expressive, and thus difficult to learn to play well. Since the beginning of the 20th century, gesture-controlled instruments without any tangible user interface have been developed, such as the Theremin (US Patent US1661058). However, many of these interfaces have likewise been difficult to learn to play. The Theremin in particular requires highly developed muscular precision to produce steady tones and keep to a musical scale, because even minute variations in the player's hand positions change the sound.
At the same time, ways of interacting with music have been developed that allow users to produce music creatively without the need for extensive training. Many of these systems are based on combining pre-recorded musical passages on a timeline. Playing back the created arrangement of these passages results in a musical piece. However, these systems are more composition tools than musical instruments, since they cannot be controlled in real time but require a period of preparation before the musical piece can be played back. This represents a significant problem in prior art.
SUMMARY OF THE INVENTION
The present invention discloses a method for creating synthesized music in response to a player's gestures. The method is characterized in that a current player state is first recognized. Differences are then calculated between the recognized player state and at least one pre-stored player state, wherein the pre-stored player states are each associated with a musical passage. Finally, the playback volumes of the associated musical passages are adjusted according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume. In an embodiment of the invention, the player state comprises a pose.
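The volume-adjustment step can be sketched in a few lines of code. The snippet below is an illustrative sketch only, since the description does not fix a particular distance measure or volume curve; Euclidean distance over a state feature vector and an exponential falloff are assumptions made here:

```python
import math

def state_distance(current, stored):
    # Distance between two player-state feature vectors
    # (e.g. pose features, hand distance, playing speed).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(current, stored)))

def passage_volumes(current_state, stored_states, falloff=1.0):
    # A small difference yields a high volume (up to 1.0),
    # a large difference yields a low volume.
    return [math.exp(-falloff * state_distance(current_state, s))
            for s in stored_states]
```

With two pre-stored states, a player state exactly matching the first one gives that state's passage full volume and the other passage a lower volume.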
In an embodiment of the invention, the player state comprises playing speed. In a further embodiment, the playing speed is determined from the frequency of consecutive trigger gestures performed by the player. In a further embodiment, trigger gestures that mimic the strumming of the strings of a guitar are recognized.
In an embodiment of the invention, the player state comprises the distance between the hands of the player.
In an embodiment of the invention, the method further comprises the step of producing a plurality of pre-stored player states and associated musical passages so that the player states with a relatively high distance between the hands of the player are associated with musical passages that have notes with relatively low pitches.
In an embodiment of the invention, the method further comprises producing the musical passages so that each passage has a tempo that is an integer multiple of a base tempo, and playing the musical passages in sync with each other.
In an embodiment of the invention, the method further comprises the step of playing back a musical accompaniment track that has the same musical key as the musical passages and a tempo that is an integer multiple of the base tempo.
In an embodiment of the invention, the playback of an associated musical passage is stopped if its playback volume falls below a threshold value, and a stopped musical passage whose playback volume rises above the threshold value is restarted at the playback position it would have reached had it not been stopped.
In an embodiment of the invention, the lengths of musical passages are defined as integer multiples of a base length, and the musical passages are looped consecutively and continuously.
In an embodiment of the invention, a threshold playing speed is defined, and playback of a single non-synchronized sound is started with each trigger gesture when the player plays at a speed slower than the threshold.
In an embodiment of the invention, the playback of single non-synchronized sounds is started when predefined special poses and gestures are recognized.

In an embodiment of the invention, a pose is recognized among a predetermined discrete set of poses, and only the musical passages associated with the pre-stored player state comprising the same discrete pose as the recognized player state are played audibly.

In an embodiment of the invention, playing speed is recognized among a predetermined discrete set of playing speed values, and only the musical passages associated with the pre-stored player state comprising the same discrete playing speed as the recognized player state are played audibly.
In an embodiment of the present invention, instruction images representing predefined poses are displayed for giving movement instructions to the player.
In an embodiment of the present invention, the instruction images are highlighted as a function of the distance between the pose of the displayed image and the pose of the player.
In an embodiment of the present invention, visual selection objects are provided for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
In an embodiment of the present invention, a level of matching is calculated between the instruction images and the player's current visual representation, and the desired pose is chosen as the best match among the instruction images.
In an embodiment of the present invention, the player's hands and feet are included in said matching calculation.
In an embodiment of the present invention, a limited subset of instruction images is provided at a time, showing the best matches according to the player's current visual representation.

According to a second aspect of the invention, the method steps described above are implemented in the form of a computer program. The computer program comprises program code configured to control a data-processing device to perform the previously disclosed method steps where applicable. In one embodiment of the invention, the computer program is embodied on a computer-readable medium. In one embodiment of the invention, the parameters and additional features mentioned regarding different embodiments of the method are also applied to the program code of the computer program and to the system according to the invention.
According to a third aspect of the present invention, a system for creating synthesized music in response to a player's gestures is disclosed. The system comprises a motion tracking tool and data processing means configured to recognize a current player state. The system further comprises a memory configured to store a plurality of musical passages which each comprise an association to pre-stored player states. Furthermore, the system comprises a calculating means configured to calculate differences between the recognized player state and at least one pre-stored player state stored in the memory. The system also comprises a control unit configured to adjust playback volumes of the associated musical passages according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume. Finally, the system comprises a sound production device configured to produce
the synthesized music according to the control unit output.
A further embodiment of the invention is a system comprising a computing device controlled by said computer program.
In an embodiment of the invention, the system further comprises location indicators worn by the player in order to define the locations of the player's hands and possibly feet, and means for capturing the location data with the motion tracking tool.
In an embodiment of the invention, the system further comprises a camera in the motion tracking tool for taking a plurality of pictures of the player, and means for using the picture data for the player state recognition. In a further embodiment of the invention, the system further comprises a screen and graphics rendering means configured to show the picture of the player composited inside three-dimensional computer graphics as a billboard texture that stays facing the virtual camera when the virtual camera moves.
In an embodiment of the invention, a screen and graphics rendering means are configured to display instruction images representing predefined poses for giving movement instructions to the player. In an embodiment of the invention, visual selection objects are provided for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
In an embodiment of the invention, the calculating means is configured to calculate a level of matching between the instruction images and the player's current visual representation, and the control unit is configured to choose the desired pose as the best match among the instruction images.

As an advantage of the present invention, the device is a useful virtual musical instrument which provides a new manner of transforming the player's gestures and postures into a realistic audio experience. The invention is useful, e.g., for creating a virtual air guitar playing system and the corresponding guitar sound creation.
BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and together with the description help to explain the principles of the invention. In the drawings:
Fig. 1 is a simple block diagram of the architecture according to an embodiment of the invention,
Fig. 2 is a flow chart of the procedure with the needed apparatus in an embodiment of the invention,

Fig. 3 is a flow chart of a more detailed embodiment of the procedure according to the invention,
Fig. 4 is a flow chart of another more detailed embodiment of the procedure according to the invention,
Fig. 5 is an example of computer graphics rendered by a software based embodiment of the invention,
Fig. 6 is a second example of computer graphics rendered by a software based embodiment of the invention, and
Fig. 7 is a third example of computer graphics rendered by a software based embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings.
The present invention discloses a method for synthesizing music by playing a so-called air guitar. By using different playing gestures, the system according to the invention is able to capture the gestures and convert them into different licks, that is, musical passages such as a real guitar would produce. In an embodiment of the invention, the musical passages may be stored in computer
memory or on a computer readable medium, for example, as MIDI or as sound waveform data.
The present invention allows the user to control a virtual musical instrument without the physical user interface of an actual musical instrument. The physical user interface is replaced by a virtual full-body interface, i.e. the created sound is controlled with postures assumed by the user's body, here referred to as poses, and movements of the body, here referred to as gestures. While using the method according to an example of the present invention, the user may use three controls to select the sound being played. These controls may be, for instance, the pose, the playing speed defined by the frequency of performed trigger gestures, and the distance between the left and right hands of the player. In other words, the player's state is observed with a suitable apparatus. The player state may comprise gesture data of the player and the posture of the player. The gesture data comprises at least the aforementioned distance between the hands and the playing speed.
Each combination of the three controls is mapped to a specific musical passage, called a lick, which can be stored in a database, for instance. The database with the lick data can be stored in a memory, which can form a part of a computer or be a separate memory unit. For example, standing in a certain pose, playing at slow speed (where the frequency of the player's trigger gestures is low), and holding the hands far apart corresponds to one lick. Keeping the other parameters the same but playing at a high speed changes the lick to a different one, as does moving the hands closer to each other. In one embodiment of the invention, changing the pose opens up a new collection of licks for each combination of playing speed and the mutual distance of the hands. Thus, each pose can be seen to contain a subset of licks, where each lick is activated by changing the playing speed and the hand distance. The reason for this categorization is that changing poses is subjectively a major change, while the hand distance and playing speed are minor changes.
The control parameters may be categorized into discrete parameter sets, but the user's movements and gestures are continuous by nature. Thus, in one embodiment of the invention, licks are selected based on the combination of control parameters that the user is nearest to. For example, if the user's position is between two defined poses, the lick is selected based on which pose the user is closer to. In one embodiment of the invention, selection means that the volumes of the other licks are reduced either entirely or partially. In a further embodiment, the volumes may be normalized so that the overall loudness perceived by the player stays constant even though the relative volumes of the licks change.
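The loudness normalization mentioned above reduces to rescaling the lick volumes to a constant sum. A minimal sketch follows; the constant-sum rule is an assumption, since the description only requires that the perceived overall loudness stay constant:

```python
def normalize_volumes(volumes):
    # Rescale lick volumes so that they sum to 1.0, keeping the
    # overall loudness steady while the relative volumes change.
    total = sum(volumes)
    if total == 0:
        return [0.0] * len(volumes)
    return [v / total for v in volumes]
```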
The pose may be described as a plurality of features, such as the locations of the player's hands and feet, body joint angles, or image features computed from an image acquired by a camera or from data provided by some other motion tracking tool, such as the control devices of the Nintendo Wii game console. Furthermore, the pose may be determined as one of a predetermined discrete set of reference poses using pattern recognition and classification techniques, such as selecting the predetermined pose whose feature vector is closest to the feature vector computed from the motion tracking data. In the case of a virtual guitar, the predetermined poses may include, for example, "low grinding", "basic playing", or "intense solo". Each predetermined pose may have an associated plurality of input image reference areas, and the pose may be determined by computing the overlap of the player's body parts and the reference areas. Typically, the poses represent the emotional and expressive qualities of the licks (e.g., "relaxed", "anxious", "heroic"), whereas hand distance and playing speed are more directly mapped to musical parameters, such as the number and pitch of notes in the lick.
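The nearest-feature-vector classification described above might look as follows. The two-dimensional feature vectors and the reference values given for the example poses are hypothetical placeholders, not data from the description:

```python
def classify_pose(features, reference_poses):
    # Select the predetermined pose whose feature vector is
    # closest (squared Euclidean distance) to the measured one.
    def sq_dist(ref):
        return sum((a - b) ** 2 for a, b in zip(features, ref[1]))
    return min(reference_poses, key=sq_dist)[0]

# Hypothetical reference poses with made-up feature vectors.
REFERENCE_POSES = [
    ("low grinding", [0.2, 0.1]),
    ("basic playing", [0.5, 0.5]),
    ("intense solo", [0.9, 0.9]),
]
```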
Learning to express emotion through playing a real instrument requires considerably more training and fine-motor skill than simply producing the correct notes. The benefit of the present invention is that the player can produce professional-sounding music by expressing emotion with the full body while controlling the essential musical parameters with simple gestures. For example, if the virtual instrument is a guitar, frantic solo licks may be played by dropping to one's knees and arching one's back intensely, whereas confident, low, and crunchy licks may be played by assuming a stable low stance with the legs spread widely apart while keeping the upper body relaxed.
The player states (pose and gesture data) may be categorized into discrete sets, but the user's movement is continuous by nature. Thus, licks are selected based on the player state that the user is nearest to. For example, if the user is between two poses, the lick is selected based on which pose the user is closest to. Here, selection means that the volumes of the other licks are reduced either entirely or partially.
In an example according to the invention, the playing speed parameter is derived by detecting the movements of one hand in a top-down motion similar to strumming the strings of a real guitar.

Referring now to the drawings, an embodiment of an apparatus according to the present invention is disclosed in Figure 1. The apparatus comprises a motion tracking tool 11, a computing unit 12 and a sound production device 13, while a user 10 is located in the field of view of the motion tracking tool 11.
The motion tracking tool 11 may be, for example, an ordinary digital camera that is capable of providing images at a certain desired resolution and rate. In an embodiment of the invention, the motion tracking tool 11 may comprise computing means configured to perform data analysis, such as detecting a pose or filtering the signals. The motion tracking tool 11 may also comprise a number of devices held by or attached to the user 10 that are capable of sensing their location and/or orientation
in space. In an example, these devices may be gloves which are worn by the virtual guitar player. The gloves can be connected to the computing unit 12 with wires or wirelessly via radio transmission, or the gloves can simply act as passive markers that can be recognized from a picture by computer vision software executed by the computing unit.
The computing unit 12 may be, for example, an ordinary computer having sufficient computing power to provide the result at desired quality level. Furthermore, the computing unit 12 includes common means, such as a processor and a memory, in order to execute a computer program or a computer-implemented method according to the present invention. Furthermore, the computing device includes storage capacity for storing the recorded music pieces that are played back in the usage of the present invention. The sound production device 13 may be, for example, an ordinary loudspeaker capable of generating sound waves or a device that is capable of recording audio signals into digital or analog format on a storage device .
Figure 2 is a flow chart showing different functional units in one embodiment of the invention. Generally speaking, synthesized musical passages are chosen based on the poses and gestures of the user. The motion tracking tool 21 monitors the user 20 and transmits data concerning the user's body and movements to a player state analysis system 22. The player state analysis system 22 analyses the data and determines the posture of the user and/or the gestures which the user is performing. The player state analysis system 22 outputs data that corresponds to the control parameters. These parameters can include, for example, a pose, the speed or rate of playing, the distance between the player's hands, the angle between the player's hands, and/or the locations of the player's hands and/or feet. This data is transmitted to a mixing unit 25, which may be simply a multiplexer, but also, for example, a volume controller. The mixing unit 25 reads the licks 24 (in
this example, N pieces of choosable licks) from a lick database 23 and modifies their playback states and playback volumes so that the lick corresponding to the control parameters is played back. The mixed audio is sent to the sound production device 26 for audio output.
Figure 3 discloses a flow chart of a more detailed embodiment according to the invention. The motion tracking tool 31 monitors the player 30, and transmits data to a pose analysis unit 32 and a musical parameter analysis unit 33. The pose analysis unit 32 outputs pose data, such as a pose identifier. The musical parameter analysis unit 33 outputs musical gesture data, such as the hand distance and playing speed. The group selector 36 selects a lick group from the database 35 based on the pose data. The group selector 36 outputs the individual licks that belong to the selected group. The lick mixer 37 selects a lick based on the musical gesture data. The lick mixer 37 outputs the selected lick for audio output 34.
Figure 4 discloses a flow chart of yet another embodiment of the invention, where the selection of licks proceeds hierarchically. The motion tracking tool 41 monitors the player 40, and transmits data to the pose analysis unit 42, playing speed analysis unit 43 and hand distance analysis unit 44. The pose based selector 47 selects and outputs the licks from the lick database 46 that correspond to the detected pose. The speed based selector 48 selects and outputs the licks that correspond to the detected playing speed. The hand distance based selector 49 selects and outputs the lick that corresponds to the detected hand distance. Units 47, 48 and 49 may also be placed in any other mutual order than the one shown in Figure 4. Finally, the selected lick is directed to the audio output device 45.
Generally in the present invention, the content of the pre-recorded licks can correspond to the combinations of pose, speed and distance, for example. This means that each combination can be mapped to a
certain lick and any chosen lick corresponds to one specific combination of the playing gesture parameters. For each playing speed, there are licks containing passages where the recorded instrument is played at a different speed. These speeds may be, for example, musical 4th, 8th and 16th notes, corresponding to slow, medium and fast playing speeds, respectively. More accurately, the passages convey the perception of being played with 4th, 8th and 16th notes, while the actual sounds may contain syncopations or deviations in timing to make them musically rich and interesting while still maintaining the perception of a slow, medium or fast playing speed.
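Mapping the measured trigger-gesture frequency onto the discrete slow/medium/fast categories could be done by picking the nearest of the three note rates. The sketch below assumes a base tempo of 120 beats per minute and a nearest-value rule; both are illustrative choices, not fixed by the description:

```python
def classify_playing_speed(strums_per_minute, base_rate=120):
    # Note rates for 4th, 8th and 16th notes at the base tempo.
    rates = {"slow": base_rate, "medium": base_rate * 2, "fast": base_rate * 4}
    # Choose the playing speed whose note rate is nearest to the
    # observed trigger-gesture frequency.
    return min(rates, key=lambda name: abs(rates[name] - strums_per_minute))
```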
For each mutual hand distance (for example, corresponding to a broad or narrow grip of the guitar), the same or a similar lick exists with a certain pitch. For example, a broad grip (large distance between the hands) results in a lower pitch and a narrow grip results in a higher pitch of the same lick. Playing a certain lick at, for example, medium speed and then moving the hands from a near to a far distance changes the lick from high to low pitch.
Because changing poses represents a major change, the licks within one subset of poses are typically fairly similar to each other compared to the licks with a different pose.
In addition to the licks, an accompaniment track can be played throughout the performance. Thus, the user's perception is that they are playing the lead instrument in an orchestra or band.

The method of selecting discrete licks based on continuous movements is prone to fluctuations. Thus, ways of mitigating these fluctuations can be incorporated into the system.
In one example taking this issue into account, the system knows the user's 'distance' to each defined pose at all times. Let x be a normalized distance measure so that x = 0.0 indicates the user matching the first pose and x = 1.0 indicates the user matching the second pose. If the user is located at x = 0.5, that is, equally far
away from both poses, minor fluctuations to either side can cause the system to switch between two licks very rapidly. The same applies to hand distance and playing speed. The first mitigation is to apply a hysteresis filter to the pose distance data so that the change from one lick to the next happens only after the distance has passed well beyond the midpoint in the direction away from the first pose. Thus, if the user hovers around x = 0.5, no accidental switching will occur.
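The hysteresis filter can be sketched as follows for the two-pose case; the margin of 0.15 around the midpoint is an assumed value:

```python
def hysteresis_select(x, current, margin=0.15):
    # x is the normalized distance measure: 0.0 matches the first
    # pose, 1.0 the second. Switch licks only once x passes well
    # beyond the midpoint, so hovering near 0.5 causes no switching.
    if current == 0 and x > 0.5 + margin:
        return 1
    if current == 1 and x < 0.5 - margin:
        return 0
    return current
```

A user drifting from x = 0.45 to x = 0.55 keeps the same lick; only a clear move past the margin triggers a switch.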
The second mitigation is to perform a cross-fade between two licks. At any given moment, all licks in the current pose's subset are being played back, but only one of them is at full volume while the other licks are at zero volume. When one of the control parameters changes, the playback volumes of the two licks related to that control parameter's extreme values change as a function of the parameter change. For example, when moving from a near hand distance towards a far hand distance, the volume of the lick corresponding to the near distance is reduced, and the volume of the lick corresponding to the far distance is increased correspondingly.

Since the user cannot be assumed to be proficient with any instrument, the system must ensure that the result sounds musical. This is accomplished by adhering to certain musical rules both in the recording of the musical content and in the real-time control.
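The cross-fade between the two licks at a control parameter's extreme values can be written as a pair of complementary volumes. A linear fade is assumed here; an equal-power curve would also fit the description:

```python
def crossfade(x):
    # x = 0.0 selects the 'near' lick at full volume, x = 1.0 the
    # 'far' lick; in between, one volume falls as the other rises.
    x = min(max(x, 0.0), 1.0)
    return 1.0 - x, x
```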
In an embodiment of the invention, each lick in the database is recorded with a tempo (in beats per minute) that is an integer multiple of a base tempo. For a tempo of 120 4th notes per minute, a lick with 4th notes contains approximately 120 notes per minute. A lick with 8th notes is played twice as fast, containing approximately 240 notes, and a lick with 16th notes is played four times as fast (480).
In an embodiment of the invention, the length of each lick is an integer multiple of a base length, e.g. a musical measure, so that memory and disk capacity can be saved by looping the licks (restarting playback once a lick has reached its end) while a steady tempo is maintained. In a further embodiment, all licks are also recorded in the same musical key and overall style so that they fit together well.
To maintain tempo and overall musical appeal during actual playback, the licks must be synchronized in time. Usually, this is accomplished by playing back all licks at the same time and controlling the volume of each lick. However, for the sake of efficient calculation, licks to be played at zero volume may be stopped entirely and restarted when the user picks one of them up. In this case, the playback must be started at a position that is synchronized to the global time.
For example, if the user changes the lick 1.317 seconds after the lick has started, the new lick must start playing from the position of 1.317 seconds, not the beginning, and not the position it was previously stopped at. This ensures that each lick is always synchronized to a global tempo, thus ensuring that they fit into the accompaniment track.
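Under the assumption that every lick loops from global time zero, the synchronized restart position reduces to the global playback time modulo the lick length:

```python
def restart_position(global_time, lick_length):
    # Position the restarted lick where it would be had it never
    # been stopped: global time modulo the looped lick length.
    return global_time % lick_length
```

The 1.317-second example above follows directly: restarting at global time 1.317 s (or 1.317 s into any later loop) places the lick at the 1.317 s position, not at its beginning.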
In addition to licks, in one embodiment of the invention the user may also trigger single, non-synchronized sounds called long notes by playing at a speed even slower than the slowest licks. In this case, each playing gesture is interpreted as a trigger for a long note. The recorded content of long notes may be, for example, individual notes or chords that are let ring for a long time. The long notes may also be miscellaneous, non-musical sounds such as explosions or similar kinds of special sounds.
For example, triggering a long note immediately mutes any lick being played and begins the playback of a long note from the beginning. The note will continue playing until the user plays another note or chooses to mute the currently playing note with a special gesture.
For example, the user may mute the note by lifting the right hand back up above the centerline, as
if on top of an imaginary guitar's strings. An exception to this might be a case where the user moves the right hand horizontally away to the side of the imaginary guitar and only then lifts the hand back up, in which case the note would continue to be played.
The pose and hand distance affect which long note is chosen. Typically, there may be 2-6 different hand distances which produce a different note, each of which is perceptually higher or lower in pitch corresponding to the distance.
In addition to licks and long notes, the user may also trigger single, non-synchronized sounds called special effects with predefined trigger gestures or poses. For example, assuming a certain pose may trigger the sound of an explosion, or moving the right hand in a circular motion may trigger the sound of hitting a drum.
An embodiment of the invention may be a gaming system that comprises a screen configured to display the picture of the user captured by the motion tracking tool. In order to provide a convincing illusion of the user playing on a virtual stage in front of a virtual audience, the background may be removed from the picture using computer vision software, and the picture may be rendered as a billboard texture inside 3D graphics. In a further embodiment, the 3D graphics are displayed from the point of view of a virtual camera that can move, and the billboard texture is rotated so that it always faces the camera to maintain the illusion and not reveal the 2D nature of the texture.

The present invention may also comprise instruction methods and visualizations that solve the problem of communicating to the player what player states
(poses, hand distances and playing speeds) the system is able to recognize and for which sound content has been prerecorded or can be generated in real time.

In an embodiment of the invention shown in Figure 5, a plurality of instruction images 51, 52, 53, one for each recognized player pose, are visualized on screen in addition to the user 50.
In an interactive system, the feedback provided by the system to the player about the pose analysis is important. The feedback helps the player determine how to move so that the system detects the desired pose. To provide stronger feedback than the audible changes in the musical content, the instruction images may be modified depending on their difference from the current pose of the player. For example, the instruction image depicting the pose closest to the player may be made larger than the other instruction images. In an alternative embodiment, all instruction images may be scaled according to the distances between the poses corresponding to the instruction images and the player's current pose. Additionally, the instruction image depicting the pose closest to the player may be highlighted, e.g., using a glow effect.
In a further embodiment of the present invention shown in Figure 6, the pose analysis may comprise the player's visual representation 60 interacting on screen with visual elements, e.g., by touching at least one of a plurality of visual selection objects 61, 62, 63 to define the current pose. In this case, the user experience may become similar to the player playing a virtual guitar and selecting a lick group by touching virtual foot pedals on screen, and then modifying the volumes of the licks within the group with other player state information, such as the distance between hands and the playing speed.
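Touch selection of a virtual foot pedal could be sketched like this. The fragment is illustrative only; normalized screen coordinates and the hit radius are assumptions, not part of the patent text:

```python
import math

def touched_pedal(hand_pos, pedal_positions, hit_radius=0.08):
    """Return the index of the first on-screen pedal that the tracked hand
    overlaps, or None if the hand touches no pedal. All coordinates are
    normalized screen coordinates in [0, 1]."""
    for i, pedal in enumerate(pedal_positions):
        if math.dist(hand_pos, pedal) <= hit_radius:
            return i
    return None

# Pedals 61, 62, 63 along the bottom of the screen; the hand hovers over pedal 62.
pedals = [(0.25, 0.9), (0.50, 0.9), (0.75, 0.9)]
selected = touched_pedal((0.52, 0.88), pedals)  # -> 1
```

The returned index would select the lick group, after which hand distance and playing speed modulate the volumes within that group as described above.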
In an embodiment of the invention shown in Figure 7, the pose analysis may comprise displaying a plurality of instruction images 71, 72, 73, 74, each corresponding to a pre-defined pose, and determining the recognized pose by measuring how well the player's current visual representation 70 on screen matches the instruction images. In Figure 7, the instruction image 74
(illustrated as the outline of a body silhouette) is the best match for the player's visual representation 70
(illustrated as a solid black silhouette). For example, if the player's video image is shown on screen together
with the instruction images, the match may be measured by computing the distance between the on-screen locations of the player's body parts and the corresponding on-screen locations of the body parts of the instruction images. For example, the body parts used in the distance computation may comprise hands and feet. In a further embodiment, some of the instruction images may be hidden part of the time based on, e.g., the player's location on screen or on a virtual stage 75, so that the player may move around to search for the instruction images. In another embodiment, the instruction images may move on screen so that they stay close to the player, and, to minimize visual clutter, only a subset of the instruction images is visible at a given moment. The visible subset may be determined by computing which instruction images match the player's current pose best.
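The body-part matching described above could be sketched as follows. This is an illustrative fragment, not the patent's implementation; tracking four parts (two hands, two feet) in normalized screen coordinates is an assumption:

```python
import math

def pose_match_error(player_parts, instruction_parts):
    """Sum of Euclidean distances between corresponding on-screen body-part
    locations (e.g. left hand, right hand, left foot, right foot);
    a smaller error means a better match."""
    return sum(math.dist(p, q) for p, q in zip(player_parts, instruction_parts))

def best_matching_pose(player_parts, instruction_poses):
    """Index of the instruction image whose pose the player matches best.
    The same errors could also rank poses to pick a visible subset of images."""
    errors = [pose_match_error(player_parts, pose) for pose in instruction_poses]
    return errors.index(min(errors))

# A player with both hands raised matches pose 0 better than pose 1.
player = [(0.3, 0.2), (0.7, 0.2), (0.4, 0.9), (0.6, 0.9)]
poses = [
    [(0.3, 0.25), (0.7, 0.25), (0.4, 0.9), (0.6, 0.9)],  # hands up
    [(0.3, 0.60), (0.7, 0.60), (0.4, 0.9), (0.6, 0.9)],  # hands down
]
best = best_matching_pose(player, poses)  # -> 0
```

Sorting the per-pose errors would also yield the limited subset of best-matching instruction images to keep visible.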
It is obvious to a person skilled in the art that, with the advancement of technology, the basic idea of the invention may be implemented in various ways. The invention and its embodiments are thus not limited to the examples described above; instead, they may vary within the scope of the claims.
Claims
1. A method for creating synthesized music in response to a player's gestures, characterized in that the method comprises the steps of: recognizing a current player state; calculating differences between the recognized player state and at least one pre-stored player state, wherein the pre-stored player states are each associated with a musical passage; and adjusting playback volumes of the associated musical passages according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume.
2. The method according to claim 1, characterized in that the player state comprises a pose.
3. The method according to any of preceding claims 1 - 2, characterized in that the player state comprises playing speed.
4. The method according to claim 3, characterized in that the method further comprises the step of: determining the playing speed in relation to the frequency of consecutive trigger gestures performed by the player.
5. The method according to claim 4, characterized in that the method further comprises the step of: recognizing trigger gestures that mimic the strumming of the strings of a guitar.
6. The method according to any of preceding claims 1 - 5, characterized in that the player state comprises the distance between the hands of the player.
7. The method according to any of preceding claims 1 - 6, characterized in that the method further comprises the step of: producing a plurality of pre-stored player states and associated musical passages so that the player states with a relatively high distance between the hands of the player are associated with musical passages that have notes with relatively low pitches.
8. The method according to any of preceding claims 1 - 7, characterized in that the method further comprises the steps of: producing the musical passages so that each passage has a tempo that is an integer multiple of a base tempo; and playing the musical passages in sync with each other.
9. The method according to any of preceding claims 1 - 8, characterized in that the method further comprises the step of: playing back a musical accompaniment track that has the same musical key as the musical passages and a tempo that is an integer multiple of the base tempo.
10. The method according to any of preceding claims 1 - 9, characterized in that the method further comprises the steps of: stopping the playback of an associated musical passage if its playback volume is below a threshold value; and restarting a stopped musical passage if its playback volume is above a threshold value, and setting the playback position of the restarted musical passage to where it would be in case the musical passage had not been stopped.
11. The method according to any of preceding claims 1 - 10, characterized in that the method further comprises the step of: displaying instruction images representing predefined poses for giving movement instructions to the player.
12. The method according to claim 11, characterized in that the method further comprises the step of: highlighting the instruction images as a function of the distance between the pose of the displayed image and the pose of the player.
13. The method according to any of preceding claims 1 - 12, characterized in that the method further comprises the step of: providing visual selection objects for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
14. The method according to any of preceding claims 11 - 13, characterized in that the method further comprises the steps of: calculating a level of matching between the instruction images and the player's current visual representation; and choosing the desired pose as the best match among the instruction images.
15. The method according to claim 14, characterized in that the method further comprises the step of: including the player's hands and feet in said matching calculation.
16. The method according to any of preceding claims 11 - 15, characterized in that the method further comprises the step of: providing a limited subset of instruction images at a time which show the best matches according to the player's current visual representation.
17. A computer program for creating synthesized music in response to a player's gestures, characterized in that the computer program comprises program code configured to control a data-processing device to perform: recognizing a current player state; calculating differences between the recognized player state and at least one pre-stored player state, wherein the pre-stored player states are each associated with a musical passage; and adjusting playback volumes of the associated musical passages according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume.
18. The computer program according to claim 17, characterized in that the computer program further comprises the step of: controlling the determining of the playing speed in relation to the frequency of consecutive trigger gestures performed by the player.
19. The computer program according to any of preceding claims 17 - 18, characterized in that the computer program further comprises the step of: controlling the producing of a plurality of pre-stored player states and associated musical passages so that the player states with a relatively high distance between the hands of the player are associated with musical passages that have notes with relatively low pitches.
20. The computer program according to any of preceding claims 17 - 19, characterized in that the computer program further comprises the steps of: controlling the producing of the musical passages so that each passage has a tempo that is an integer multiple of a base tempo; and controlling the playing of the musical passages in sync with each other.
21. The computer program according to any of preceding claims 17 - 20, characterized in that the computer program further comprises the step of: controlling the playback of a musical accompaniment track that has the same musical key as the musical passages and a tempo that is an integer multiple of the base tempo.
22. The computer program according to any of preceding claims 17 - 21, characterized in that the computer program further comprises the steps of: controlling the stopping of the playback of an associated musical passage if its playback volume is below a threshold value; and controlling the restarting of a stopped musical passage if its playback volume is above a threshold value, and setting the playback position of the restarted musical passage to where it would be in case the musical passage had not been stopped.
23. The computer program according to any of preceding claims 17 - 22, characterized in that the computer program further comprises the step of: controlling the displaying of instruction images representing predefined poses for giving movement instructions to the player.
24. The computer program according to claim 23, characterized in that the computer program further comprises the step of: controlling the highlighting of the instruction images as a function of the distance between the pose of the displayed image and the pose of the player.
25. The computer program according to any of preceding claims 17 - 24, characterized in that the computer program further comprises the step of: providing visual selection objects for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
26. The computer program according to any of preceding claims 23 - 25, characterized in that the computer program further comprises the steps of: calculating a level of matching between the instruction images and the player's current visual representation; and choosing the desired pose as the best match among the instruction images.
27. The computer program according to claim 26, characterized in that the computer program further comprises the step of: controlling the providing of a limited subset of instruction images at a time which show the best matches according to the player's current visual representation.
28. The computer program according to any of preceding claims 17 - 27, characterized in that the computer program is embodied on a computer readable medium.
29. A system for creating synthesized music in response to a player's gestures, characterized in that the system comprises: a motion tracking tool (11, 21) and data processing means configured to recognize a current player state; a memory (23) configured to store a plurality of musical passages which each comprise an association to pre-stored player states; a calculating means (12) configured to calculate differences between the recognized player state and at least one pre-stored player state stored in the memory (23); a control unit (25) configured to adjust playback volumes of the associated musical passages according to the calculated differences so that a small difference yields a high volume and a large difference yields a low volume; and a sound production device (13, 26) configured to produce the synthesized music according to the control unit (25) output.
30. The system according to claim 29, characterized in that the system further comprises: location indicators worn by the player in order to define the locations of the player's hands; and means for capturing the location data with the motion tracking tool (11, 21).
31. The system according to any of preceding claims 29 - 30, characterized in that the system further comprises: a camera in the motion tracking tool (11, 21) for taking a plurality of pictures of the player; and means (11, 12, 21, 22) for using the picture data for the player state recognition.
32. The system according to claim 31, characterized in that the system further comprises: a screen and graphics rendering means configured to show the picture of the player composited inside three-dimensional computer graphics as a billboard texture that stays facing the virtual camera if the virtual camera moves.
33. The system according to any of preceding claims 29 - 32, characterized in that the system further comprises: a screen and graphics rendering means configured to display instruction images (51, 52, 53) representing predefined poses for giving movement instructions to the player.
34. The system according to any of preceding claims 29 - 33, characterized in that the system further comprises: visual selection objects (61, 62, 63) for allowing the player to choose a desired pose by virtually touching the visual selection objects on screen.
35. The system according to any of preceding claims 29 - 34, characterized in that the system further comprises: calculating means (12) configured to calculate a level of matching between the instruction images (71, 72, 73, 74) and the player's current visual representation (70); and a control unit (25) configured to choose the desired pose as the best match among the instruction images (74).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI20075530 | 2007-07-09 | ||
FI20075530A FI20075530A0 (en) | 2007-07-09 | 2007-07-09 | Gesture-controlled music synthesis system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009007512A1 true WO2009007512A1 (en) | 2009-01-15 |
Family
ID=38331618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FI2008/050421 WO2009007512A1 (en) | 2007-07-09 | 2008-07-09 | A gesture-controlled music synthesis system |
Country Status (2)
Country | Link |
---|---|
FI (1) | FI20075530A0 (en) |
WO (1) | WO2009007512A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2494432B1 (en) | 2009-10-27 | 2019-05-29 | Harmonix Music Systems, Inc. | Gesture-based user interface |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4658427A (en) * | 1982-12-10 | 1987-04-14 | Etat Francais Represente Per Le Ministre Des Ptt (Centre National D'etudes Des Telecommunications) | Sound production device |
WO1993022762A1 (en) * | 1992-04-24 | 1993-11-11 | The Walt Disney Company | Apparatus and method for tracking movement to generate a control signal |
US6506969B1 (en) * | 1998-09-24 | 2003-01-14 | Medal Sarl | Automatic music generating method and device |
WO2005094958A1 (en) * | 2004-03-23 | 2005-10-13 | Harmonix Music Systems, Inc. | Method and apparatus for controlling a three-dimensional character in a three-dimensional gaming environment |
- 2007-07-09: FI application FI20075530A (publication FI20075530A0) filed — not active, Application Discontinuation
- 2008-07-09: PCT application PCT/FI2008/050421 (publication WO2009007512A1) filed — active, Application Filing
Non-Patent Citations (1)
Title |
---|
KARJALAINEN M. ET AL.: "Virtual Air Guitar", AUDIO ENGINEERING SOCIETY, CONVENTION PAPER, SAN FRANCISCO, 28 October 2004 (2004-10-28) - 31 October 2004 (2004-10-31) * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103329145A (en) * | 2010-11-25 | 2013-09-25 | 无线电广播技术研究所有限公司 | Method and assembly for improved audio signal presentation of sounds during a video recording |
US9240213B2 (en) | 2010-11-25 | 2016-01-19 | Institut Fur Rundfunktechnik Gmbh | Method and assembly for improved audio signal presentation of sounds during a video recording |
WO2012069614A1 (en) * | 2010-11-25 | 2012-05-31 | Institut für Rundfunktechnik GmbH | Method and assembly for improved audio signal presentation of sounds during a video recording |
CN103329145B (en) * | 2010-11-25 | 2017-06-27 | 无线电广播技术研究所有限公司 | Method and assembly for improved audio signal rendering of sound during video recording |
US8618405B2 (en) | 2010-12-09 | 2013-12-31 | Microsoft Corp. | Free-space gesture musical instrument digital interface (MIDI) controller |
GB2496521B (en) * | 2011-11-11 | 2019-01-16 | Fictitious Capital Ltd | Computerised percussion instrument |
GB2496521A (en) * | 2011-11-11 | 2013-05-15 | Fictitious Capital Ltd | Computerised musical instrument using motion capture and analysis |
US9224377B2 (en) | 2011-11-11 | 2015-12-29 | Fictitious Capital Limited | Computerized percussion instrument |
US9573049B2 (en) | 2013-01-07 | 2017-02-21 | Mibblio, Inc. | Strum pad |
US11039267B2 (en) | 2015-09-16 | 2021-06-15 | Magic Leap, Inc. | Head pose mixing of audio files |
US11778412B2 (en) | 2015-09-16 | 2023-10-03 | Magic Leap, Inc. | Head pose mixing of audio files |
US10681489B2 (en) | 2015-09-16 | 2020-06-09 | Magic Leap, Inc. | Head pose mixing of audio files |
US12185086B2 (en) | 2015-09-16 | 2024-12-31 | Magic Leap, Inc. | Head pose mixing of audio files |
WO2017048713A1 (en) * | 2015-09-16 | 2017-03-23 | Magic Leap, Inc. | Head pose mixing of audio files |
US11438724B2 (en) | 2015-09-16 | 2022-09-06 | Magic Leap, Inc. | Head pose mixing of audio files |
US10188957B2 (en) | 2016-10-18 | 2019-01-29 | Mattel, Inc. | Toy with proximity-based interactive features |
WO2020007179A1 (en) * | 2018-07-05 | 2020-01-09 | 腾讯科技(深圳)有限公司 | Posture adjustment method and device, storage medium, and electronic device |
US12023585B2 (en) | 2018-07-05 | 2024-07-02 | Tencent Technology (Shenzhen) Company Limited | Posture adjustment method and apparatus, storage medium, and electronic device |
US10839778B1 (en) * | 2019-06-13 | 2020-11-17 | Everett Reid | Circumambient musical sensor pods system |
GB2585060A (en) * | 2019-06-27 | 2020-12-30 | Sony Interactive Entertainment Inc | Audio generation system and method |
CN113299256A (en) * | 2021-05-14 | 2021-08-24 | 上海锣钹信息科技有限公司 | MIDI digital music performance interaction method |
CN116504205A (en) * | 2023-03-01 | 2023-07-28 | 广州感音科技有限公司 | Musical performance control method, system, medium and computer |
CN116504205B (en) * | 2023-03-01 | 2023-11-24 | 广州感音科技有限公司 | Musical performance control method, system, medium and computer |
Also Published As
Publication number | Publication date |
---|---|
FI20075530A0 (en) | 2007-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2009007512A1 (en) | A gesture-controlled music synthesis system | |
US9981193B2 (en) | Movement based recognition and evaluation | |
US8444464B2 (en) | Prompting a player of a dance game | |
US7589727B2 (en) | Method and apparatus for generating visual images based on musical compositions | |
US9358456B1 (en) | Dance competition game | |
US7893337B2 (en) | System and method for learning music in a computer game | |
US6541692B2 (en) | Dynamically adjustable network enabled method for playing along with music | |
JP6137935B2 (en) | Body motion evaluation apparatus, karaoke system, and program | |
US20150103019A1 (en) | Methods and Devices and Systems for Positioning Input Devices and Creating Control | |
JP5198766B2 (en) | Recording method of metronome and beat interval corresponding to tempo change | |
WO2012020242A2 (en) | An augmented reality system | |
US6878869B2 (en) | Audio signal outputting method and BGM generation method | |
WO2015194509A1 (en) | Video generation device, video generation method, program, and information storage medium | |
Fels et al. | Musikalscope: A graphical musical instrument | |
JP2001215963A (en) | Music playing device, music playing game device, and recording medium | |
JP2021140065A (en) | Processing systems, sound systems and programs | |
Petersen et al. | Musical-based interaction system for the Waseda Flutist Robot: Implementation of the visual tracking interaction module | |
Baba et al. | ''VirtualPhilharmony'': A Conducting System with Heuristics of Conducting an Orchestra | |
JP2007271739A (en) | Concert parameter display device | |
Hoang et al. | Multimodal Metronome—Rhythm game for musical instruments | |
CN108877754A (en) | System and implementation method are played in artificial intelligence music's letter | |
JP2005321514A (en) | Game machine and musical interval imparting effect sound generation program and method | |
Sourin | Music in the air with leap motion controller | |
Feitsch et al. | Tangible and body-related interaction techniques for a singing voice synthesis installation | |
WO2024190759A1 (en) | Information processing method, information processing system, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08787698 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 08787698 Country of ref document: EP Kind code of ref document: A1 |