
CN111046226A - Music tuning method and device - Google Patents


Info

Publication number
CN111046226A
CN111046226A (application CN201811196608.0A)
Authority
CN
China
Prior art keywords
data
recorded
audio
tuning
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811196608.0A
Other languages
Chinese (zh)
Other versions
CN111046226B (en)
Inventor
孙浩华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority claimed from CN201811196608.0A
Publication of CN111046226A
Application granted
Publication of CN111046226B
Legal status: Active
Anticipated expiration

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The embodiments of the application disclose a music tuning method and device. The method includes: receiving recording data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recording data; determining, from at least one original singing audio data item associated with the selected song, target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features; performing tuning processing on the recording data so that its pitch matches that of the target original singing audio data; and feeding the tuned recording data back to the client. This technical scheme can improve the user's experience.

Description

Music tuning method and device
Technical Field
The application relates to the technical field of internet, in particular to a music tuning method and device.
Background
With the continuous development of internet technology, more and more users record songs using music-related applications (apps), such as QQ Music, Kugou Music, and others. However, after recording a song, many users find the result unsatisfactory: singing problems such as going off-key or voice cracking often occur.
Currently, some music production organizations tune recorded vocal works using professional software such as Auto-Tune. However, because the operation interface of such software is aimed at professionals, tuning typically requires complex manual operation. This raises the barrier to use and thereby degrades the user experience.
Disclosure of Invention
The embodiments of the application aim to provide a music tuning method and device that can improve the user's experience.
To achieve the above object, an embodiment of the present application provides a music tuning method, including: receiving recording data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recording data; determining, from at least one original singing audio data item associated with the selected song, target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features; performing tuning processing on the recording data so that its pitch matches that of the target original singing audio data; and feeding the tuned recording data back to the client.
To achieve the above object, an embodiment of the present application further provides a music tuning device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, implements the following steps: receiving recording data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recording data; determining, from at least one original singing audio data item associated with the selected song, target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features; performing tuning processing on the recording data so that its pitch matches that of the target original singing audio data; and feeding the tuned recording data back to the client.
As can be seen from the above, after receiving the recording data for the selected song sent by the client, the recorded audio fingerprint features contained in the recording data can be identified. The selected song is typically associated with at least one original singing audio data item: a song usually has at least one sung version, including the original singer's version, other singers' versions, and so on, where each version corresponds to one original singing audio data item. The target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features can then be determined from these data. Once the target original singing audio data is determined, the recording data can be tuned against it so that the pitch of the recording data matches that of the target. The tuned recording data can then be fed back to the client, so that when the client plays the audio it represents, singing problems such as going off-key or voice cracking do not occur. The user therefore only needs to record the selected song; tuning is completed automatically by the technical scheme provided in this application, which is simple, intelligent, and has a low barrier to use, thereby improving the user's experience.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below cover only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method of tuning music in an embodiment of the present application;
fig. 2 is a schematic diagram of an application scenario of a music tuning method in an embodiment of the present application;
FIG. 3 is a diagram illustrating a beat debugging process in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a music tuning device in an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments, without inventive work, fall within the scope of protection of the present application.
The embodiment of the application provides a music tuning method that can be applied to an independent tuning-processing terminal device. The terminal device may be an electronic device with data computation, storage, page display, and network interaction functions; for example, a desktop computer, a notebook computer, or a tablet computer. The terminal device may also be software that runs in such an electronic device and supports data processing, storage, page display, and network interaction. Referring to fig. 1, the method may include the following steps.
S11: receiving recording data sent by a client aiming at the selected song, and identifying the recording audio fingerprint characteristics contained in the recording data.
In this embodiment, a lyric library covering different songs is provided in the terminal device. The lyric library may be a data set storing lyrics and the corresponding singing beat data, and may adopt any database format such as MySQL, Oracle, DB2, or Sybase. The lyric library may be deployed on a storage medium in the terminal device. In addition, the lyrics and singing beat data of some songs may be downloaded from the terminal device through the client and stored in the client's cache.
In this embodiment, the client may be an electronic device having recording and shooting functions, for example a tablet computer, a notebook computer, a smart phone, or a smart wearable device. Alternatively, the client may be software running in such an electronic device. The client may be provided with a communication module and may establish a communication connection with the remote terminal device to exchange data with it.
In this embodiment, the client may present the user with links to the songs available for selection. A link may be a text link or a picture link. If the user wants to sing and record a certain song, the user can click its link to select it. After the user clicks the text link of the song name, the client sends the terminal device a lyric loading request that includes a lyric identifier and a beat identifier. The lyric identifier identifies the lyrics of the song, and the beat identifier identifies the singing beat data of the song. After receiving the lyric loading request, the terminal device may extract the two identifiers from it, read the lyrics with the lyric identifier and the singing beat data with the beat identifier from the lyric library, and feed both back to the client. The client can then display the lyrics to the user; once recording starts, the client scrolls the lyrics word by word or line by line according to the singing beat data, so that the user can sing along with the displayed rhythm. Meanwhile, the client records the audio sung by the user through its microphone, or records video of the user singing through its microphone and front camera. In this embodiment, the lyric loading request may be a character string written according to a preset rule, where the preset rule may be a network communication protocol agreed between the client and the terminal device; for example, the lyric loading request may be a string written according to the HTTP protocol.
The preset rule may define the components of the lyric loading request and the order in which they are arranged.
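The lyric-loading exchange described above can be sketched as a simple lookup. The identifier names and the in-memory library layout below are illustrative assumptions, not the patent's actual schema:

```python
# Hypothetical sketch of the lyric-loading request handling; the identifiers,
# field names, and library contents are invented for illustration.
LYRIC_LIBRARY = {
    "lyric-001": "line one of the lyrics\nline two of the lyrics",
}
BEAT_LIBRARY = {
    "beat-001": [0.0, 0.5, 1.0, 1.5],  # singing-beat onsets in seconds
}

def handle_lyric_load_request(request: dict) -> dict:
    """Extract the lyric and beat identifiers and read the matching entries."""
    lyric_id = request["lyric_id"]
    beat_id = request["beat_id"]
    return {
        "lyrics": LYRIC_LIBRARY[lyric_id],
        "beats": BEAT_LIBRARY[beat_id],
    }

response = handle_lyric_load_request({"lyric_id": "lyric-001", "beat_id": "beat-001"})
```

The response would then be fed back to the client, which scrolls the lyrics according to the beat onsets.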
In this embodiment, in practice the client usually does not play the accompaniment music of the selected song while scrolling the lyrics word by word or line by line. The recording data may therefore include audio data representing the user's singing, or video data representing the user's singing together with the captured pictures, but it does not include accompaniment music data for the selected song. The recorded audio data may cover the complete selected song, or only a segment within a specified time slice of it; for example, segment recorded audio data may cover the refrain of the selected song. The specified time slice can be set according to the actual application and is not limited here.
In this embodiment, after the client records the recording data of the selected song, it may display to the user a play control for playing back the singing represented by the recording data, a re-record control for recording again, and a tuning control for tuning the recording data. After the song is recorded, the user can play back the recorded audio or video by clicking the play control. If the user is unsatisfied, the user can either click the re-record control and record again, or click the tuning control, which sends the terminal device a tuning request that includes the recording data and a song identifier. The song identifier identifies the original singing audio data associated with the selected song. Upon receiving the tuning request, the terminal device obtains the recording data for the selected song from the request and extracts the song identifier. In this embodiment, an audio database may also be provided in the terminal device. The audio database may be a data set storing audio data, may adopt any database format such as MySQL, Oracle, DB2, or Sybase, and may be deployed on a storage medium in the terminal device. The audio database may store at least one original singing audio data item for each song. For example, a song usually has at least one sung version, including the original singer's version and covers by other singers, where each version corresponds to one original singing audio data item. After extracting the song identifier, the terminal device can therefore read the original singing audio data with that identifier from the audio database.
In this embodiment, each original singing audio data item in the audio database may have its own song identifier. The song identifier and the original singing audio data can be stored as key-value pairs, so that the corresponding original singing audio data can be fetched from the audio database given a song identifier.
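A minimal sketch of this key-value lookup, with invented song identifiers and placeholder audio payloads:

```python
# Illustrative key-value store: song identifier -> list of original singing
# audio data items (one per sung version). Identifiers are hypothetical.
audio_database = {
    "song-42": [b"original-version", b"cover-version"],
}

def read_original_audio(song_id: str) -> list:
    """Fetch all original singing audio data items bound to a song identifier."""
    return audio_database.get(song_id, [])
```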
In this embodiment, both the recorded audio data and the original singing audio data in the audio database have corresponding audio fingerprint features. To facilitate subsequent tuning, after receiving the recording data for the selected song from the client, the terminal device may identify the recorded audio fingerprint features contained in it, so that the target original singing audio data with matching fingerprint features can later be determined from the at least one original singing audio data item associated with the selected song, and the recording data tuned against it. Specifically, if the recording data is recorded audio data or segment recorded audio data, the recorded (or segment recorded) audio fingerprint features can be identified directly. For example, after the recorded audio data is received, it may be converted from the time domain to the frequency domain; the frequency-domain data may then be converted into the Bark domain with reference to a preset frequency interval to obtain a number of frequency-domain subbands, each corresponding to one frequency interval. Finally, the energy value of each subband can be calculated, and the set of subband energy values used as the recorded audio fingerprint features. The preset frequency interval may be set according to the practical application; for example, with reference to the human auditory system, it may cover 300 Hz to 2000 Hz.
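The fingerprint extraction step can be sketched as follows. This is an illustrative approximation: true Bark-band boundaries are psychoacoustically defined, so log-spaced band edges over the 300–2000 Hz interval are used here as a stand-in, and the function and parameter names are assumptions:

```python
import numpy as np

def fingerprint(audio: np.ndarray, sample_rate: int, n_bands: int = 8,
                f_lo: float = 300.0, f_hi: float = 2000.0) -> np.ndarray:
    """Transform to the frequency domain, split 300-2000 Hz into Bark-like
    subbands (log-spaced edges as a stand-in for true Bark boundaries),
    and return the per-subband energies as the fingerprint vector."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2               # power spectrum
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)            # assumed band layout
    energies = np.empty(n_bands)
    for i in range(n_bands):
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        energies[i] = spectrum[mask].sum()
    return energies

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)   # 440 Hz test tone (exactly 440 cycles)
fp = fingerprint(tone, sr)
```

With this band layout, the 440 Hz tone's energy lands in the second subband, so the fingerprint vector is dominated by that band.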
If the recording data is recorded video data, the recorded picture data and recorded audio data can first be separated from it. Recorded video data usually comes in a specified encapsulation format, for example FLV, MP4, or MKV, so it must first be decapsulated to obtain compressed audio data and compressed picture data, which are then audio-decoded and picture-decoded respectively to obtain the recorded audio data and the recorded picture data. After the recorded audio data has been separated, the recorded audio fingerprint features contained in it can be identified in the manner described above.
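The patent does not specify how decapsulation and decoding are performed; a common way to split a container into audio and picture streams outside this patent is the ffmpeg command-line tool. The sketch below only constructs illustrative commands (the file paths are hypothetical) without executing them:

```python
# Build ffmpeg invocations that split a container (FLV/MP4/MKV) into an audio
# track and decoded picture frames. Paths are placeholders for illustration.
def demux_commands(video_path: str, audio_out: str, frames_out: str) -> list:
    # -vn drops the video stream, leaving decoded PCM audio
    extract_audio = ["ffmpeg", "-i", video_path, "-vn",
                     "-acodec", "pcm_s16le", audio_out]
    # -an drops the audio stream, writing one image per frame
    extract_frames = ["ffmpeg", "-i", video_path, "-an", frames_out]
    return [extract_audio, extract_frames]

cmds = demux_commands("take.mp4", "take.wav", "frame_%04d.png")
```

In a real pipeline each command list would be passed to `subprocess.run`.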
S13: and determining target original singing audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from at least one original singing audio data associated with the selected song.
In this embodiment, to facilitate tuning, after receiving the recording data for the selected song and identifying the recorded audio fingerprint features contained in it, the terminal device may determine, from the at least one original singing audio data item associated with the selected song, the target original singing audio data whose audio fingerprint features match the recorded ones. In practice, an audio fingerprint database can be provided in the terminal device. The audio fingerprint database may be a data set storing audio fingerprint features, may adopt any database format such as MySQL, Oracle, DB2, or Sybase, and may likewise be deployed on a storage medium in the terminal device. It can store the audio fingerprint features bound to each original singing audio data item, together with a binding identifier and a fingerprint feature identifier. The binding identifier identifies which original singing audio data the fingerprint features are bound to, and the fingerprint feature identifier identifies the fingerprint features themselves. The tuning request sent by the client may also include the binding identifier and the fingerprint feature identifier. In this way, after receiving a tuning request, the terminal device can not only read the at least one original singing audio data item of the selected song via the song identifier in the request, but also extract the binding identifier and the fingerprint feature identifier, and then read the audio fingerprint features carrying those identifiers from the audio fingerprint database, thereby obtaining the fingerprint features bound to the original singing audio data.
After obtaining the audio fingerprint features bound to the original singing audio data, the terminal device may determine, from the at least one original singing audio data item associated with the selected song, the target original singing audio data whose audio fingerprint features match the recorded ones. Specifically, if the recording data is recorded audio data, the terminal device may calculate, for each original singing audio data item associated with the selected song, the similarity between the recorded audio fingerprint features and the fingerprint features bound to that item, and take the item whose fingerprint features yield the maximum similarity as the target original singing audio data. If the recording data is segment recorded audio data, the terminal device may first, for each original singing audio data item, obtain the segment of original singing audio data within the specified time slice and the corresponding segment of its audio fingerprint features; the item whose segment fingerprint features match the segment recorded audio fingerprint features identified in step S11 is then taken as the target original singing audio data. If the recording data is recorded video data, the terminal device may calculate, for each original singing audio data item, the similarity between the fingerprint features of the recorded audio data separated from the video and the fingerprint features bound to that item, and again take the item with the maximum similarity as the target original singing audio data.
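The maximum-similarity selection can be sketched as below. The patent does not name a similarity measure, so cosine similarity is used here as one plausible choice, and the version identifiers are invented:

```python
import numpy as np

def match_original(recorded_fp: np.ndarray, candidates: dict) -> str:
    """Return the id of the original-version whose bound fingerprint is most
    similar to the recorded fingerprint (cosine similarity as the metric)."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(candidates, key=lambda k: cosine(recorded_fp, candidates[k]))

recorded = np.array([0.9, 0.1, 0.0])        # fingerprint of the user's recording
versions = {                                 # fingerprints bound to each version
    "original-singer": np.array([1.0, 0.0, 0.0]),
    "cover-singer":    np.array([0.0, 1.0, 0.0]),
}
best = match_original(recorded, versions)    # -> "original-singer"
```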
S15: and carrying out tuning processing on the recorded data so as to enable the recorded data to be matched with the pitch of the target original singing audio data.
In this embodiment, to fix singing problems such as going off-key or voice cracking in the user's recorded performance of the selected song, after receiving the recording data and determining the target original singing audio data whose fingerprint features match the recorded ones, the terminal device may tune the recording data against the target original singing audio data so that their pitches match. The performance represented by the tuned recording data then avoids such singing problems. Specifically, if the recording data is recorded audio data or segment recorded audio data, the terminal device may first judge whether the pitch at a specified singing moment in the recorded data equals the pitch at the same moment in the target original singing audio data. If the recording data is recorded video data, the same judgment is made on the recorded audio data separated from it. The specified singing moment can be any singing moment in the recording data or in the target original singing audio data. If the pitches differ at a specified singing moment, the pitch of the target original singing audio data at that moment can be substituted for the pitch of the recording data at that moment.
For example, suppose a user records a difficult song and struggles in the high-pitched part, with the voice even cracking. In this case, the pitch at each singing moment of the high-pitched part in the target original singing audio data can be substituted for the pitch at the corresponding moment in the recorded audio data, making up for the shortfall in the user's performance.
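The substitution rule above can be sketched on per-frame pitch tracks. The 5 Hz tolerance and the frame-level representation are illustrative assumptions, not values from the patent:

```python
import numpy as np

def tune_pitch(recorded_f0: np.ndarray, reference_f0: np.ndarray,
               tolerance_hz: float = 5.0) -> np.ndarray:
    """Wherever the recorded pitch differs from the target original-version
    pitch by more than the tolerance, substitute the reference pitch."""
    tuned = recorded_f0.copy()
    off_key = np.abs(recorded_f0 - reference_f0) > tolerance_hz
    tuned[off_key] = reference_f0[off_key]
    return tuned

recorded = np.array([440.0, 452.0, 330.0])   # middle frame sung 12 Hz sharp
reference = np.array([440.0, 440.0, 330.0])  # target original-version pitch
tuned = tune_pitch(recorded, reference)
```

Only the off-key frame is replaced; in-tune frames keep the user's own pitch.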
In one embodiment, if the recording data is recorded video data, after tuning the recorded audio data separated from it, the terminal device may merge the separated recorded picture data with the tuned recorded audio data to obtain the tuned recorded video data. Specifically, for each specified time node among the time nodes of the pre-tuning recorded video data, the picture data at that node and the tuned audio data at that node are merged, yielding the tuned video data at that node. Doing this for every time node yields the complete tuned recorded video data.
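A minimal sketch of the node-by-node sound-picture merge, with placeholder frame and audio values:

```python
def merge_sound_and_picture(frames: list, tuned_audio: list) -> list:
    """Pair the picture data and the tuned audio data at each time node,
    yielding the tuned video data node by node."""
    assert len(frames) == len(tuned_audio)
    return [{"t": i, "picture": f, "audio": a}
            for i, (f, a) in enumerate(zip(frames, tuned_audio))]

video = merge_sound_and_picture(["img0", "img1"], ["aud0", "aud1"])
```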
S17: and feeding back the recorded data after the tuning processing to the client.
In this embodiment, after tuning the recording data, the terminal device feeds the tuned recording data back to the client over the remote connection established with it. After receiving the tuned recording data, the client may play the audio or video it represents to the user.
In an embodiment of the application, since the user records without accompaniment, the user often wants the corresponding accompaniment music when enjoying the finished performance. To meet this demand, the terminal device may determine suitable accompaniment music data after tuning the recording data. Specifically, the terminal device may also provide an accompaniment music database: a data set storing accompaniment music data and the correspondence between original singing audio data and accompaniment music data, in any database format such as MySQL, Oracle, DB2, or Sybase, deployed on a storage medium in the terminal device. After tuning, the terminal device can read the accompaniment music data corresponding to the determined target original singing audio data according to this correspondence, take it as the accompaniment music data for the tuned recording data, and then mix the tuned recording data with that accompaniment music data to obtain the merged recording data.
After the merged recording data is subsequently fed back to the client, the client can thus play the recorded audio and the corresponding accompaniment music to the user at the same time.
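The mixing step can be sketched as a gain-weighted sum of the tuned vocal track and the accompaniment track; the gain values are arbitrary illustration choices, not from the patent:

```python
import numpy as np

def mix_with_accompaniment(vocals: np.ndarray, accompaniment: np.ndarray,
                           vocal_gain: float = 1.0,
                           accomp_gain: float = 0.6) -> np.ndarray:
    """Mix tuned vocals with accompaniment, trimming to the shorter track
    and clipping the result to the valid sample range."""
    n = min(len(vocals), len(accompaniment))
    mix = vocal_gain * vocals[:n] + accomp_gain * accompaniment[:n]
    return np.clip(mix, -1.0, 1.0)

out = mix_with_accompaniment(np.array([0.5, 0.5, 0.5]),
                             np.array([0.5, -0.5, 0.5, 0.0]))
```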
In an embodiment of the application, to let the user compare the recording data before and after tuning, after the terminal device feeds back the tuned recording data, the client may show both versions to the user at the same time. For example, if the recording data is recorded video data, the client may display the pre-tuning and post-tuning recorded video data simultaneously in a multi-grid video layout, so that the user can compare the differences between them.
In a specific application scenario, the client may be a smart phone. The user can record video data aiming at the selected song through the front-facing camera and the microphone on the smart phone. As shown in fig. 2, a user may start a certain song recording application on the smart phone, and at this time, links of different songs may be displayed on the song recording application, for example, the links may be text links with song names "scatter and go", "talk really", "learn cat call", and the user may select a song to record according to his/her own will. For example, the user clicks a text link with a song name "say true", and then the client may send a lyric loading request including a lyric id and a beat id to the terminal device. After receiving the lyric loading request, the terminal device may extract the lyric identification and the beat identification from the lyric loading request. After the lyric identifier and the beat identifier are extracted, the terminal device may read the lyrics with the lyric identifier and the singing beat data with the beat identifier from the lyric library, and feed back the lyrics and the singing beat data corresponding to the song to the client. Therefore, the client can display the lyrics corresponding to the song to the user, and after the user clicks the recording starting control displayed on the song recording application, the displayed lyrics can be scrolled word by word or line by line along with the rhythm displayed by the lyrics to sing. And after the recording is finished, clicking a recording ending control displayed on the song recording application by the user. At this time, the recording application may present, to the user, a re-recording control for re-recording the recorded data, a tuning control for tuning processing, and a playing control for playing the currently recorded data. 
And if the user clicks the tuning control, sending a tuning request comprising the recording data and the song identifier to the terminal equipment. When receiving the tuning request, the terminal device may receive recorded video data for the selected song sent by the client, may separate recorded picture data and recorded audio data from the recorded video data, and may then identify recorded audio fingerprint features included in the recorded audio data. After identifying the recorded audio fingerprint features contained in the recorded audio data, target original audio data with audio fingerprint features matching the recorded audio fingerprint features may be determined from at least one original audio data associated with the selected song by using the song identifier extracted from the tuning request. Then, the recorded audio data can be tuned according to the target original audio data, so that the pitch of the recorded audio data is matched with that of the target original audio data. Then, according to the corresponding relationship between the original singing audio data and the accompaniment music data, determining the accompaniment music data corresponding to the recording data after tuning, performing note merging processing on the recording audio data after tuning and the accompaniment music data corresponding to the recording audio data after tuning, and then performing note merging processing on the recording picture data separated from the recording video data and the recording audio data after note merging processing to obtain the recording video data after tuning. And finally, feeding back the recorded video data after the tuning processing to the smart phone. After receiving the recorded video data after the tuning processing, the smart phone can simultaneously display the recorded video data before the tuning processing and the recorded video data after the tuning processing to a user in a multi-grid video display mode. 
The recorded video data before tuning contains the original, unaccompanied vocals, while the recorded video data after tuning contains tuned vocals with accompaniment. The user therefore only needs to record the selected song, and tuning is completed automatically by the song recording application; the operation is simple, intelligent, and has a low threshold, so the user experience can be improved.
In an embodiment of the present application, in practical use, the user's surrounding environment usually contains some noise while audio is being recorded, so noise may be introduced into the recorded data during recording. After noise is introduced, the signal-to-noise ratio of the recorded data may be low, which affects subsequent tuning processing. In this case, to improve the signal-to-noise ratio of the recorded data, after receiving the recorded data for the selected song sent by the client, the terminal device may further perform smoothing filtering on the recorded data and filter out the noise, so that the signal-to-noise ratio of the smoothed recorded data is greater than or equal to a specified signal-to-noise ratio threshold. The value range of the specified signal-to-noise ratio threshold may be 80 to 100 percent and can be set according to the actual situation. The recorded data after smoothing filtering can then replace the recorded data before smoothing filtering, so that subsequent tuning is performed on the filtered data. In practice, the smoothing filtering may be performed in a number of ways, such as neighborhood average filtering, median filtering, Gaussian filtering, and frequency-domain filtering.
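The median filtering named above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the window size and the spike-laden sample sequence are illustrative assumptions.

```python
def median_filter(samples, window=5):
    # Slide a window over the samples and replace each sample with the
    # median of its neighborhood; isolated noise spikes are suppressed,
    # which raises the signal-to-noise ratio of the recorded data.
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        neighborhood = sorted(samples[lo:hi])
        out.append(neighborhood[len(neighborhood) // 2])
    return out

# Hypothetical recorded samples; the 5.0 and -4.0 spikes stand in for noise.
noisy = [0.0, 0.1, 5.0, 0.2, 0.1, -4.0, 0.0, 0.1]
smoothed = median_filter(noisy)
```

Neighborhood averaging or Gaussian filtering could be substituted for the median without changing the surrounding flow.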
In one embodiment of the present application, when recording a singing work, the user's singing tempo may not be synchronized with the singing tempo of the selected song, which may affect subsequent tuning of the recorded data. In this case, after receiving the recorded data for the selected song sent by the client, the terminal device may further perform beat debugging processing on the recorded data, so that the singing beats in the processed recorded data are synchronized with the singing beats in the singing beat data of the selected song. For example, suppose the user records the audio of the song titled "say true" and, as shown in fig. 3, the beat A of the lyric "speak" in the recorded audio data runs 0.1 second faster than the corresponding beat B in the song. Beat A may then be extended by 0.1 second to obtain beat A1, and beat A1 may be truncated by 0.1 second (the part of the beat in the dashed box in fig. 3) to obtain beat A2; at this point, beat A2 of the lyric "speak" in the recorded audio data is synchronized with the corresponding beat B in the song. The recorded data after beat debugging can then replace the recorded data before beat debugging, so that subsequent tuning is performed on the beat-debugged data.
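The extend-then-truncate adjustment above can be sketched as follows. This is a naive sample-level sketch under stated assumptions: the patent does not say what the extension is filled with, so here the beat is extended at the front with a fill value (silence) and the same number of samples is dropped from the tail, shifting a too-early beat later while keeping its length unchanged.

```python
def delay_beat(beat, offset, fill=0.0):
    # `beat` is a list of samples; `offset` is the number of samples by
    # which the recorded beat runs ahead of the reference beat. Extend
    # at the head (assumption: fill with silence), then truncate the
    # tail so the total duration is unchanged.
    extended = [fill] * offset + beat      # beat A -> beat A1
    return extended[:len(beat)]            # beat A1 -> beat A2

beat_a = [0.3, 0.5, 0.7, 0.5, 0.3, 0.1]   # hypothetical samples for "speak"
beat_a2 = delay_beat(beat_a, 2)           # 2 samples ~ the 0.1 s in fig. 3
```

With real audio, the offset in samples would be `round(0.1 * sample_rate)`.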
In an embodiment of the present application, after the recorded data has been tuned, some users may want to know the difference between the audio before and after tuning, so as to improve their own singing. To meet this demand, after the recorded data is tuned, a data display page may be shown to the user. The data display page may be used to display the recorded data before tuning and the recorded data after tuning, and may include a pre-tuning play control and a post-tuning play control. The pre-tuning play control may be placed near the area where the recorded data before tuning is presented, and the post-tuning play control near the area where the recorded data after tuning is presented. When the pre-tuning play control is triggered, the audio represented by the part of the recorded data to be tuned, in the recorded data before tuning, can be played. When the post-tuning play control is triggered, the audio represented by the partial recorded data in the time segment corresponding to the part to be tuned, in the recorded data after tuning, is played.
For example, on a data display page presenting the recorded data before and after tuning, the parts of the pre-tuning recorded data that are to be tuned and the parts that require no tuning may be displayed in different colors, so that the two can be distinguished. Similarly, in the recorded data after tuning, the corresponding tuned parts and the parts that required no tuning can be displayed in different colors. In general, since the recorded data before tuning may contain several parts to be tuned, a pre-tuning play control may be placed near the display area of each such part, so that when the user clicks a nearby pre-tuning play control, the audio represented by the corresponding part is played. Similarly, several post-tuning play controls can be placed at the corresponding positions near the area displaying the recorded data after tuning, so that when the user clicks one of them, the audio represented by the partial recorded data in the corresponding time segment of the tuned recording is played. This makes it easy for the user to hear the difference in audio before and after each tuning operation.
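Playing just the segment behind a play control reduces to slicing the samples for one time segment. A minimal sketch, assuming PCM samples at a known sample rate (the rate here is an illustrative assumption):

```python
def segment_samples(audio, start_s, end_s, rate=44100):
    # Return the samples of `audio` between `start_s` and `end_s`
    # seconds, i.e. the part a pre-tuning or post-tuning play control
    # would hand to the audio player.
    return audio[int(start_s * rate):int(end_s * rate)]
```

The same start/end times index both the pre-tuning and the post-tuning recording, which is what keeps the two controls pointing at corresponding segments.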
In one embodiment of the present application, the terminal device may also be a system architecture formed by a client and a server. The client may be an electronic device with recording and photographing functions; specifically, it may be, for example, a tablet computer, a notebook computer, a smartphone, or a smart wearable device. Alternatively, the client may be software running on such an electronic device. The client can be provided with a communication module and can establish a communication connection with a remote server to transmit data to and from it. The server may be a device that stores audio data; specifically, it may be an electronic device with data computation, storage, and network interaction functions, or software running on such a device that supports data processing, storage, and network interaction. The number of servers is not limited in this embodiment: there may be one server, several servers, or a server cluster formed by several servers. In some application scenarios, the client side can send the recorded audio data to the server in real time, the server side performs the tuning processing and feeds the tuned recorded audio data back to the client, and the client displays the data display page to the user. In such server-side embodiments, music tuning and other processing are executed by the server, whose processing speed is generally higher than the client's; this reduces the processing load on the client and speeds up the tuning processing. Of course, this specification does not exclude embodiments in which all or part of the above processing is implemented on the client side, for example real-time tuning performed by the client.
In this embodiment, the functions implemented in the above method steps may be implemented by a computer program, and the computer program may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read the computer program from the computer storage medium. The computer program, when executed by a processor, may perform the following functions:
s11: receiving recording data aiming at the selected song sent by a client, and identifying recording audio fingerprint characteristics contained in the recording data;
s13: determining target original singing audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from at least one original singing audio data associated with the selected song;
s15: performing tuning processing on the recorded data to enable the recorded data to be matched with the pitch of the target original singing audio data;
s17: and feeding back the recorded data after the tuning processing to the client.
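Steps S11 through S17 above can be sketched end to end in Python. This is illustrative only: the fingerprint extraction, the similarity measure, and the per-moment pitch representation are placeholder assumptions, since the embodiments do not fix particular algorithms.

```python
def extract_fingerprint(audio):
    # Placeholder fingerprint: per-sample magnitudes, coarsely quantized.
    # Real systems typically use spectral-peak fingerprints.
    return tuple(round(abs(s), 1) for s in audio)

def similarity(fp_a, fp_b):
    # Fraction of positions where two fingerprints agree -- a stand-in
    # for whatever similarity measure an implementation actually uses.
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / max(len(fp_a), len(fp_b))

def tune(recorded_pitches, original_pitches):
    # S15: at each singing moment, keep the recorded pitch when it
    # already equals the original; otherwise substitute the original.
    return [o if r != o else r
            for r, o in zip(recorded_pitches, original_pitches)]

def tune_recording(recording, candidate_originals):
    fingerprint = extract_fingerprint(recording)        # S11
    target = max(                                       # S13: best-matching version
        candidate_originals,
        key=lambda o: similarity(fingerprint, extract_fingerprint(o)))
    return tune(recording, target)                      # S15; S17 feeds this back
```

Here `candidate_originals` stands for the at least one original singing audio data associated with the selected song.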
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
carrying out sound and picture combination processing on the recorded picture data and the recorded audio data after the sound adjustment processing to obtain recorded video data after the sound adjustment processing;
and correspondingly, feeding back the recorded video data after the tuning processing to the client.
In one embodiment, the computer storage medium is further configured to store an audio fingerprint database; wherein the audio fingerprint database comprises the audio fingerprint features bound to the original singing audio data; the target original singing audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound to the original audio data;
and taking the original singing audio data associated with the audio fingerprint features corresponding to the maximum similarity as the target original singing audio data.
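The similarity calculation and maximum selection above can be sketched with cosine similarity over fingerprint vectors. Cosine similarity is one common choice, not one the patent mandates, and the fingerprint vectors below are hypothetical:

```python
import math

def cosine_similarity(fp_a, fp_b):
    # Cosine similarity between two equal-length fingerprint vectors.
    dot = sum(a * b for a, b in zip(fp_a, fp_b))
    norm = (math.sqrt(sum(a * a for a in fp_a))
            * math.sqrt(sum(b * b for b in fp_b)))
    return dot / norm if norm else 0.0

# Hypothetical audio fingerprint database: version id -> fingerprint.
fingerprints = {"original": (1.0, 0.8, 0.2), "cover": (0.2, 0.9, 1.0)}
recorded = (0.9, 0.8, 0.3)

# The version whose bound fingerprint has the maximum similarity is
# taken as the target original singing audio data.
target = max(fingerprints,
             key=lambda k: cosine_similarity(fingerprints[k], recorded))
```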
In one embodiment, when the computer program is executed by the processor, tuning the recorded data includes:
judging whether the pitch of the appointed singing moment in the recorded data is the same as the pitch of the appointed singing moment in the target original audio data or not;
if they are different, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
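A minimal sketch of this judgment-and-replacement step, assuming each designated singing moment's pitch is represented as a fundamental frequency in Hz (an illustrative representation; the embodiments do not fix one):

```python
def correct_pitches(recorded, original):
    # At each designated singing moment, compare the recorded pitch with
    # the original-singing pitch; when they differ, substitute the
    # original-singing pitch, otherwise keep the recorded pitch.
    corrected = []
    for rec_hz, orig_hz in zip(recorded, original):
        corrected.append(orig_hz if rec_hz != orig_hz else rec_hz)
    return corrected

recorded_hz = [261.6, 293.7, 349.2, 392.0]  # user sang 349.2 Hz (F4)...
original_hz = [261.6, 293.7, 329.6, 392.0]  # ...where the original has E4
```

A practical implementation would likely compare within a tolerance rather than for exact equality, but the strict comparison mirrors the judgment step as stated.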
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
performing smoothing filtering processing on the recorded data to enable the signal-to-noise ratio of the recorded data after smoothing filtering processing to be larger than or equal to a specified signal-to-noise ratio threshold value;
and replacing the recorded data before the smooth filtering processing with the recorded data after the smooth filtering processing.
In one embodiment, the computer storage medium is further configured to store singing tempo data of the selected song; the computer program, when executed by the processor, further implements the steps of:
performing beat debugging processing on the recorded data so as to enable the singing beats in the recorded data after the beat debugging processing to be synchronous with the singing beats in the singing beat data of the selected song;
and replacing the recorded data before the beat debugging processing with the recorded data after the beat debugging processing.
In one embodiment, the computer storage medium is further configured to store a correspondence between the original singing audio data and the accompaniment music data; the computer program, when executed by the processor, further implements the steps of:
determining accompaniment music data corresponding to the target original singing audio data according to the corresponding relation, and taking the accompaniment music data corresponding to the target original singing audio data as accompaniment music data corresponding to recording data after tuning processing;
performing note merging processing on the recorded data after the tuning processing and accompaniment music data corresponding to the recorded data after the tuning processing;
and correspondingly, feeding back the recorded data after the note merging processing to the client.
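The note merging of tuned vocals with their accompaniment reduces, at its simplest, to a gain-weighted sample-wise mix. A naive sketch; the gain values, the clipping to [-1, 1], and the correspondence keys are illustrative assumptions:

```python
def mix(vocal, accompaniment, vocal_gain=1.0, acc_gain=0.8):
    # Sample-wise mix of the tuned vocal track with its accompaniment,
    # clipped to the [-1.0, 1.0] range of normalized PCM samples.
    mixed = []
    for v, a in zip(vocal, accompaniment):
        s = v * vocal_gain + a * acc_gain
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed

# Hypothetical correspondence between original singing audio data and
# accompaniment music data, as stored above.
correspondence = {"say_true_original": "say_true_accompaniment"}
```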
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
displaying a data display page; the data display page is used for displaying the recorded data before tuning processing and the recorded data after tuning processing; the data display page comprises a pre-tuning play control and a post-tuning play control;
when the pre-tuning play control is triggered, playing an audio represented by part of recorded data to be tuned in the recorded data before tuning;
and when the post-tuning play control is triggered, playing audio represented by partial recorded data in a time segment corresponding to the partial recorded data to be tuned in the recorded data after tuning.
It should be noted that, the functions that can be realized by the computer program in the computer storage medium can all refer to the foregoing method implementation embodiments, and the technical effects achieved are also similar to the technical effects achieved in the foregoing method implementation embodiments, and are not described here again.
Referring to fig. 4, the present application also provides a music tuning device, which may include a memory for storing a computer program and a processor, wherein when the computer program is executed by the processor, the following steps are implemented:
s11: receiving recording data aiming at the selected song sent by a client, and identifying recording audio fingerprint characteristics contained in the recording data;
s13: determining target original singing audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from at least one original singing audio data associated with the selected song;
s15: performing tuning processing on the recorded data to enable the recorded data to be matched with the pitch of the target original singing audio data;
s17: and feeding back the recorded data after the tuning processing to the client.
In this embodiment, the memory may include a physical device for storing information; typically, the information is digitized and then stored in a medium using an electrical, magnetic, or optical method. The memory according to this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other types of memory as well, such as quantum memory and graphene memory.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
carrying out sound and picture combination processing on the recorded picture data and the recorded audio data after the sound adjustment processing to obtain recorded video data after the sound adjustment processing;
and correspondingly, feeding back the recorded video data after the tuning processing to the client.
In one embodiment, the memory is further configured to store an audio fingerprint database; wherein, the audio fingerprint database comprises the audio fingerprint characteristics bound with the original singing audio data; the target original audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound to the original audio data;
and taking the original singing audio data associated with the audio fingerprint features corresponding to the maximum similarity as the target original singing audio data.
In one embodiment, when the computer program is executed by the processor, tuning the recorded data includes:
judging whether the pitch of the appointed singing moment in the recorded data is the same as the pitch of the appointed singing moment in the target original audio data or not;
if they are different, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
performing smoothing filtering processing on the recorded data to enable the signal-to-noise ratio of the recorded data after smoothing filtering processing to be larger than or equal to a specified signal-to-noise ratio threshold value;
and replacing the recorded data before the smooth filtering processing with the recorded data after the smooth filtering processing.
In one embodiment, the memory is further configured to store singing tempo data of the selected song; the computer program, when executed by the processor, further implements the steps of:
performing beat debugging processing on the recorded data so as to enable the singing beats in the recorded data after the beat debugging processing to be synchronous with the singing beats in the singing beat data of the selected song;
and replacing the recorded data before the beat debugging processing with the recorded data after the beat debugging processing.
In one embodiment, the memory is further configured to store a corresponding relationship between the original audio data and the accompaniment music data; the computer program, when executed by the processor, further implements the steps of:
determining accompaniment music data corresponding to the target original singing audio data according to the corresponding relation, and taking the accompaniment music data corresponding to the target original singing audio data as accompaniment music data corresponding to recording data after tuning processing;
performing note merging processing on the recorded data after the tuning processing and accompaniment music data corresponding to the recorded data after the tuning processing;
and correspondingly, feeding back the recorded data after the note merging processing to the client.
In one embodiment, the computer program, when executed by the processor, further implements the steps of:
displaying a data display page; the data display page is used for displaying the recorded data before tuning processing and the recorded data after tuning processing; the data display page comprises a pre-tuning play control and a post-tuning play control;
when the pre-tuning play control is triggered, playing an audio represented by part of recorded data to be tuned in the recorded data before tuning;
and when the post-tuning play control is triggered, playing audio represented by partial recorded data in a time segment corresponding to the partial recorded data to be tuned in the recorded data after tuning.
The specific functions of the device, the memory thereof, and the processor thereof provided in the embodiments of this specification can be explained in comparison with the foregoing embodiments in this specification, and can achieve the technical effects of the foregoing embodiments, and thus, will not be described herein again.
As can be seen from the above, in the present application, after the recording data for the selected song sent by the client is received, the recorded audio fingerprint features contained in it can be identified. The selected song is typically associated with at least one original singing audio data: a song usually has at least one singing version, including the original singer's version and versions by other singers, and each singing version corresponds to one original singing audio data. The target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features can then be determined from these original singing audio data. Once the target original singing audio data is determined, the recorded data can be tuned based on it, so that the pitch of the recorded data matches that of the target original singing audio data, and the tuned recorded data can be fed back to the client. As a result, when the client plays the audio represented by the tuned recorded data, singing problems such as being off-key or voice cracking will not occur. The user therefore only needs to record the selected song, and tuning is completed automatically by the technical solution provided in this application; the method is simple, intelligent, and has a low threshold, so the user experience can be improved.
In the 1990s, it was still easy to distinguish whether an improvement to a technology was an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). As technology has developed, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this kind of programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development, except that the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
Those skilled in the art will also appreciate that, in addition to implementing the client and server as pure computer-readable program code, the same functions can be implemented by logically programming the method steps so that the client and server take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client or server may therefore be considered a hardware component, and the means included in it for implementing various functions may also be considered structures within the hardware component. Indeed, means for implementing various functions may be regarded both as software modules for implementing the method and as structures within the hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in the present specification are described in a progressive manner; the same or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the computer storage medium, the music tuning device, the server, and the client may all be explained with reference to the introduction of the method embodiments described above.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described in terms of embodiments, those of ordinary skill in the art will recognize that there are numerous variations and modifications of the present application that do not depart from its spirit, and it is intended that the appended claims encompass such variations and modifications.

Claims (18)

1. A method of tuning music, the method comprising:
receiving recording data aiming at the selected song sent by a client, and identifying recording audio fingerprint characteristics contained in the recording data;
determining target original singing audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from at least one original singing audio data associated with the selected song;
performing tuning processing on the recorded data to enable the recorded data to be matched with the pitch of the target original singing audio data;
and feeding back the recorded data after the tuning processing to the client.
2. The method of claim 1, wherein said recording data comprises recording video data; identifying recorded audio fingerprint features contained in the recorded data, including:
separating recorded picture data and recorded audio data from the recorded video data;
identifying recorded audio fingerprint features contained in the recorded audio data;
correspondingly, the recorded audio data are subjected to tuning processing, so that the recorded audio data are matched with the pitch of the target original singing audio data.
3. The method of claim 2, wherein after tuning the recorded audio data, the method further comprises:
carrying out sound and picture combination processing on the recorded picture data and the recorded audio data after the sound adjustment processing to obtain recorded video data after the sound adjustment processing;
and correspondingly, feeding back the recorded video data after the tuning processing to the client.
4. The method according to claim 1, characterized in that an audio fingerprint database is provided; wherein, the audio fingerprint database comprises the audio fingerprint characteristics bound with the original singing audio data; the target original audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound to the original audio data;
and taking the original singing audio data associated with the audio fingerprint features corresponding to the maximum similarity as the target original singing audio data.
5. The method of claim 1, wherein tuning the recorded data comprises:
judging whether the pitch of the appointed singing moment in the recorded data is the same as the pitch of the appointed singing moment in the target original audio data or not;
if they are different, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
6. The method of claim 1, wherein after receiving recording data for the selected song from the client, the method further comprises:
performing smoothing filtering processing on the recorded data to enable the signal-to-noise ratio of the recorded data after smoothing filtering processing to be larger than or equal to a specified signal-to-noise ratio threshold value;
and replacing the recorded data before the smooth filtering processing with the recorded data after the smooth filtering processing.
7. The method according to claim 1, wherein singing tempo data of the selected song is provided; after receiving the recording data for the selected song sent by the client, the method further comprises:
performing beat debugging processing on the recorded data so as to enable the singing beats in the recorded data after the beat debugging processing to be synchronous with the singing beats in the singing beat data of the selected song;
and replacing the recorded data before the beat debugging processing with the recorded data after the beat debugging processing.
8. The method according to claim 1, wherein there is provided a correspondence of the original vocal audio data and accompaniment music data; after the recorded data is subjected to tuning processing, the method further comprises:
determining accompaniment music data corresponding to the target original singing audio data according to the corresponding relation, and taking the accompaniment music data corresponding to the target original singing audio data as accompaniment music data corresponding to recording data after tuning processing;
performing note merging processing on the recorded data after the tuning processing and accompaniment music data corresponding to the recorded data after the tuning processing;
and correspondingly, feeding back the recorded data after the note merging processing to the client.
9. The method of claim 1, wherein after the tuning processing is performed on the recorded data, the method further comprises:
displaying a data display page, wherein the data display page is used for displaying the recorded data before the tuning processing and the recorded data after the tuning processing, and the data display page comprises a pre-tuning play control and a post-tuning play control;
when the pre-tuning play control is triggered, playing the audio represented by the to-be-tuned portion of the recorded data before the tuning processing;
and when the post-tuning play control is triggered, playing the audio represented by the portion of the tuned recorded data in the time segment corresponding to the to-be-tuned portion.
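The before/after comparison needs the same time window cut from both versions of the recording. A sketch of that slicing (the function name and sample-rate handling are assumptions):

```python
def segment(samples, sample_rate, start_sec, end_sec):
    # Slice the audio covering [start_sec, end_sec) so the same time window
    # can be auditioned in both the pre-tuning and post-tuning recordings.
    return samples[int(start_sec * sample_rate):int(end_sec * sample_rate)]
```

The two play controls would then feed `segment(before, sr, t0, t1)` and `segment(after, sr, t0, t1)` to the audio player.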
10. A music tuning device, the device comprising a memory storing a computer program and a processor, wherein the computer program, when executed by the processor, implements the following steps:
receiving recorded data for a selected song sent by a client, and identifying recorded audio fingerprint features contained in the recorded data;
determining, from at least one original singing audio data associated with the selected song, target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features;
performing tuning processing on the recorded data so that the recorded data matches the pitch of the target original singing audio data;
and feeding back the tuned recorded data to the client.
11. The apparatus of claim 10, wherein the recorded data comprises recorded video data, and identifying the recorded audio fingerprint features contained in the recorded data comprises:
separating recorded picture data and recorded audio data from the recorded video data;
and identifying the recorded audio fingerprint features contained in the recorded audio data;
correspondingly, the tuning processing is performed on the recorded audio data so that the recorded audio data matches the pitch of the target original singing audio data.
12. The apparatus of claim 11, wherein the computer program, when executed by the processor, further implements the steps of:
performing audio-picture merging on the recorded picture data and the tuned recorded audio data to obtain tuned recorded video data;
and correspondingly, feeding back the tuned recorded video data to the client.
13. The apparatus of claim 10, wherein the memory is further configured to store an audio fingerprint database, the audio fingerprint database comprising audio fingerprint features bound to the original singing audio data; and the target original singing audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint features and the audio fingerprint features bound to each original singing audio data;
and taking the original singing audio data associated with the audio fingerprint features corresponding to the maximum similarity as the target original singing audio data.
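Assuming the fingerprints are fixed-length feature vectors, the similarity-and-argmax selection can be sketched as follows. Cosine similarity is one possible measure, not the one the claim mandates:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two fingerprint vectors; 1.0 = identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_original(recorded_fp, fingerprint_db):
    # fingerprint_db maps each original-recording id to its bound fingerprint;
    # the id with the maximum similarity to the recorded fingerprint wins.
    return max(fingerprint_db,
               key=lambda k: cosine_similarity(recorded_fp, fingerprint_db[k]))
```

For example, with a database holding a "studio" and a "live" version of the selected song, the recorded fingerprint selects whichever version the singer actually followed.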
14. The apparatus of claim 10, wherein the computer program, when executed by the processor, performs the tuning processing on the recorded data by:
determining whether the pitch at a designated singing moment in the recorded data is the same as the pitch at the designated singing moment in the target original singing audio data;
and if not, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
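A sketch of this per-moment pitch substitution, with pitches given in semitones at the designated singing moments. The `tolerance` parameter is an added assumption (the claim substitutes whenever the pitches differ at all):

```python
def tune_pitches(recorded, original, tolerance=0.0):
    # recorded / original: lists of (time_sec, pitch_semitones) sampled at the
    # same designated singing moments. Where the recorded pitch differs from
    # the original by more than `tolerance`, substitute the original pitch.
    tuned = []
    for (t, rec_p), (_, orig_p) in zip(recorded, original):
        tuned.append((t, orig_p if abs(rec_p - orig_p) > tolerance else rec_p))
    return tuned
```

Applying the corrected pitch track back to the waveform would require a pitch-shifting step (e.g. PSOLA or a phase vocoder), which the sketch leaves out.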
15. The apparatus of claim 10, wherein the computer program, when executed by the processor, further performs the steps of:
performing smoothing filtering on the recorded data so that the signal-to-noise ratio of the smoothed recorded data is greater than or equal to a specified signal-to-noise ratio threshold;
and replacing the recorded data before the smoothing filtering with the smoothed recorded data.
16. The apparatus of claim 10, wherein the memory is further configured to store singing tempo data for the selected song; the computer program, when executed by the processor, further implements the steps of:
performing beat adjustment on the recorded data so that the singing beats in the adjusted recorded data are synchronized with the singing beats in the singing tempo data of the selected song;
and replacing the recorded data before the beat adjustment with the beat-adjusted recorded data.
17. The apparatus of claim 10, wherein the memory is further configured to store a correspondence between the original singing audio data and accompaniment music data; and the computer program, when executed by the processor, further implements the steps of:
determining, according to the correspondence, the accompaniment music data corresponding to the target original singing audio data, and taking the accompaniment music data corresponding to the target original singing audio data as the accompaniment music data corresponding to the tuned recorded data;
performing ensemble merging processing on the tuned recorded data and the accompaniment music data corresponding to the tuned recorded data;
and correspondingly, feeding back the ensemble-merged recorded data to the client.
18. The apparatus of claim 10, wherein the computer program, when executed by the processor, further performs the steps of:
displaying a data display page, wherein the data display page is used for displaying the recorded data before the tuning processing and the recorded data after the tuning processing, and the data display page comprises a pre-tuning play control and a post-tuning play control;
when the pre-tuning play control is triggered, playing the audio represented by the to-be-tuned portion of the recorded data before the tuning processing;
and when the post-tuning play control is triggered, playing the audio represented by the portion of the tuned recorded data in the time segment corresponding to the to-be-tuned portion.
CN201811196608.0A 2018-10-15 2018-10-15 Tuning method and device for music Active CN111046226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811196608.0A CN111046226B (en) 2018-10-15 2018-10-15 Tuning method and device for music

Publications (2)

Publication Number Publication Date
CN111046226A true CN111046226A (en) 2020-04-21
CN111046226B CN111046226B (en) 2023-05-05

Family

ID=70230567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811196608.0A Active CN111046226B (en) 2018-10-15 2018-10-15 Tuning method and device for music

Country Status (1)

Country Link
CN (1) CN111046226B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140129571A1 (en) * 2012-05-04 2014-05-08 Axwave Inc. Electronic media signature based applications
CN104715760A (en) * 2015-02-13 2015-06-17 朱威 KTV song matching analyzing method and system
CN105006234A (en) * 2015-05-27 2015-10-28 腾讯科技(深圳)有限公司 Karaoke processing method and apparatus
CN105989842A (en) * 2015-01-30 2016-10-05 福建星网视易信息系统有限公司 Method and device for voiceprint similarity comparison and application thereof in digital entertainment on-demand system
CN106157979A (en) * 2016-06-24 2016-11-23 广州酷狗计算机科技有限公司 A kind of method and apparatus obtaining voice pitch data
CN106469557A (en) * 2015-08-18 2017-03-01 阿里巴巴集团控股有限公司 The offer method and apparatus of accompaniment music
CN106971704A (en) * 2017-04-27 2017-07-21 维沃移动通信有限公司 A kind of audio-frequency processing method and mobile terminal
US20180090116A1 (en) * 2015-05-27 2018-03-29 Guangzhou Kugou Computer Technology Co., Ltd. Audio Processing Method, Apparatus and System
CN108074557A (en) * 2017-12-11 2018-05-25 深圳Tcl新技术有限公司 Tone regulating method, device and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859015A (en) * 2020-07-01 2020-10-30 腾讯音乐娱乐科技(深圳)有限公司 Music response method and related equipment
CN113192533A (en) * 2021-04-29 2021-07-30 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium
CN113628637A (en) * 2021-07-02 2021-11-09 北京达佳互联信息技术有限公司 Audio identification method, device, equipment and storage medium
WO2023030017A1 (en) * 2021-09-03 2023-03-09 腾讯科技(深圳)有限公司 Audio data processing method and apparatus, device and medium
US12334093B2 (en) 2021-09-03 2025-06-17 Tencent Technology (Shenzhen) Company Limited Audio data processing method and apparatus, device, and medium
CN115101033A (en) * 2022-06-29 2022-09-23 腾讯音乐娱乐科技(深圳)有限公司 A kind of mixing method and related device

Also Published As

Publication number Publication date
CN111046226B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN111046226A (en) Music tuning method and device
CN108806656B (en) Automatic generation of songs
US20200151212A1 (en) Music recommending method, device, terminal, and storage medium
CN106804005B (en) Video production method and mobile terminal
JP2015517684A (en) Content customization
CN112203140B (en) Video editing method and device, electronic equipment and storage medium
KR20050086942A (en) Method and system for augmenting an audio signal
CN113284523B (en) A method, device, computer equipment and storage medium for displaying dynamic effects
CN105808780B (en) Song recognition method and apparatus
JP7673331B2 (en) Video generation method, device, electronic device and readable storage medium
US9286943B2 (en) Enhancing karaoke systems utilizing audience sentiment feedback and audio watermarking
US20240127860A1 (en) Audio/video processing method and apparatus, device, and storage medium
CN114638232A (en) Method and device for converting text into video, electronic equipment and storage medium
CN109710799B (en) Voice interaction method, medium, device and computing equipment
CN105760436B (en) Audio data processing method and device
CN113516963A (en) Audio data generation method and device, server and intelligent loudspeaker box
KR102694139B1 (en) Method and device for processing voice
WO2023010949A1 (en) Method and apparatus for processing audio data
CN112509538A (en) Audio processing method, device, terminal and storage medium
CN118828139A (en) A method and system for processing AI music creation information
CN116049479B (en) Playlist generation methods, media, devices and computing equipment
WO2017107309A1 (en) Control method, control device, terminal, and synchronous audio playback system
CN117591062A (en) Music self-adaptive sound effect adjusting method, system, software and equipment
CN118331532A (en) Audio data processing method, storage medium, electronic device, and program product
JP7183316B2 (en) Voice recording retrieval method, computer device and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant