CN103561104B

CN103561104B - Smart mobile phone speech control system and audio recognition method thereof

Info

Publication number: CN103561104B
Application number: CN201310556946.1A
Authority: CN
Inventors: Yang Zhe (杨喆)
Original assignee: Beijing Beny Wave Science and Technology Co Ltd
Current assignee: Beijing Beny Wave Science and Technology Co Ltd
Priority date: 2013-11-11
Filing date: 2013-11-11
Publication date: 2016-08-17
Anticipated expiration: 2033-11-11
Also published as: CN103561104A

Abstract

A smartphone voice control system comprising an in-vehicle mobile phone system, a server receiving system, an SMS message service system, a Wap system and a Web system. To obtain location and other image information, the control system combines Internet technology with a GIS map platform, GPRS and wireless network technology, and provides alarm information services to clients through a Web application service system and a Wap application service system. The control system is convenient, fast, versatile and comprehensive; it can bring great convenience and safety to vehicle owners and also has broad application in monitoring and anti-theft.

Description

Smartphone voice control system and speech recognition method thereof

Technical field

The present invention relates to a mobile phone voice control system and a speech recognition method thereof.

Background art

In the prior art, vehicle self-rescue and alarm monitoring systems are not combined with intelligent mobile terminals (mobile phones), and are therefore deficient in convenience, speed, variety and comprehensiveness.

Summary of the invention

To solve the above problems, the invention provides a mobile phone voice control system and a speech recognition method thereof. It integrates several technologies currently popular in information technology, namely speech recognition, GPS, GPRS and a GIS platform, and relies on wireless network technology and Internet technology so that, while the vehicle owner performs self-rescue or raises an alarm, related data such as the position and speed of the vehicle can be tracked at any time.

The technical solution adopted by the present invention to solve the technical problem is as follows:

In one aspect, the invention provides a smartphone voice control system comprising an in-vehicle mobile phone system, a server receiving system, an SMS message service system, a Wap system and a Web system;

The in-vehicle mobile phone system has an intelligent speech library. After the user speaks natural language, the intelligent speech library converts the user's speech signal into a text message and returns the result, and the text message is then sent to the server receiving system;

After receiving the message from the in-vehicle mobile phone system, the server receiving system connects to the GIS system and supplies it with the coordinates of the vehicle's current position. The GIS system retrieves map data and feeds the result back to the server as text and pictures; the server receiving system stores the text data in a database and stores the picture data in the corresponding user information folder;

After receiving a notification from the server receiving system, the SMS message service system sorts out the user's data and combines the text message with the service URL address provided by the Wap system; the SMS message service system then sends the combined URL address to the user as a mobile phone short message via the in-vehicle mobile phone system;

After receiving the short message from the in-vehicle mobile phone system, the user connects to and accesses the Wap server system through the mobile phone browser using the URL address contained in the short message, and downloads the related data. The Wap server system transmits the data to the user's mobile phone as a text list sorted by time; by selecting a link in the list, the user can view the vehicle's current area and other related information; and

The user can log in to the Web server system from a computer to view related data, and the Web server system provides history record management functions.

In the above smartphone voice control system, after the user speaks natural language, the mobile phone terminal included in the in-vehicle mobile phone system first invokes voice control to record and intelligently edit the speech, then forwards the speech data to the mobile phone background monitoring center. The background monitoring center processes the speech data it receives and returns the processed result, which is presented to the mobile phone user; after the user confirms the speech processing result, the text message is sent to the server receiving system.

In the above smartphone voice control system, the URL address contains the user account number and password information.

In the above smartphone voice control system, when the intelligent speech library determines from the natural language spoken by the user that the user is intoxicated, the mobile phone terminal issues a warning prompt or performs automatic rescue.

In the above smartphone voice control system, the automatic rescue includes the phone automatically sending a short message to, or calling, a preset number, and automatically raising an alarm.

In another aspect, the invention provides a speech recognition method in the smartphone voice control system, comprising a training stage and a recognition stage for speech signals. Training takes place before speech recognition: in the training stage, the characteristic parameters representing speech features are processed, a model is built for each entry, and the models are saved as reference templates. In the speech recognition stage, the speech signal passes through the same channel to obtain speech characteristic parameters, a test template is generated and matched against the reference templates, and the reference template with the highest matching value is taken as the recognition result.

In the above speech recognition method, an endpoint detection algorithm determines the start and end points of the speech; the reference template and the test template use feature vectors of the same type, the same frame length, the same window function and the same frame shift. The test and reference templates are denoted T and R respectively; to compare how well they match, the distance D[T, R] between them is calculated, and the smaller the distance, the higher their matching value.

In the above speech recognition method, let the reference template feature vector sequence be a1, a2, ..., aM and the input speech feature vector sequence be b1, b2, ..., bN. The frame indices m and n are related by a time warping function m = w(n), and w satisfies:

D = \min_{w(n)} \sum_{n=1}^{N} d[n, w(n)]

where d[n, w(n)] is the distance between the n-th input frame vector and the m-th reference frame vector (m = w(n)), and D is the distance measure between the two templates under the optimal time warping.

In the above speech recognition method, a 16-bit fixed-point DSP implements the floating-point operations of speech recognition. Where higher precision is required, intermediate variables are represented with 32 or 48 bits, or floating-point numbers are represented in pseudo-floating-point form, i.e., as a mantissa plus an exponent.

The beneficial effects of the invention are as follows: it fully combines application technologies such as communication, information, control, digital products and computer networks, so its self-rescue and alarm functions can be extended further. Besides displaying visual images, various commands and the vehicle's position, it can also implement functions such as voice transmission and the recording and storage of surrounding conditions. As mobile phones become more powerful and their applications more widespread, the system's convenience, speed, variety and comprehensiveness will bring great convenience and safety to vehicle owners.

Brief description of the drawings

The present invention is further described below with reference to the drawings and embodiments.

Fig. 1 is a schematic structural diagram of the smartphone voice control system.

Fig. 2 is a schematic flowchart of smartphone voice control.

Detailed description of the embodiments

Because mobile phones are constrained by the terminal itself, their input methods are severely limited, and mobile phone users' demand for convenient input grows day by day. At the same time, as the customer services and business content carried by mobile phone terminals become richer, users' requirements for fast retrieval keep rising. Letting users query and order services and businesses by voice, with the phone system using semantic understanding to help them quickly locate the required service or help, is therefore extremely important, especially for vehicle owners who are driving.

The in-vehicle mobile phone intelligent speech knowledge base has two very important functions: it "can hear" and it "can understand". First, "can hear" means speech analysis and transcription: it can recognize information such as the content of the user's speech, the speaker and the language. This lets mobile phone users state their needs easily and freely, so that they can express richer information and interaction between the phone and the monitoring center becomes more natural and convenient. Second, "can understand" means semantic understanding: the user's meaning is analyzed intelligently to provide intelligent services, so that the mobile phone terminal can respond to the user quickly and intelligently.

The mobile phone terminal intelligent speech service designed according to the present invention may have the following advantages:

1. Voice input makes up for the shortcomings of traditional input and gives mobile phone users more freedom while driving;

2. Intelligent semantic analysis on the phone accurately identifies the user's needs and the help required, and returns the result;

3. It provides a more convenient way to input information;

4. It helps users who are driving to complete required but tedious information easily through automatic composition;

5. It mines the user's specific needs by making a fuzzy judgement of the user's state from the user's semantics. For example, when the user is drunk, speaks incoherently or talks carelessly, the mobile phone terminal can issue a warning prompt or perform functions such as automatic rescue, e.g., automatically sending a short message to or calling a previously set friend or relative, and automatically raising an alarm.

Intelligent mobile terminals (mobile phones) and external mobile devices generally use wireless network technology, and GPS and GPRS are widely applied. To obtain location and other image information, the system of the present invention combines Internet technology with a GIS map platform, GPRS and wireless network technology, and provides alarm information services to clients through a Web application service system and a Wap application service system.

The main functional modules of this system are implemented as follows:

(1) In-vehicle mobile phone system of the intelligent mobile phone terminal. Based on the voice, SMS, Wap and GPS technologies of existing smartphones, and together with the intelligent speech library's understanding of natural language, it provides front-end intelligent voice services on top of existing smartphones, effectively improving the phone's voice functions and the service capability and efficiency of the terminal. The in-vehicle mobile phone system receives and sends messages using GPS and GPRS technology and transmits messages to the Web server and the Wap server.

(2) Server receiving system. After receiving self-rescue or alarm information from the in-vehicle mobile phone system, it connects to the GIS map platform and supplies the GIS system with the coordinates of the vehicle's current position. The GIS system retrieves map data and feeds the result back to the server as text and pictures; the server stores the text data in a database and the picture data in the corresponding customer data folder.

(3) SMS message service system. After receiving a notification from the server, the SMS system sorts out the user's data and combines the alarm message with the service URL address provided by the Wap system; the URL address contains related information such as the user account number and password. The SMS message service system sends the combined URL address to the client as a mobile phone short message.

(4) Wap system. After receiving a self-rescue or alarm short message from the onboard system, the user directly selects the URL address contained in the message, connects to and accesses the Wap server system through the mobile phone browser, and downloads the alarm-related data. The Wap server system transmits the data to the client's phone as a text list sorted by time; by selecting a link in the list, the client can view the vehicle's current area and other related information.

(5) Web system. The user can log in to the Web server system from a computer over the Internet and view alarm information through it. The Web server system provides history record management functions, including SMS message history, remaining-balance statistics and queries, records of the phone's logins to the Wap system, and modification of customer data.

The functions of each system are described in further detail below:

Server receives system

Its main functions are: receiving voice alarm information from the mobile phone in-vehicle system, translating it into an intelligent speech signal that the phone can recognize for processing, and connecting to the GIS system to obtain related data such as the vehicle body position. This functionality can be divided into six parts: voice transmission, translation and decoding, the receiving service, connecting to the GIS system, receiving GIS feedback, and archiving data to the corresponding customer data center. The system is multithreaded, so it can simultaneously receive alarm information sent by the onboard systems of multiple users' phones and meet the connection requirements of multiple users. Its implementation is described as follows:

Listen on the specified port
while (true)
{
    receive the voice signal from the user's phone;
    the phone's semantic recognition software, together with the intelligent knowledge base, determines the user's request, processes it and sends it to the server, which returns feedback;
    accept the connection from the client;
    start a client access thread to handle the related operations;
}

Client processes thread and describes:

{ obtain the data provided from onboard system；

If (connects generalized information system successful connection)

Send the data such as coordinate to GIS platform；

If (reads the response success of GIS)

{ if (connects data base)

Write data to data base

If (picture file does not exists)

Write image data to specified file

}

If (operates successfully) above

Notice SMS system sends a message to user mobile phone

}
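A minimal Python sketch of the two listings above, under assumptions the description leaves open: each onboard message arrives as one line of JSON with hypothetical `user`, `lat` and `lon` fields; the GIS platform, database and SMS system are caller-supplied objects whose methods (`query`, `save_text`, `notify`) are hypothetical; and the standard library's `socketserver.ThreadingTCPServer` plays the role of "start a client access thread" for each connection.

```python
import json
import os
import socketserver


def process_client(raw, gis, db, sms, picture_dir):
    """Body of the client processing thread (illustrative sketch).

    `gis`, `db` and `sms` are hypothetical back-end objects. Returns True
    only if every step succeeded, mirroring 'if all operations succeed'.
    """
    msg = json.loads(raw.decode("utf-8"))  # assumed JSON from the onboard system
    try:
        # Send coordinates to the GIS platform; get text and picture back.
        text, picture = gis.query(msg["lat"], msg["lon"])
    except (ConnectionError, OSError):
        return False  # GIS connection or response failed
    if not db.save_text(msg["user"], text):  # write text data to the database
        return False
    # User IDs are assumed to be safe file names.
    path = os.path.join(picture_dir, msg["user"] + ".png")
    if not os.path.exists(path):  # write the picture only if it is absent
        with open(path, "wb") as f:
            f.write(picture)
    sms.notify(msg["user"], text)  # ask the SMS system to message the user
    return True


class AlarmHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread for each connected onboard client."""

    def handle(self):
        line = self.rfile.readline()  # one JSON message per line (assumed)
        ok = process_client(line, self.server.gis, self.server.db,
                            self.server.sms, self.server.picture_dir)
        self.wfile.write(b"OK\n" if ok else b"FAIL\n")


def serve(port, gis, db, sms, picture_dir):
    """Listen on the specified port; ThreadingTCPServer spawns a thread per client."""
    server = socketserver.ThreadingTCPServer(("", port), AlarmHandler)
    server.gis, server.db, server.sms = gis, db, sms
    server.picture_dir = picture_dir
    server.serve_forever()
```

Keeping the per-client logic in a plain function (`process_client`) allows it to be exercised with stub back ends before real GIS and SMS connections exist.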

SMS message service system

This system connects to the SMS platform of the telecom operator and is the intermediary service component for sending messages: specified message content can be sent through it, and the SMS system provides an interface to external callers. Based on the length of the message content, the system determines whether it is a single message or a multi-part message and sends it to the corresponding telecom service port in the following format:

http://<server address>:<service port>/send.aspx?dist=<target phone number>&smsbody=<message content>

Here dist is the target phone number; several phone numbers can be given, separated by semicolons. smsbody is the concrete message content intended for the user, including the address at which the user browses the Wap service and a brief summary of the user's data.
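The send format can be illustrated as follows. The sketch assumes a 70-character limit for a single message (a common figure for UCS-2 text messages; the description does not state the threshold) and simply cuts longer text into consecutive parts, each sent with its own `send.aspx` request; real concatenated SMS reserves header space and uses slightly shorter parts. Host, port and function names are hypothetical.

```python
from urllib.parse import urlencode

SINGLE_SMS_CHARS = 70  # assumed per-message limit, not stated in the patent


def split_message(body, limit=SINGLE_SMS_CHARS):
    """Return [body] if it fits in one message, else consecutive parts of <= limit chars."""
    if len(body) <= limit:
        return [body]
    return [body[i:i + limit] for i in range(0, len(body), limit)]


def build_send_urls(server, port, numbers, body):
    """One send.aspx request per message part; target numbers joined by semicolons."""
    base = f"http://{server}:{port}/send.aspx?"
    dist = ";".join(numbers)
    # safe=";" keeps the number separator readable instead of encoding it as %3B.
    return [base + urlencode({"dist": dist, "smsbody": part}, safe=";")
            for part in split_message(body)]
```

For a short alarm text sent to two numbers, this yields a single URL of the form `...send.aspx?dist=13800000000;13900000000&smsbody=alarm`.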

Wap system

The Wap system provides alarm information to mobile phone users. After receiving an alarm message, the user connects to the Wap system via the URL address in the message. The system provides a list of geographic names, from which the user selects the GIS picture and latest alarm information to be examined in detail. The system uses WML technology, with ASP.NET generating the WML content dynamically, so the user can see the latest alarm information at any time, as described below:

if (the client exists)
{
    retrieve the client's related data for use when sending messages;
    if (the database is connected)
    {
        obtain the list data;
        CreateWML()  // generate data for the WAP browser to parse
        send the WML statements to the client's phone;
    }
}

When the client selects an item in the specified list, the processing is as follows:

if (the data requested by the client exists)
{
    send the alarm information requested by the client to the client's phone;
}
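A sketch of what `CreateWML()` might produce, written in Python rather than ASP.NET. The record layout `(timestamp, place_name, detail_url)` and the newest-first ordering are assumptions (the description says only that the list is sorted by time); timestamps are assumed to be ISO-style strings so they sort chronologically as text, and values are XML-escaped because WML is an XML format.

```python
from xml.sax.saxutils import escape, quoteattr


def create_wml(records):
    """Build a WML 1.1 deck listing alarm records, newest first (assumed order).

    Each record is a hypothetical (timestamp, place_name, detail_url) tuple;
    each list entry links to the detail page for that alarm.
    """
    lines = ['<?xml version="1.0"?>',
             '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
             '"http://www.wapforum.org/DTD/wml_1.1.xml">',
             '<wml><card id="alarms" title="Alarms"><p>']
    for ts, place, url in sorted(records, key=lambda r: r[0], reverse=True):
        # quoteattr escapes '&' in query strings and adds the surrounding quotes.
        lines.append(f"<a href={quoteattr(url)}>{escape(ts)} {escape(place)}</a><br/>")
    lines.append("</p></card></wml>")
    return "\n".join(lines)
```

Selecting an entry then follows its `href` to the detail request handled by the second listing.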

Web system

The user can connect to the Internet from a computer to view alarm information. This system is implemented with the dynamic website technology ASP.NET and provides the user with several functions, implemented as follows:

if (user login succeeds)
{
    show the function menu to the logged-in user;
    if (the user requests to modify data)
        connect to the database and modify the data;
    if (the user views the SMS sending history)
        connect to the database, query the user's SMS history and display it to the user;
    if (the user views the Wap history)
        connect to the database, query the Wap history and display it to the user;
}
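A sketch of the Web system's dispatch logic, using an in-memory SQLite database in place of the ASP.NET site. The table layout, column names and action names are hypothetical, and the PBKDF2 password check (salted with the user name) is for illustration only; the description says just that the user logs in.

```python
import hashlib
import sqlite3


def open_db():
    """In-memory stand-in for the Web system database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (name TEXT PRIMARY KEY, pw_hash TEXT, phone TEXT);
        CREATE TABLE sms_history (user TEXT, sent_at TEXT, body TEXT);
        CREATE TABLE wap_history (user TEXT, visited_at TEXT, url TEXT);
    """)
    return db


def pw_hash(password, salt):
    """Illustrative password hash; a real system would use a random per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 100_000).hex()


def login(db, name, password):
    row = db.execute("SELECT pw_hash FROM users WHERE name = ?", (name,)).fetchone()
    return row is not None and row[0] == pw_hash(password, name)


def handle_request(db, name, password, action, **kw):
    """Dispatch the menu actions of the pseudocode after a successful login."""
    if not login(db, name, password):
        raise PermissionError("login failed")
    if action == "modify":  # modify customer data
        db.execute("UPDATE users SET phone = ? WHERE name = ?", (kw["phone"], name))
        db.commit()
        return []
    if action == "sms_history":  # SMS sending history
        return db.execute("SELECT sent_at, body FROM sms_history WHERE user = ? "
                          "ORDER BY sent_at", (name,)).fetchall()
    if action == "wap_history":  # Wap access history
        return db.execute("SELECT visited_at, url FROM wap_history WHERE user = ? "
                          "ORDER BY visited_at", (name,)).fetchall()
    raise ValueError(f"unknown action: {action}")
```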

The present invention also provides a speech recognition method; the speech recognition algorithm used in the invention is described in detail below.

The present invention uses an endpoint detection algorithm to determine the start and end points of the speech. Each entry stored in the template library is called a reference template and can be expressed as R = {R(1), R(2), ..., R(m), ..., R(M)}, where m is the time index of the training speech frame, m = 1 is the starting frame and m = M is the ending frame, so M is the total number of speech frames in this template, and R(m) is the speech feature vector of the m-th frame. An input entry to be recognized is called a test template and can be expressed as T = {T(1), T(2), ..., T(n), ..., T(N)}, where n is the time index of the test speech frame, n = 1 is the starting frame and n = N is the ending frame, so N is the total number of speech frames in this template, and T(n) is the speech feature vector of the n-th frame. The reference and test templates generally use feature vectors of the same type, the same frame length, the same window function and the same frame shift. With the test and reference templates denoted T and R, their similarity is compared by computing the distance D[T, R] between them: the smaller the distance, the higher the similarity. To compute this distortion distance, the distances between corresponding frames of T and R are accumulated. If n and m are arbitrarily chosen frame numbers in T and R respectively, d[T(n), R(m)] denotes the distance between these two frames. The distance function depends on the distance measure actually used.

This algorithm is a nonlinear warping technique that combines time warping with distance measure calculation. Let the reference template feature vector sequence be a1, a2, ..., aM and the input speech feature vector sequence be b1, b2, ..., bN. If M ≠ N, the algorithm searches for a time warping function m = w(n) that nonlinearly maps the time axis n of the input template onto the time axis m of the reference template, such that w satisfies formula (1):

D = \min_{w(n)} \sum_{n=1}^{N} d[n, w(n)]

Formula (1)

In the formula, d[n, w(n)] is the distance between the n-th input frame vector and the m-th reference frame vector, and D is the distance measure between the two templates under the optimal time warping. If N = M, the distance can be computed directly; otherwise T(n) and R(m) must be aligned. Alignment can use linear expansion: if N < M, T can be linearly mapped to a sequence of M frames and its distance to {R(1), R(2), ..., R(M)} then computed. However, such a calculation ignores the fact that the duration of each speech segment can lengthen or shorten differently from one utterance to another, so the recognition performance cannot be optimal. Dynamic programming is therefore mostly used instead. The algorithm is a typical optimization problem: a time warping function w(n) satisfying certain conditions describes the time correspondence between the input template and the reference template, and the warping function that minimizes the accumulated distance when the two templates are matched is solved for. This guarantees the maximum acoustic similarity between the two templates.
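The dynamic-programming search for formula (1) can be sketched in Python as follows. The local path constraints are an assumption, since the description only says that w(n) must meet certain conditions: here w(1) = 1, w(N) = M and w(n) - w(n-1) is 0, 1 or 2, which keeps w monotonic and contributes exactly one term d[n, w(n)] per input frame. The frame distance d defaults to the Euclidean distance, also an assumption. `recognize` applies the rule of taking the closest (highest-matching) reference template as the result; the function names are hypothetical.

```python
import math


def dtw_distance(test, ref, dist=math.dist):
    """D = min over w of sum_{n=1..N} d[n, w(n)], solved by dynamic programming.

    Assumed constraints: w(1) = 1, w(N) = M, w(n) - w(n-1) in {0, 1, 2}.
    Returns math.inf if no admissible warping path exists.
    """
    N, M = len(test), len(ref)
    INF = math.inf
    # D[n][m]: least accumulated distance of a path with w(n) = m (0-based).
    D = [[INF] * M for _ in range(N)]
    D[0][0] = dist(test[0], ref[0])  # boundary condition w(1) = 1
    for n in range(1, N):
        for m in range(M):
            # Predecessor w(n-1) is m, m-1 or m-2.
            best = min(D[n - 1][m - k] for k in (0, 1, 2) if m - k >= 0)
            if best < INF:
                D[n][m] = best + dist(test[n], ref[m])
    return D[N - 1][M - 1]  # boundary condition w(N) = M


def recognize(test, templates):
    """Return the label of the reference template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

For example, a test sequence that repeats each reference frame twice is matched with D = 0, because w may stay on the same reference frame for consecutive input frames.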

Depending on the practical application, the speech recognition system of the present invention can be classified as speaker-dependent or speaker-independent, isolated-word or continuous-word, and small-vocabulary, large-vocabulary or unlimited-vocabulary recognition. Whatever the type of speech recognition system, its basic principles and processing methods are similar.

The speech recognition process mainly comprises preprocessing of the speech signal, feature extraction and pattern matching. Preprocessing includes pre-filtering, sampling and quantization, windowing, endpoint detection and pre-emphasis. The most important step in speech signal recognition is feature parameter extraction. The extracted feature parameters must satisfy the following requirements:

(1) the extracted feature parameters represent speech features effectively and discriminate well between them;

(2) the parameters of each order are well independent of one another;

(3) the feature parameters are convenient to compute, preferably with an efficient algorithm, to ensure real-time speech recognition.
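The preprocessing steps named above (pre-emphasis, windowing and endpoint detection) can be sketched with the Python standard library. The pre-emphasis coefficient (0.97), frame length (256 samples), frame shift (128 samples), Hamming window and the energy-ratio endpoint rule are all assumptions chosen for illustration; the description names endpoint detection but not the specific algorithm.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frames(x, length=256, shift=128):
    """Split the signal into overlapping frames and apply a Hamming window to each."""
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1)) for i in range(length)]
    return [[x[s + i] * win[i] for i in range(length)]
            for s in range(0, len(x) - length + 1, shift)]


def detect_endpoints(frame_list, ratio=0.1):
    """Simple short-time-energy endpoint detection (a stand-in algorithm).

    A frame counts as speech if its energy is at least `ratio` times the
    maximum frame energy. Returns (first, last) speech frame indices, or
    None if every frame is silent.
    """
    energy = [sum(v * v for v in f) for f in frame_list]
    if not energy or max(energy) == 0:
        return None
    threshold = ratio * max(energy)
    active = [i for i, e in enumerate(energy) if e >= threshold]
    return active[0], active[-1]
```

With 8 kHz samples consisting of silence, a 1024-sample tone and silence again, the detected speech spans the frames that overlap the tone.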

In the training stage, after the feature parameters have been processed, a model is built for each entry and saved to the template library. In the recognition stage, the speech signal passes through the same channel to obtain speech feature parameters, a test template is generated and matched against the reference templates, and the reference template with the highest matching score is taken as the recognition result. Prior knowledge can also be drawn on to improve recognition accuracy.

The implementation of speech recognition on a DSP is described further below.

Speech recognition algorithms involve many floating-point operations, and implementing them on a fixed-point DSP is the first problem to solve when writing a speech recognition program. It can be solved by scaling numbers, i.e., fixing the position of the binary point within a fixed-point number. Q notation is a commonly used scaling method. It works as follows:

Let the fixed-point number be x and the floating-point number be f. In Q format they are related as follows: converting floating-point f to fixed-point x: x = (int)(f × 2^Q); converting fixed-point x to floating-point f: f = (float)x × 2^(-Q).
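The Q-format conversions above, sketched in Python. Unlike the plain `(int)` cast, this sketch rounds to the nearest integer and saturates to a signed 16-bit word, a common refinement that is an assumption here; `q_mul` (a hypothetical helper) shows why the product of two Q numbers must be shifted right by Q.

```python
def float_to_q(f, q=15, bits=16):
    """x = (int)(f * 2^Q), rounded and saturated to a signed `bits`-bit word."""
    x = int(round(f * (1 << q)))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def q_to_float(x, q=15):
    """f = (float)x * 2^-Q."""
    return x / (1 << q)


def q_mul(a, b, q=15):
    """Multiply two Q-format numbers: the raw product has 2Q fraction bits,
    so an arithmetic right shift by Q returns it to Q format."""
    return (a * b) >> q
```

In Q15, 0.5 is stored as 16384, 1.0 saturates to 32767, and `q_mul(16384, 16384)` gives 8192, i.e. 0.25.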

When the speech recognition algorithm is implemented on a 16-bit fixed-point DSP, the program runs faster but data precision is lower, and accumulated errors in intermediate steps may make the result incorrect. To improve computational precision, the program uses the following methods:

(1) Extended precision: where high precision is required, intermediate variables are represented with 32 or even 48 bits. This greatly improves computational precision while adding only a few instructions.
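A small simulation of why extended precision helps. It compares a Q15 dot product computed with 16-bit intermediates (each product truncated back to Q15 and the sum saturated to 16 bits) against one that keeps the full products in a 32-bit (or 48-bit) accumulator and rounds once at the end. The function names and example values are illustrative only.

```python
def saturate(x, bits):
    """Clamp x to the range of a signed `bits`-bit integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def dot_q15_narrow(a, b):
    """16-bit intermediates: each Q30 product is truncated to Q15 before summing."""
    acc = 0
    for x, y in zip(a, b):
        acc = saturate(acc + ((x * y) >> 15), 16)
    return acc


def dot_q15_wide(a, b, acc_bits=32):
    """Extended precision: sum full Q30 products in a wide accumulator,
    then round back to Q15 once at the end."""
    acc = 0
    for x, y in zip(a, b):
        acc = saturate(acc + x * y, acc_bits)
    return saturate((acc + (1 << 14)) >> 15, 16)  # add half an LSB to round
```

With 100 products of 200 × 200 in Q15, the exact sum is about 122.07 in Q15 units; the 16-bit version discards the fractional part of every product and returns 100, while the wide accumulator returns 122.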

(2) Pseudo-floating-point representation of floating-point numbers

Pseudo-floating-point represents a floating-point number as a mantissa plus an exponent. The mantissas of a data block can use the Q1.15 data format, with one exponent shared by the whole block. This representation has a sufficiently large data range to fully meet precision requirements, but it requires writing a library of exponent and mantissa operations, which adds instructions and computation to the program and hinders real-time implementation.
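A sketch of the pseudo-floating-point (block floating-point) idea: one shared exponent per block and Q1.15 mantissas. Choosing the exponent from the block's peak magnitude via `math.frexp` is an assumption, since the description does not say how the exponent is selected; on a DSP this step is typically done with a normalization instruction. Function names are hypothetical.

```python
import math


def block_float_encode(values):
    """Encode a block as (Q1.15 mantissas, shared exponent e).

    e is the smallest exponent with |v| / 2^e < 1 for every v (taken from the
    peak via frexp, where peak = f * 2^e and 0.5 <= f < 1). Each mantissa is
    round(v / 2^e * 2^15), saturated to a signed 16-bit word.
    """
    peak = max((abs(v) for v in values), default=0.0)
    e = 0 if peak == 0 else math.frexp(peak)[1]
    scale = 2.0 ** (15 - e)
    mantissas = [max(-32768, min(32767, round(v * scale))) for v in values]
    return mantissas, e


def block_float_decode(mantissas, e):
    """value = mantissa * 2^(e - 15)."""
    return [m * 2.0 ** (e - 15) for m in mantissas]
```

For the block (1000.0, -250.0, 3.0), the shared exponent is 10 and the mantissas are 32000, -8000 and 96, which decode back exactly.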

Both methods improve computational precision; in practice, the choice should weigh the system's requirements against the complexity of the algorithm.

High-level languages distinguish between the storage of global and local variables, but in a DSP program every declared variable is assigned data space at link time. Defining local variables as one would in a high-level language therefore wastes a great deal of DSP memory, which is clearly unreasonable for a fixed-point DSP with tight data space. To save memory, it is best to maintain a variable table when writing DSP programs: on entering each DSP submodule, rather than immediately allocating new local variables, first reuse variables that have already been allocated but are no longer in use, and allocate new local variables only when these are insufficient.
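The variable-table rule can be modelled as a small scratch-slot pool. This is an illustrative Python model, not DSP code: `ScratchPool`, `acquire` and `release` are hypothetical names, and a slot number stands for a fixed data-memory address that assembly code would reference.

```python
class ScratchPool:
    """Variable table for scratch storage (illustrative model).

    acquire() reuses a previously released slot before growing the table,
    following the rule 'reuse allocated but unused variables first'.
    """

    def __init__(self):
        self.size = 0   # slots ever allocated, i.e. the data-memory footprint
        self.free = []  # released slots available for reuse

    def acquire(self):
        if self.free:
            return self.free.pop()  # reuse an allocated but unused slot
        self.size += 1              # only now allocate a new slot
        return self.size - 1

    def release(self, slot):
        self.free.append(slot)
```

Two submodules that each need three temporaries and run one after another share the same three slots, so the footprint stays at three rather than six.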

To simplify program design and debugging, the speech recognition algorithm is implemented with modular programming. Modules are divided according to the basic stages of speech recognition, each module is further subdivided into submodules, and programming and debugging proceed module by module. Before coding, each module's algorithm is first simulated in a high-level language, and the assembly program is then written on that basis. During debugging, the high-level-language and assembly versions can be compared: tracing the intermediate states of both verifies the correctness of the assembly code and lets errors be found and corrected promptly, shortening the development cycle. In addition, necessary comments should be added at key parts of the program to improve its readability. During integration, each module needs defined entry and exit parameters, and the stack pointer, intermediate variables and so on must be maintained.

The above embodiments only describe preferred embodiments of the present invention and do not limit its scope. Without departing from the design spirit of the invention, all variations and improvements made to the technical solution by those of ordinary skill in the art shall fall within the scope of protection determined by the claims of the present invention.

Claims

1. A smartphone voice control system, comprising an in-vehicle mobile phone system, a server receiving system, an SMS message service system, a Wap system and a Web system; characterized in that:

after receiving a short message from the in-vehicle mobile phone system, the user connects to and accesses the Wap system through the mobile phone browser using the URL address contained in the short message, and downloads related data; the Wap system transmits the data to the user's phone as a text list sorted by time, and by selecting a link in the list the user can view the vehicle's current area and other related information; and the user can log in to the Web system from a computer to view related data, and the Web system provides history record management functions.

2. The smartphone voice control system according to claim 1, characterized in that: after the user speaks natural language, the mobile phone terminal included in the in-vehicle mobile phone system first invokes voice control to record and intelligently edit the speech, then forwards the speech data to the mobile phone background monitoring center; the background monitoring center processes the speech data it receives and returns the processed result, which is presented to the mobile phone user; after the user confirms the speech processing result, the text message is sent to the server receiving system.

3. The smartphone voice control system according to claim 1, characterized in that: the URL address contains the user account number and password information.

4. The smartphone voice control system according to claim 1, characterized in that: when the intelligent speech library determines from the natural language spoken by the user that the user is intoxicated, the mobile phone terminal issues a warning prompt or performs automatic rescue.

5. The smartphone voice control system according to claim 4, characterized in that: the automatic rescue includes the phone automatically sending a short message to, or calling, a preset number, and automatically raising an alarm.

6. A speech recognition method in the smartphone voice control system according to any one of claims 1-5, comprising a training stage and a recognition stage of the speech signal, characterized in that: training is performed before the speech recognition stage; in the training stage, the speech feature parameters representing speech features are processed, a model is built for each entry and saved as a reference template; in the speech recognition stage, the speech signal passes through the same channel to obtain speech feature parameters, a test template is generated and matched against the reference templates, and the reference template with the highest matching value is taken as the recognition result.

7. The speech recognition method according to claim 6, characterized in that: an endpoint detection algorithm determines the start and end points of the speech; the reference template and the test template use feature vectors of the same type, the same frame length, the same window function and the same frame shift; the test and reference templates are denoted T and R respectively, and to compare how well they match, the distance D[T, R] between them is calculated, a smaller distance meaning a higher matching value.

8. The speech recognition method according to claim 7, characterized in that:

the reference template feature vector sequence is a1, a2, ..., aM and the input speech feature vector sequence is b1, b2, ..., bN, wherein m and n are related by a time warping function m = w(n), and w satisfies:

D = \min_{w(n)} \sum_{n=1}^{N} d[n, w(n)]

9. The speech recognition method according to any one of claims 6-8, characterized in that: a 16-bit fixed-point DSP implements the floating-point operations of speech recognition; where higher precision is required, intermediate variables are represented with 32 or 48 bits, or floating-point numbers are represented in pseudo-floating-point form, pseudo-floating-point meaning that a floating-point number is represented as a mantissa plus an exponent.