CN119203100A - A smart speaker permission design management method and system - Google Patents
A smart speaker permission design management method and system Download PDFInfo
- Publication number
- CN119203100A CN119203100A CN202411217025.7A CN202411217025A CN119203100A CN 119203100 A CN119203100 A CN 119203100A CN 202411217025 A CN202411217025 A CN 202411217025A CN 119203100 A CN119203100 A CN 119203100A
- Authority
- CN
- China
- Prior art keywords
- user
- permission
- authority
- feature vector
- behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
- G06F16/636—Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Fuzzy Systems (AREA)
- Computer Security & Cryptography (AREA)
- Physiology (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Automation & Control Theory (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides an intelligent sound box authority design management method system, which comprises the steps of collecting user audio data and preprocessing the audio data, constructing a user-situation joint feature vector to carry out recognition accuracy adjustment, designing fuzzy matching and dynamic adjustment models to carry out real-time dynamic management on the authority of an intelligent sound box if complex voice environments of multiple languages and multiple accents appear, constructing a behavior monitoring and anomaly detection model to carry out real-time analysis on the user authority through feature engineering and time sequence modeling, dynamically adjusting the authority level according to the results of anomaly detection and behavior monitoring, providing a mapping function of emotion state and authority response to dynamically adjust the authority level, immediately applying new authority configuration to the intelligent sound box, constructing an adaptive authority optimization model based on reinforcement learning, and carrying out adaptive learning on the authority strategy of the intelligent sound box through reinforcement learning to carry out long-term strategy optimization. The intelligent sound box meets the wide application requirements of the intelligent sound box in various scenes such as families, offices and the like.
Description
Technical Field
The invention belongs to the technical field of authority design management, and particularly relates to an intelligent sound box authority design management method and system.
Background
With the rapid development of internet of things (IoT) technology, smart speakers are becoming indispensable smart devices in home and office environments. The method provides a plurality of convenient services such as music playing, information inquiry, intelligent home control and the like for users in a voice interaction mode. However, the widespread use of smart speakers has also raised a number of problems associated with privacy protection, rights management, and multi-user collaboration. The existing intelligent sound box mainly controls the use permission of equipment through simple static permission setting, and the method has various limitations and disadvantages.
Firstly, the existing static authority management mode lacks of dynamics and intelligence, and cannot adapt to complex requirements of multiple users and multiple scenes. In a home environment, different members may have different usage requirements and rights requirements for the function of the sound box. For example, the rights of children and adults to content access should be different, and the existing static rights settings cannot flexibly cope with these changes, often resulting in children accessing unsuitable content or adults not having timely access to the desired functions. Second, conventional static rule-based rights management cannot provide a personalized experience, lacks deep understanding and learning of individual behaviors and usage habits of users, and results in poor user experience. The intelligent sound box is used as equipment with extremely high daily use frequency, and is supposed to have higher user experience optimizing capability.
In addition, with the popularization of intelligent sound boxes in home, office and other environments, the authority management of the intelligent sound boxes faces more serious privacy security challenges. The traditional authority setting mode generally requires a user to manually adjust the authority through tedious operations, and is not convenient and easy to make mistakes. Even more serious, enclosures are often exposed to public environments where unauthorized users may perform unauthorized operations through voice commands, presenting a significant security risk. For example, unauthorized users can easily access confidential information, control smart home devices, and even make improper voice purchases, resulting in property and privacy losses. In the prior art, solutions to these security problems are mostly post-hoc remedies, and a prospective protection strategy is lacking.
Finally, rights management of smart speakers lacks an effective management mechanism in a multi-user scenario, especially in a scenario where multiple people are simultaneously using, how to differentiate user identities and provide personalized services is a big challenge. The current rights management system can only distinguish simple user roles or rely on a manual login mode to switch users, which is not convenient and unreliable in actual use and cannot meet diversified use requirements. In the face of alternate use of users with different ages and different roles, the prior art cannot provide an intelligent and dynamic authority management scheme, which limits the function exertion and user experience optimization of the intelligent sound box.
Therefore, the current intelligent sound box authority management technology has obvious defects, which are mainly represented by the lack of dynamic intelligence, personalized experience, security guarantee and insufficient support for multi-user scenes. These problems severely limit the potential deployment and market applications of smart speakers.
Disclosure of Invention
The invention aims to design an intelligent sound box authority design management method and system, introduces an advanced artificial intelligent technology and a data analysis method, overcomes the defects of the prior art, and provides a more intelligent sound box authority management system with high safety and excellent user experience.
In order to achieve the above object, in a first aspect of the present invention, there is provided a method for rights design management of an intelligent sound box, the method comprising:
S1, collecting user audio data, preprocessing the audio data, constructing a Gaussian mixture model according to the preprocessed data, and outputting an audio signature;
S2, taking the audio signature as input to represent the identity characteristic of the user, collecting real-time environment data, preprocessing, constructing a user-situation joint feature vector according to the identity characteristic of the user and the preprocessed real-time environment data, adding a regular term into the user-situation joint feature vector to carry out recognition accuracy adjustment, and then carrying out situation awareness and dynamic authority configuration initialization to obtain an optimized authority level and authority score;
S3, according to the optimized authority level and authority score, if complex voice environments of multiple languages and multiple accents appear, designing real-time dynamic management of the intelligent sound box authority by a fuzzy matching and dynamic adjusting model, wherein the fuzzy matching and dynamic adjusting model calculates the credibility score gamma of the voice feature vector C new by using a feature weighted aggregation method, and the credibility score gamma is expressed as follows:
Wherein, gamma represents the credibility score of the voice input, represents the credibility of the matching degree of the current voice feature, M represents the total number of voice features, w j represents the weight of the jth feature, represents the importance of the feature in calculating the voice credibility, alpha j represents the fuzzy adjustment parameter of the jth feature, controls the sensitivity of feature matching, C new,j represents the component of the jth voice feature vector, mu j represents the expected value of the jth feature, and represents the typical value of the feature under normal condition;
Confidence score gamma and optimized authority score using speech input Adjusting functions according to rightsDynamically adjusting authority level of userThe expression is as follows:
Wherein, Representing the authority level after dynamic adjustment, and representing the authority setting after adjustment according to the voice credibility and the original authority score; Representing the initial authority level output by the second step; η represents a rights adjustment gain coefficient, controls the rights adjustment amplitude, γ represents a confidence score for the speech input, δ represents an intermediate threshold for the confidence score; Represents the optimized authority score, tau represents the reference threshold of the authority score, and is calculated according to the new authority level The intelligent sound box system adjusts the authority configuration of the user in real time and dynamically changes the accessibility of the functions;
s4, according to the new authority level after being optimized by combining the complex voice environment And then combining with real-time user behavior data, constructing a behavior monitoring and anomaly detection model, analyzing user permission in real time through feature engineering and time sequence modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting permission level according to the anomaly detection and behavior monitoring results
S5, according to the authority levelClassifying the emotion states through an emotion classifier model by combining the emotion states of the user, generating emotion labels, providing mapping functions of the emotion states and authority responses to dynamically adjust authority levels according to the emotion state labels and abnormal behavior response strategies, and calculating the authority levels according to the emotion states and the authority response mapping functionsThe intelligent sound box immediately applies new authority configuration;
S6, according to the authority level Constructing an adaptive authority optimization model based on reinforcement learning, and carrying out adaptive learning on the authority strategy of the intelligent sound box through reinforcement learning to carry out long-term strategy optimization by combining long-term behaviors and use modes of users.
Preferably, the preprocessing is to process the collected audio data x (t) to remove noise and to perform normalization processing;
Wherein, the probability density function of the Gaussian mixture model is expressed as:
Wherein, C represents the extracted Mel frequency cepstrum coefficient feature vector, which is used for representing the voiceprint feature of the audio; represents the parameter set of the Gaussian mixture model, pi k represents the mixing weight of the kth Gaussian component, meets the requirement And pi k≥0;μk represents the mean vector of the kth Gaussian component, Σ k represents the covariance matrix of the kth Gaussian component, K represents the number of Gaussian components and the number of Gaussian distributions in the model; Representing a multivariate normal distribution, expressed as:
where d represents the dimension of the feature vector C.
Preferably, for each new audio input, the user identity is identified by computing a log-likelihood estimate of the feature vector of the new audio under each user model, as follows:
Wherein logL (C new|Si) represents a log-likelihood value of the new feature vector under the ith user model for measuring the matching degree of the audio input and the user model, C new represents the feature vector of the new audio input, S i represents the audio signature of the ith user, namely GMM parameter set Θ i, T represents the total number of frames of the new audio input, represents the number of frames after the audio signal is segmented, and P (C new(t)|Θi) represents the probability of the feature vector C new (T) of the T frame under the user model Θ i.
Preferably, the real-time environment data is collected through various sensors, wherein the sensors comprise a microphone, a camera, a light sensor, a temperature and humidity sensor and an accelerometer, and a real-time environment feature vector is expressed as E;
The user-situation joint feature vectors are fused by a weighted multi-layer perceptron model, expressed as follows:
Ce=σ(W2·σ(W1·[Enorm,Si]+b1)+b2)
The method comprises the steps of generating a context feature vector, wherein C e represents the generated context feature vector, represents the behavior feature of a user in the current environment, [ E norm,Si ] represents the input feature obtained by combining the environment feature vector and a user audio signature, W 1,W2 represents a weight matrix, the dimensions are m (n+1) and m×m respectively, and the weight matrix is learned through model training, b 1,b2 represents a bias vector, the dimensions are m and m respectively, and the bias vector is learned through model training, and sigma represents a nonlinear activation function and is used for capturing nonlinear relations;
Combining the context feature vector C e and the user audio signature S i to generate a user-context joint feature vector f= [ S i,Ce ];
adding a regularization term to the user-context joint feature vector f= [ S i,Ce ] achieves higher recognition accuracy by a weighted combination of user and environmental features, as follows:
Fopt=F+λ·(F⊙Wreg)
Wherein F opt represents the optimized joint feature vector, lambda represents the regularization parameter for balancing the influence of the original feature and the regularization term, as well as the element product operation for element-by-element weighting, W reg represents the weight matrix and the interrelationship between the features.
Preferably, the configuration initialization includes:
designing a permission scoring function according to the adjusted user-situation joint feature vector for calculating the permission score of the user under the current situation The expression is as follows:
Wherein, Representing a permission score for reflecting a permission level of a user in a current context; beta represents a weight adjustment coefficient to control the overall weight of the joint feature; Representing the two norms of the joint features as a measure of feature importance, increasing the constraint of balance between features; Representing a bias term;
Scoring according to the calculated rights And a preset authority level threshold value, performing authority dynamic adjustment, wherein the authority adjustment functionThe expression is as follows:
Wherein, Representing the authority level after dynamic adjustment; And expressing the authority threshold, and setting according to the user requirements and system configuration.
Preferably, in S3, the fuzzy matching and dynamic adjustment model designs an error correction feedback mechanism to dynamically adjust the weights and parameters of the fuzzy matching model, the error correction mechanism is expressed as follows:
Δw=∈·(γtarget-γ)·Cnew
Wherein Deltaw represents a weight adjustment vector for modifying the weight of a speech feature, E represents a learning rate for controlling the step size of the weight adjustment, gamma target represents a target confidence score, typically set to a system desired confidence level, gamma represents a confidence score for the current speech input, and C new represents a feature vector for the current speech input.
Preferably, a monitoring and abnormality detection model based on a long-short-time memory network is constructed according to the weighted behavior feature vector B w, behavior features at the next moment are predicted, and an abnormality mode of the behavior is identified, wherein an update formula of the behavior monitoring and abnormality detection model is expressed as follows:
ht=fLSTM(Bw,t,ht-1)
Wherein h t represents the hidden state of the current time t, captures the time dependence of the user behavior, B w,t represents the weighted behavior feature vector of the current time t, h t-1 represents the hidden state of the previous time t-1, and f LSTM represents the state update function of the long and short time memory network;
judging whether abnormal behavior exists or not by calculating residual errors between actual behavior characteristics and predicted behavior characteristics, wherein the abnormal score xi is expressed as follows:
Wherein, xi represents the abnormal score, the difference degree between the current behavior and the predicted behavior, and B w,t represents the actual weighted behavior characteristic vector of the current time t; Sigma 2 represents the variance of the prediction residual and is used for normalizing the abnormality score;
Judging whether to trigger a permission adjustment or warning mechanism according to the anomaly score xi and a preset anomaly threshold value theta, wherein the adjustment strategy is as follows:
If xi is less than or equal to theta, the current authority level is maintained without adjustment
If xi > theta, identifying the abnormal behavior, and carrying out authority tightening or prompting the user to carry out identity verification according to the severity of the abnormality.
Preferably, the collection of the emotional states of the user includes collection of audio and video features, and then a fusion model based on a self-attention mechanism is used to capture interaction between the multi-modal features, and the specific formula is as follows:
e represents a fused emotion feature vector and is used for describing the current comprehensive emotion state of the user; A represents an audio feature vector, the dimension is m, V represents a video feature vector, the dimension is n, W A and W V represent projection matrices for mapping audio and video features to the same space, the dimensions are m x d k and n x d k;dk represent the dimensions of key vectors in an attention mechanism for scaling dot product operations, and softmax represents a normalization function for calculating attention weights;
classifying the emotion states by using the fused emotion feature vector E through an emotion classifier model C (E), and generating an emotion label C, wherein the emotion label C is expressed as follows:
w i represents a weight vector of the classifier model, the weight corresponding to the ith class, and the dimension d k;bi represents a bias item of the classifier model, and the bias corresponding to the ith class;
according to the emotion state label c and the abnormal behavior response strategy, a mapping function of emotion state and authority response is provided For dynamically adjusting authority levelsThe expression is as follows:
Wherein, And rho (c) represents a permission adjustment increment function, and is expressed as an integer which can be positive or negative depending on the user emotion state c.
Preferably, the self-adaptive authority optimization model based on reinforcement learning combines the current authority level and emotion state of the user, and the historical behavior mode and system feedback of the user, namely a state vectorWherein H represents a historical behavior pattern feature vector of a user, F represents a system feedback feature vector, and defines actions executable by the system as an adjustment operation set of authoritiesWherein each action a i represents an adjustment to the level of authority, a sense-optimized objective function J (θ) that maximizes the desired jackpot by optimizing the parameter θ of policy pi θ (S);
the formula for optimizing the objective function J (θ) is as follows:
wherein J (theta) represents an optimized objective function of a strategy parameter theta, and represents a desired jackpot, S represents a current state vector and consists of a permission level, an emotion state, a historical behavior mode and system feedback; Representing a state distribution set, representing a possible state space, wherein gamma represents a discount factor, the value range of which is more than or equal to 0 and less than or equal to 1 and is used for measuring the importance of future rewards, r (S t,at) represents instant rewards obtained when a state S t executes an action a t and reflects the validity and rationality of authority adjustment, a t represents the action executed at a time t and is determined by a strategy pi θ (S);
in the strategy optimization process, a strategy regularization term omega (theta) is introduced to avoid the strategy from excessively fitting the short-term behavior pattern of the user, and an optimization target for maximizing regularization of strategy parameters theta is updated, wherein the optimization target is expressed as follows:
Wherein, theta t+1 represents the updated strategy parameter, theta t represents the strategy parameter at the current time t, alpha represents the learning rate and controls the step length of parameter updating; The method comprises the steps of representing strategy gradients, representing gradients of strategy parameters of an optimized objective function, lambda representing regularization strength parameters and controlling influence degree of regularization items on strategy updating, and omega (theta t) representing strategy regularization items, wherein the strategy regularization items are sparsity regularization, and simplicity and generalization capability of strategy parameters are encouraged, and are represented as follows:
where N represents the total number of policy parameters and θ i represents the ith parameter of the policy parameters.
In a second aspect of the present invention, there is provided an intelligent sound box authority design management system, the system comprising:
The user data collection module is used for collecting user audio data, preprocessing the audio data, constructing a Gaussian mixture model according to the preprocessed data and outputting an audio signature;
The configuration initialization module is used for taking the audio signature as input to represent the identity characteristic of the user, collecting real-time environment data, preprocessing, constructing a user-situation joint feature vector according to the identity characteristic of the user and the preprocessed real-time environment data, adding a regular term into the user-situation joint feature vector to adjust the identification precision, and then carrying out situation awareness and dynamic authority configuration initialization to obtain an optimized authority level and authority score;
The authority initializing module is used for designing real-time dynamic management of the authority of the intelligent sound box by the fuzzy matching and dynamic adjusting model according to the optimized authority level and the authority score if complex voice environments of multiple languages and multiple accents appear, wherein the fuzzy matching and dynamic adjusting model calculates the credibility score gamma of the voice feature vector C new by using a feature weighting aggregation method and is expressed as follows:
Wherein, gamma represents the credibility score of the voice input, represents the credibility of the matching degree of the current voice feature, M represents the total number of voice features, w j represents the weight of the jth feature, represents the importance of the feature in calculating the voice credibility, alpha j represents the fuzzy adjustment parameter of the jth feature, controls the sensitivity of feature matching, C new,j represents the component of the jth voice feature vector, mu j represents the expected value of the jth feature, and represents the typical value of the feature under normal condition;
Confidence score gamma and optimized authority score using speech input Adjusting functions according to rightsDynamically adjusting authority level of userThe expression is as follows:
Wherein, Representing the authority level after dynamic adjustment, and representing the authority setting after adjustment according to the voice credibility and the original authority score; Representing the initial authority level output by the second step; η represents a rights adjustment gain coefficient, controls the rights adjustment amplitude, γ represents a confidence score for the speech input, δ represents an intermediate threshold for the confidence score; Represents the optimized authority score, tau represents the reference threshold of the authority score, and is calculated according to the new authority level The intelligent sound box system adjusts the authority configuration of the user in real time and dynamically changes the accessibility of the functions;
the authority management module is used for optimizing the new authority level according to the combined complex voice environment And then combining with real-time user behavior data, constructing a behavior monitoring and anomaly detection model, analyzing user permission in real time through feature engineering and time sequence modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting permission level according to the anomaly detection and behavior monitoring resultsAccording to the authority levelClassifying the emotion states through an emotion classifier model by combining the emotion states of the user, generating emotion labels, providing mapping functions of the emotion states and authority responses to dynamically adjust authority levels according to the emotion state labels and abnormal behavior response strategies, and calculating the authority levels according to the emotion states and the authority response mapping functionsThe intelligent sound box immediately applies new authority configuration;
a system optimization module for optimizing the authority level Constructing an adaptive authority optimization model based on reinforcement learning, and carrying out adaptive learning on the authority strategy of the intelligent sound box through reinforcement learning to carry out long-term strategy optimization by combining long-term behaviors and use modes of users.
The beneficial technical effects of the invention are at least as follows:
Firstly, the invention adopts the user audio signature recognition technology, and can recognize the user identity in real time through the audio characteristics in the using process of the loudspeaker box. The method not only can effectively distinguish different users in a home or office environment, but also can automatically adjust the authority setting according to the use habit and the authority requirement of each user, thereby realizing real personalized service and overcoming the problem of lack of flexibility in the traditional static authority setting.
Secondly, the invention introduces a context-aware authority management technology, and the system can intelligently perceive the current context and dynamically adjust the authority setting by combining the behavior data of the user and the environment information (such as time, place, schedule and the like) through multi-sensor data fusion such as a microphone, a camera, a light sensor and the like. For example, in the evening home mode, the system can automatically start the child protection mode and limit unsuitable function access, so that the intelligence and user experience of the system are greatly improved, and the problem of lack of dynamic adjustment capability in the prior art is solved.
The invention designs a self-adaptive voice fuzzy matching algorithm, which can self-adaptively adjust voice recognition precision under various accents and speech speeds, and ensures that the permission setting is flexibly adjusted to ensure safety while accurately understanding the intention of a user. The self-adaptive mechanism solves the problem of low recognition accuracy in the multi-language and multi-accent use scene in the prior art, and reduces the risk of misoperation of rights.
In addition, the invention also integrates an abnormal authority behavior detection module and an emotion reasoning authority management module. The abnormal authority behavior detection module monitors and analyzes the authority use behavior of the user in real time by using a machine learning algorithm, can rapidly identify abnormal operation and take corresponding safety measures, and enhances the safety and the protection capability of the system. The emotion reasoning authority management module analyzes the emotion state of the user through voice, automatically adjusts the authority when the emotion of the user is excited, and prevents improper behaviors caused by emotion fluctuation. The innovation points effectively solve the defect that the prior art lacks response to the emotion and abnormal behavior of the user, and remarkably improve the safety and humanized experience of the system.
Through the innovation points, the invention provides a comprehensive solution, can effectively overcome various defects in the prior art, and realizes dynamic management, personalized service, safety protection and multi-user support of the authority of the intelligent sound box. The multi-layer and omnibearing intelligent authority management system not only improves user experience, but also obviously enhances the safety and adaptability of equipment, and meets the wide application requirements of intelligent sound boxes in diversified scenes such as families, offices and the like.
Drawings
The invention will be further described with reference to the accompanying drawings, in which embodiments do not constitute any limitation of the invention, and other drawings can be obtained by one of ordinary skill in the art without inventive effort from the following drawings.
Fig. 1 is a flowchart of a method for rights design management of an intelligent sound box according to an embodiment of the invention.
Fig. 2 is a frame diagram of an intelligent sound box authority design management system according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In one or more embodiments, as shown in fig. 1, the invention discloses a method for managing authority design of an intelligent sound box, which comprises steps 1-6, including:
s1, collecting user audio data, preprocessing the audio data, constructing a Gaussian mixture model according to the preprocessed data, and outputting an audio signature.
Specifically, during preprocessing, the acquired audio data x (t) is processed to remove noise and normalized.
Where y (t) represents the denoised audio signal. x (t) represents the original audio signal as a function of time.Representing the estimated noise signal, is extracted from the environment using an adaptive filter.
Signal normalization process, the normalized signal is expressed as:
Where y norm (t) represents the normalized audio signal. Mu (y) represents the mean value of the denoising signal y (t), calculated as
Where N is the number of signal samples. Sigma (y) represents the standard deviation of the denoising signal y (t), calculated as
Further, the audio features of the user are modeled using a gaussian mixture model to form an audio signature. The probability density function of the GMM is expressed as:
where C represents an extracted mel-frequency cepstral coefficient (MFCC) feature vector for representing voiceprint features of audio. A set of parameters representing the GMM. Pi k represents the mixing weight of the kth Gaussian component, satisfying And pi k≥0.μk represents the mean vector of the kth gaussian component. Σ k represents the covariance matrix of the kth gaussian component. K represents the number of Gaussian components and the number of Gaussian distributions in the model.Representing a multivariate normal distribution, expressed as:
Where d is the dimension of the feature vector C.
Further, for each new audio input, the user identity is identified by computing a log-likelihood estimate of its feature vector under each user model:
Wherein logL (C new|Si) represents the log-likelihood value of the new feature vector under the ith user model for measuring the degree of matching of the audio input to the user model. C new denotes the feature vector of the new audio input. S i represents the audio signature of the ith user, i.e., GMM parameter set Θ i. T represents the total number of frames of the new audio input, representing the number of frames after segmentation of the audio signal. P (C new(t)|Θi) represents the probability that the feature vector C new (t) of the t-th frame is under the user model Θ i.
S2, taking the audio signature as input to represent the identity characteristic of the user, collecting real-time environment data, preprocessing, constructing a user-situation joint feature vector according to the identity characteristic of the user and the preprocessed real-time environment data, adding a regular term into the user-situation joint feature vector to carry out recognition accuracy adjustment, and then carrying out situation awareness and dynamic authority configuration initialization to obtain the optimized authority level and authority score.
The intelligent sound box collects environmental data in real time through various sensors (such as a microphone, a camera, a light sensor, a temperature and humidity sensor, an accelerometer and the like) to form an environmental characteristic vector E= [ E 1,e2,…,en ]. These environmental features include:
e 1 represents an ambient noise level, ranging from 0 to 100 db.
E 2 represents the illumination intensity, ranging from 0 to 10,000 lux.
E 3 denotes the distance of the user from the loudspeaker in the range of 0 to 5 meters.
E 4 represents the current time, 24 hours, ranging from 0 to 23.99.
E 5 denotes the number of detected persons, and the integer range is 0 to 10.
E 6 denotes the user activity state, classifies the variables, and takes the value range { stationary, walking, jumping, etc }.
Other sensor data such as temperature e 7, humidity e 8, etc.
Further, since the data ranges and units of different sensors are different, the environmental feature vector E needs to be standardized to unify the data scales:
Wherein, Representing the normalized environmental characteristics. min (e i) and max (e i) represent the minimum and maximum values of the environmental feature e i for normalization processing.
Further, the user identity and the environmental characteristics are fused by using the user audio signature S i obtained in the first step, and a novel fusion method is introduced, so that the user identity characteristics and the environmental data are fused, and a situation characteristic vector C e is constructed. The specific method is that a weighted multi-layer perceptron (MLP) model is used, and the model can dynamically adjust the weights of the user characteristics and the environment characteristics in the learning process so as to reflect the behavior modes of the user in different environments.
Ce=σ(W2·σ(W1·[Enorm,Si]+b1)+b2)
Wherein C e represents the generated contextual feature vector, representing the behavioral characteristics of the user in the current environment. [ E norm,Si ] represents the input feature after combining the ambient feature vector and the user audio signature. W 1,W2 represents a weight matrix, the dimensions are m× (n+1) and m×m, respectively, and learning is performed through model training. b 1,b2 represents bias vectors, the dimensions are m and m respectively, and learning is performed through model training. σ represents a nonlinear activation function (e.g., reLU or Sigmoid) for capturing nonlinear relationships.
Further, a user-context joint feature vector f= [ S i,Ce ] is generated in combination with the context feature vector C e and the user audio signature S i. This is a fused feature vector that represents the comprehensive state of a particular user in the current context.
In order to better capture the complex relationship between the user features and the environmental features, a joint feature modeling method with innovative regularization terms is provided. This regularization term achieves higher recognition accuracy by a weighted combination of the user and environmental features. The model formula is as follows:
Fopt=F+λ·(F⊙Wreg)
Wherein F opt represents the optimized joint feature vector, taking into account the weighted relationship of the user and the environmental features. Lambda represents a regularization parameter for balancing the effects of the original features and regularization term, typically determined by cross-validation. As indicated by the letter, "(Hadamardproduct) is used for the element-wise weighting. W reg represents a weight matrix, representing the interrelationship between features, obtained by optimizing dynamic learning.
Further, based on the optimized joint feature vector F opt, a permission scoring function with an innovation extra term is provided for calculating the permission score of the user in the current situation
Wherein, And representing the permission score for reflecting the permission level of the user in the current context. w represents a weight vector, obtained through model training, and represents the importance of each feature. Beta represents a weight adjustment coefficient, controlling the overall weight of the joint feature. and/F opt∥2 represents the two norms of the joint features, which are taken as the measure of the importance of the features, and the constraint of balance among the features is increased. b represents the bias term, obtained by training data learning.
Scoring according to the calculated rightsAnd a preset authority level threshold value, and performing authority dynamic adjustment. Rights adjustment function Is defined as:
Wherein, Representing the dynamically adjusted permission level according to which the specific permission configuration is to be set. T 1,T2 represents a permission threshold, and is set according to user requirements and system configuration.
The finally determined authority levelThe method is applied to an intelligent sound box system and can dynamically control functions and data authority which can be accessed by a user.
And S3, according to the optimized authority level and authority score, if complex speech environments of multiple languages and multiple accents appear, designing fuzzy matching and dynamic regulation models to dynamically manage the authority of the intelligent sound box in real time.
Specifically, when a user sends a voice command, the sound box collects a current voice input signal x (t), and performs preprocessing (such as denoising and standardization) on the voice signal by using the method in the first step to generate a processed voice feature vector C new.
Meanwhile, in order to adapt to a multi-language and multi-accent environment, a self-adaptive voice fuzzy matching model is provided. The model uses a feature weighted aggregation method to calculate the credibility score gamma of the voice feature vector C new, and the formula is as follows:
Where γ represents the confidence score of the speech input, and represents the confidence of the matching degree of the current speech feature. M represents the total number of speech features. w j represents the weight of the jth feature, indicating the importance of that feature in calculating speech confidence. Alpha j denotes the fuzzy tuning parameter of the j-th feature, controlling the sensitivity of feature matching. C new,j denotes a component of the jth speech feature vector. Mu j represents the expected value of the j-th feature, representing the typical value of that feature under normal conditions.
Further, defining a confidence-based entitlement adjustment policy utilizes a confidence score gamma of the voice input and the entitlement score in the second stepA rights regulating function is provided "Dynamically adjusting the permission level of a user
Wherein, And the authority level after dynamic adjustment is represented, and the authority setting after adjustment is carried out according to the voice credibility and the original authority score.Representing the initial permission level of the output of the second step. η represents a gain factor for entitlement adjustment, and the amplitude of entitlement adjustment is controlled. Gamma denotes the confidence score of the speech input. Delta represents an intermediate threshold of the confidence score, typically set to 0.5.Representing the rights score calculated in the second step. τ represents a baseline threshold for the entitlement score, typically set based on actual usage.
Further, according to the calculated new authority levelThe intelligent sound box system adjusts the authority configuration of the user in real time and dynamically changes the accessibility of the functions.
In order to further improve the robustness and accuracy of the system, an "error correction feedback mechanism" is introduced. The mechanism dynamically adjusts the weight and parameters of the fuzzy matching model by detecting and analyzing user feedback or use behaviors after rights adjustment. The core formula of error correction is as follows:
Δw=∈·(γtarget-γ)·Cnew
Where Δw represents a weight adjustment vector for modifying the weight of the speech feature. E denotes the learning rate, the step size for controlling the weight adjustment. Gamma target represents the target confidence score, typically set to the desired confidence level for the system. Gamma denotes the confidence score of the current speech input. C new denotes the feature vector of the current speech input.
The parameters of the voice fuzzy matching model are updated according to the error correction formula, so that the authority configuration can be more accurately adjusted when the system faces to voice input of different users and contexts.
Further, the permission adjusting function is combined with the instant feedback and long-term use behavior data of the userAnd continuously optimizing the fuzzy matching model. The system parameters are dynamically adjusted by monitoring the operation habit and the authority use condition of the user in real time, so that the adaptability and the accuracy of the authority adjustment are improved.
And collecting the use data (such as the behavior, satisfaction feedback and the like of the user under the adjusted authority level) of the user, and periodically retraining the fuzzy matching model and the authority adjustment function to ensure that the model parameters are consistent with the dynamic changes of the user demands.
S4, according to the new authority level after being optimized by combining the complex voice environmentAnd then combining with real-time user behavior data, constructing a behavior monitoring and anomaly detection model, analyzing user permission in real time through feature engineering and time sequence modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting permission level according to the anomaly detection and behavior monitoring results
Specifically, extracting the user behavior features includes:
The intelligent sound box continuously collects operation behavior data of a user and generates a behavior feature vector B= [ B 1,b2,…,bk ], wherein the features comprise:
b 1 represents the type of voice instruction (such as playing music, setting alarm clock, etc.) issued by the user, and is represented as a classification variable.
B 2 denotes the time interval (unit: second) at which the instruction occurs, expressed as a continuous variable.
B 3 denotes the frequency of instructions (units: times/min) issued by the user during a specific time period, expressed as a continuous variable.
B 4 represents the authority level change amplitude corresponding to each instruction, and represents an integer.
B 5 represents the identified emotional state of the user, expressed as a classification variable (e.g., calm, excited, etc.).
And (3) normalizing and weighting the behavior feature vector B to eliminate scale difference among different features. Then, weighted combination is performed according to the feature importance, and a weighted behavior feature vector B w is generated:
Bw=WB·Bnorm
wherein B w represents a weighted behavior feature vector, which is represented as a behavior feature weighted by different weights. W B denotes a weight matrix for weighting of different features. B norm denotes the normalized behavior feature vector, expressed as normalized feature values.
Further, a behavior monitoring and anomaly detection model based on a long-short-time memory network (LSTM) is constructed using the weighted behavior feature vector B w for capturing the time dependence of the user behavior. The goal of the model is to predict the behavior characteristics at the next moment, thereby identifying the abnormal pattern of behavior. The updated formula of the LSTM model is:
ht=fLSTM(Bw,t,ht-1)
Wherein h t represents the hidden state at the current time t, capturing the time dependence of the user behavior. B w,t denotes the weighted behavior feature vector at the current time t. h t-1 denotes the hidden state at the previous time t-1. f LSTM represents the state update function of LSTM, defined as a conventional LSTM cell calculation formula.
And judging whether abnormal behaviors exist or not by calculating residual errors between the actual behavior characteristics and the predicted behavior characteristics. The anomaly score ζ is calculated by the following formula:
where ζ represents the anomaly score, representing the degree of difference between the current behavior and the predicted behavior. B w,t represents the actual weighted behavior feature vector at the current time t. And the weighted behavior feature vector represents the next moment of LSTM model prediction. σ 2 represents the variance of the prediction residual for normalizing the anomaly score.
Further, according to the anomaly score ζ and a preset anomaly threshold value θ, it is determined whether to trigger a permission adjustment or a warning mechanism. The adjustment strategy is as follows:
If xi is less than or equal to theta, the current authority level is maintained without adjustment
If xi > theta, identifying the abnormal behavior, and carrying out authority tightening or prompting the user to carry out identity verification according to the severity of the abnormality.
Further, defining authority dynamic adjustment function under abnormal behavior conditionDynamic adjustment of rights level based on anomaly score ζ
Wherein, And representing the authority level after dynamic adjustment, and considering the authority setting after abnormal behaviors.Representing the level of rights output by the third step. ζ represents the adjustment step size parameter, and the adjustment range of the control authority. ζ represents the current anomaly score. θ represents a threshold value for abnormality detection, and is generally set based on actual user behavior data.
Further, the time sequence behavior model and the abnormality detection mechanism are continuously optimized by combining feedback of the user and actual behavior data. Through online learning and periodic model updating, the self-adaptability of the model to the behavior change of the user is ensured. User behavior data (such as abnormal behavior times, authority adjustment frequency and the like) are collected, parameters of the LSTM model are updated periodically, and an abnormality detection threshold value theta and a weight matrix W B are adjusted to ensure that the system can stably run in a changeable user behavior environment.
S5, according to the authority levelClassifying the emotion states through an emotion classifier model by combining the emotion states of the user, generating emotion labels, providing mapping functions of the emotion states and authority responses to dynamically adjust authority levels according to the emotion state labels and abnormal behavior response strategies, and calculating the authority levels according to the emotion states and the authority response mapping functionsThe intelligent sound box instantly applies new permission configuration.
Further, the user emotion state characteristics are extracted, and the intelligent sound box is combined with the microphone and the camera to collect the multi-mode emotion data of the user in real time, wherein the multi-mode emotion data comprises audio and video characteristics. The audio feature vector a= [ a 1,a2,…,am ] may include:
a 1 represents the voice pitch frequency variation, expressed as a continuous variable.
A 2 represents the speech rate variation, expressed as a continuous variable.
A 3 represents the intensity of the voice volume, expressed as a continuous variable.
Other audio features such as speech discontinuities, timbre, etc.
The video feature vector v= [ V 1,v2,…,vn ] may include:
v 1 denotes a facial expression feature (such as eye-horn up, mouth-horn down, etc.), expressed as a classification variable.
V 2 denotes an eye movement characteristic (such as eye movement speed, direction, etc.), expressed as a continuous variable.
V 3 denotes head pose (e.g., rotation angle, pitch angle, etc.), expressed as a continuous variable.
Further, the audio feature vector A and the video feature vector V are fused to generate a fused emotion feature vector E. A fusion model based on self-attention mechanism is used for capturing interaction between multi-modal features, and the specific formula is as follows:
E represents the fused emotion feature vector and is used for describing the current comprehensive emotion state of the user. A represents an audio feature vector, and the dimension is m. V denotes the video feature vector, with dimension n. W A and W V represent projection matrices for mapping audio and video features to the same space, dimensions m x d k and n x d k.dk represent the dimensions of key vectors in the attention mechanism for scaling dot product operations. softmax represents the normalization function used to calculate the attention weight.
Further, the fused emotion feature vector E is utilized to pass through an emotion classifier modelClassifying the emotion states to generate an emotion label c:
Where c represents the current user's emotional state label, expressed as discrete categories (e.g., calm, anger, happiness, etc.). w i represents the weight vector of the classifier model, the weight corresponding to the i-th class, the bias term of the classifier model with the dimension d k.bi, the bias corresponding to the i-th class and the scalar.
Further, according to the emotion state label c and the abnormal behavior response strategy, a mapping function of emotion state and authority response is providedFor dynamically adjusting authority levels
Wherein, And representing the authority level dynamically adjusted according to the emotion state, and considering authority setting of the emotion of the user.Representing the level of rights output by the fourth step. ρ (c) represents a permission adjustment delta function, expressed as an integer that can be positive or negative depending on the user's emotional state c.
Further, according to the emotion state and authority response mapping functionCalculated authority levelThe intelligent sound box immediately applies new permission configuration, and ensures safe and reasonable operation of the user in the current emotion state.
And continuously optimizing an emotion classification model and a permission response strategy by combining instant feedback of a user and long-time behavior pattern data. By monitoring the behavior and emotion change of the user in real time, the parameters of the emotion classifier are updated regularly, and the authority response function is adjusted And the emotion characteristic fusion model ensures that the model can adapt to the dynamic change of the emotion state of the user.
S6, according to the authority levelConstructing an adaptive authority optimization model based on reinforcement learning, and carrying out adaptive learning on the authority strategy of the intelligent sound box through reinforcement learning to carry out long-term strategy optimization by combining long-term behaviors and use modes of users.
Specifically, the state of the adaptive authority optimization model based on reinforcement learning is defined as the current authority level and emotion state of the user, and the historical behavior pattern and system feedback of the user, namely a state vectorWherein, And (3) representing the dynamically adjusted authority level output in the fifth step, and representing the current authority configuration optimized according to the emotion state of the user. c represents the current emotional state label of the user, expressed as discrete categories (e.g., calm, anger, happiness, etc.). H represents a historical behavior pattern feature vector of the user, representing statistical features of the user behavior over a period of time, such as operating frequency, preferred operating type, etc. F represents a system feedback feature vector, and represents user feedback data of the system under different authority configurations, such as user satisfaction degree scores, error operation frequencies and the like.
Defining system-executable actions as a set of rights adjustment operationsWherein each action a i represents an adjustment (e.g., raise, lower, hold, etc.) to the level of rights. The design of these actions needs to take into account the security of the rights, the user experience and the functional limitations of the smart speakers.
Further, to enable the system to maintain optimal rights configuration in varying user behavior and emotional states, we employ a policy-optimized reinforcement learning approach. An optimization objective function J (θ) is defined to maximize the desired jackpot by optimizing the parameter θ of the strategy pi θ (S). The formula for optimizing the objective function J (θ) is as follows:
where J (θ) represents an optimized objective function of the policy parameter θ, representing the desired jackpot. S represents a current state vector and consists of a permission level, an emotion state, a historical behavior mode and system feedback. Representing a set of state distributions, representing possible state spaces. Gamma represents a discount factor, and the value range is more than or equal to 0 and less than or equal to 1, so that the importance of future rewards is measured. r (S t,at) represents an instant prize obtained when the action a t is performed in the state S t, reflecting the validity and rationality of the rights adjustment. a t represents the action performed at time t, as determined by policy pi θ (S).
In the policy optimization process, to avoid the policy from excessively fitting the short-term behavior pattern of the user, we introduce a novel policy regularization term Ω (θ) and update the policy parameters θ to maximize the regularized optimization objective:
Where θ t+1 represents the updated policy parameters. θ t represents the policy parameter at the current time t. Alpha represents the learning rate and controls the step size of parameter updating. Representing the policy gradient and representing the gradient of the policy parameters that optimize the objective function. λ represents a regularization strength parameter, controlling the extent to which regularization terms affect policy updates. Omega (theta t) represents a strategy regularization term, is designed to be sparsity regularization, encourages conciseness and generalization capability of strategy parameters, and has the following formula:
where N represents the total number of policy parameters. θ i represents the ith parameter among the policy parameters.
Further, the design of the instant rewarding function r (S t,at) combines the reasonability of user behavior feedback, emotion state change and authority adjustment, and aims to balance the security and user experience of the system. The formula of the reward function is as follows:
Where r (S t,at) represents the instant prize, measuring the effect of executing action a t in state S t. U t represents the amount of change in the user satisfaction score at the current time t, and the function f (U t) represents a positive reward, positively correlated with a reasonable entitlement adjustment. E t represents the amount of change in the user's emotional state fluctuation, and function g (E t) represents a negative reward that is inversely related to the negative emotional response due to the rights adjustment. Representing the magnitude, function of rights adjustmentIndicating a negative prize and punishing excessively frequent entitlement adjustment operations. Omega 1,ω2,ω3 represents a weight coefficient and controls the influence of various factors on instant rewards.
To further improve the adaptability and generalization ability of the model, we introduced an adaptive learning mechanism based on user behavior patterns. The mechanism dynamically adjusts the learning rate and regularization parameters of the strategy model by continuously monitoring the behavior characteristics and the emotion state changes of the user so as to adapt to the personalized requirements of the user. Learning rate and regularization parameter dynamic adjustment formula:
αt+1=αt·(1-η·|ΔHt|)
λt+1=λt·(1+ζ·|ΔFt|)
Where α t+1 denotes the learning rate at the next time. Alpha t represents the learning rate at the current time. η represents a learning rate adjustment coefficient, and the adjustment range of the learning rate is controlled. The |Δh t | represents the variation amplitude of the user behavior pattern feature vector, and represents the variation degree of the user behavior. Lambda t+1 represents the regularized intensity parameter at the next time instant. Lambda t represents the regularized strength parameter at the current time. ζ represents a regularization parameter adjustment coefficient, and an adjustment amplitude of the regularization intensity is controlled. The |Δf t | represents the variation amplitude of the system feedback feature vector, and represents the variation degree of the user feedback.
Further, a policy optimization and feedback loop is established in combination with the long-term behavioral data and immediate feedback of the user. And the optimal performance of the system in long-term operation is ensured by continuously monitoring the performance of the system under different authority configurations and periodically updating parameters of the strategy model.
And collecting long-term behavior data (such as authority adjustment frequency, user satisfaction change, emotion state fluctuation and the like) of the user, and periodically updating parameters of the reinforcement learning model, including strategy parameters theta, regularization parameters lambda and learning rate alpha, so as to ensure that the model can adapt to long-term changes of the user behavior and emotion state.
The embodiment of the application also provides an intelligent sound box authority design management system, as shown in fig. 2, which comprises:
the user data collection module 101 is used for collecting user audio data and preprocessing the audio data, constructing a Gaussian mixture model according to the preprocessed data and outputting an audio signature;
The configuration initialization module 102 is configured to take the audio signature as input to represent the identity feature of the user, collect real-time environment data and perform preprocessing, construct a user-context joint feature vector according to the identity feature of the user in combination with the preprocessed real-time environment data, add a regular term into the user-context joint feature vector to perform recognition accuracy adjustment, and then perform context awareness and dynamic authority configuration initialization to obtain an optimized authority level and authority score;
The authority initialization module 103 is configured to design real-time dynamic management of the authority of the intelligent sound box by using a fuzzy matching and dynamic adjustment model according to the optimized authority level and authority score if complex speech environments with multiple languages and multiple accents appear, where the fuzzy matching and dynamic adjustment model calculates a confidence score γ of a speech feature vector C new by using a feature weighted aggregation method, and the confidence score γ is expressed as follows:
Wherein, gamma represents the credibility score of the voice input, represents the credibility of the matching degree of the current voice feature, M represents the total number of voice features, w j represents the weight of the jth feature, represents the importance of the feature in calculating the voice credibility, alpha j represents the fuzzy adjustment parameter of the jth feature, controls the sensitivity of feature matching, C new,j represents the component of the jth voice feature vector, mu j represents the expected value of the jth feature, and represents the typical value of the feature under normal condition;
Confidence score gamma and optimized authority score using speech input Adjusting functions according to rightsDynamically adjusting authority level of userThe expression is as follows:
Wherein, Representing the authority level after dynamic adjustment, and representing the authority setting after adjustment according to the voice credibility and the original authority score; Representing the initial authority level output by the second step; η represents a rights adjustment gain coefficient, controls the rights adjustment amplitude, γ represents a confidence score for the speech input, δ represents an intermediate threshold for the confidence score; Represents the optimized authority score, tau represents the reference threshold of the authority score, and is calculated according to the new authority level The intelligent sound box system adjusts the authority configuration of the user in real time and dynamically changes the accessibility of the functions;
rights management module 104 for optimizing new rights levels based on a combination of complex speech environments And then combining with real-time user behavior data, constructing a behavior monitoring and anomaly detection model, analyzing user permission in real time through feature engineering and time sequence modeling, performing anomaly detection and behavior monitoring on user behavior, and dynamically adjusting permission level according to the anomaly detection and behavior monitoring resultsAccording to the authority levelClassifying the emotion states through an emotion classifier model by combining the emotion states of the user, generating emotion labels, providing mapping functions of the emotion states and authority responses to dynamically adjust authority levels according to the emotion state labels and abnormal behavior response strategies, and calculating the authority levels according to the emotion states and the authority response mapping functionsThe intelligent sound box immediately applies new authority configuration;
A system optimization module 105 for optimizing the authority level Constructing an adaptive authority optimization model based on reinforcement learning, and carrying out adaptive learning on the authority strategy of the intelligent sound box through reinforcement learning to carry out long-term strategy optimization by combining long-term behaviors and use modes of users.
The foregoing disclosure is only illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the scope of the invention, as it is understood by those skilled in the art that all or part of the above-described embodiments may be practiced without resorting to the equivalent thereof, which is intended to fall within the scope of the invention as defined by the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411217025.7A CN119203100B (en) | 2024-09-02 | 2024-09-02 | A smart speaker permission design management method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411217025.7A CN119203100B (en) | 2024-09-02 | 2024-09-02 | A smart speaker permission design management method and system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119203100A true CN119203100A (en) | 2024-12-27 |
| CN119203100B CN119203100B (en) | 2025-06-20 |
Family
ID=94057533
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411217025.7A Active CN119203100B (en) | 2024-09-02 | 2024-09-02 | A smart speaker permission design management method and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119203100B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119741171A (en) * | 2025-03-03 | 2025-04-01 | 龙岩学院 | A smart education management system based on multi-user collaboration |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108763892A (en) * | 2018-04-18 | 2018-11-06 | Oppo广东移动通信有限公司 | Authority management method, device, mobile terminal and storage medium |
| WO2022268136A1 (en) * | 2021-06-22 | 2022-12-29 | 海信视像科技股份有限公司 | Terminal device and server for voice control |
| CA3177530A1 (en) * | 2021-07-14 | 2023-01-14 | Strong Force TX Portfolio 2018, LLC | Systems and methods with integrated gaming engines and smart contracts |
| CN118042355A (en) * | 2024-04-11 | 2024-05-14 | 江西天创智能科技有限公司 | Automatic control system and method for intelligent sound control sound equipment of stage |
| CN118411983A (en) * | 2024-04-16 | 2024-07-30 | 北京四方智汇信息科技有限公司 | Data processing method based on speech recognition model |
-
2024
- 2024-09-02 CN CN202411217025.7A patent/CN119203100B/en active Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108763892A (en) * | 2018-04-18 | 2018-11-06 | Oppo广东移动通信有限公司 | Authority management method, device, mobile terminal and storage medium |
| WO2022268136A1 (en) * | 2021-06-22 | 2022-12-29 | 海信视像科技股份有限公司 | Terminal device and server for voice control |
| CA3177530A1 (en) * | 2021-07-14 | 2023-01-14 | Strong Force TX Portfolio 2018, LLC | Systems and methods with integrated gaming engines and smart contracts |
| CN118042355A (en) * | 2024-04-11 | 2024-05-14 | 江西天创智能科技有限公司 | Automatic control system and method for intelligent sound control sound equipment of stage |
| CN118411983A (en) * | 2024-04-16 | 2024-07-30 | 北京四方智汇信息科技有限公司 | Data processing method based on speech recognition model |
Non-Patent Citations (3)
| Title |
|---|
| ALICE COUCKE等: "Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces", ARXIV, 25 May 2018 (2018-05-25), pages 1 - 29 * |
| 李宗霖;: "基于动态网络构思的三网融合核心技术", 前沿科学, no. 01, 28 March 2010 (2010-03-28) * |
| 陈如: "智能化宾馆物业管理系统的实现", 安徽机电学院学报, 31 January 2000 (2000-01-31), pages 75 - 78 * |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119741171A (en) * | 2025-03-03 | 2025-04-01 | 龙岩学院 | A smart education management system based on multi-user collaboration |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119203100B (en) | 2025-06-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN118673165B (en) | Multi-mode false news detection method and system based on text emotion characteristics and multi-level fusion | |
| KR20190094319A (en) | An artificial intelligence apparatus for performing voice control using voice extraction filter and method for the same | |
| CN119203100B (en) | A smart speaker permission design management method and system | |
| CN119357888A (en) | An intelligent behavior analysis system | |
| CN120163653A (en) | Dynamic risk control method, device, equipment and storage medium | |
| CN119153132B (en) | An artificial intelligence remote psychological consultation platform and method based on emotional cognition | |
| CN120145312A (en) | An intelligent security management system based on AI big model | |
| CN120257050A (en) | Dynamic adjustment method of robot behavior mode based on multimodal perception | |
| CN119939229B (en) | Network content propagation method and system based on fusion cognitive understanding and intelligent management | |
| CN120781259A (en) | Risk detection method, device, equipment and medium for big data abnormal behavior | |
| CN120725209A (en) | Marketing strategy optimization method based on consumer group behavior analysis | |
| Gade et al. | Speaker recognition using improved butterfly optimization algorithm with hybrid long short term memory network | |
| CN120388565B (en) | Voice interaction method and system based on 3D (three-dimensional) virtual | |
| CN119441995B (en) | A multimodal student classroom mental health early warning method and system | |
| US20230342108A1 (en) | Enhanced computing device representation of audio | |
| Namburi | Speaker recognition based on mutated monarch butterfly optimization configured artificial neural network | |
| CN119132308A (en) | A fraud prevention communication method based on voice change recognition | |
| Alexeevskaya et al. | Recognizing human emotions using a convolutional neural network | |
| Bhardwaj et al. | Identification of speech signal in moving objects using artificial neural network system | |
| CN120354176B (en) | Training methods for behavioral decision-making models and adaptive interaction methods for digital humans | |
| CN117271743B (en) | Multi-mode dialogue emotion recognition method and system | |
| Samanta et al. | RETRACTED ARTICLE: An energy-efficient voice activity detector using reconfigurable Gaussian base normalization deep neural network | |
| Lyu et al. | Global and local feature fusion via long and short-term memory mechanism for dance emotion recognition in robot | |
| US20250356874A1 (en) | Artificial intelligence device and operating method thereof | |
| Segarceanu et al. | Evaluation of deep learning techniques for acoustic environmental events detection |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |