
CN121459801A - Child asynchronous voice social method supporting character sound mutual simulation and content re-expression - Google Patents

Child asynchronous voice social method supporting character sound mutual simulation and content re-expression

Info

Publication number
CN121459801A
CN121459801A (application CN202511309282.8A)
Authority
CN
China
Prior art keywords
expression
voice
content
sound
children
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511309282.8A
Other languages
Chinese (zh)
Inventor
王敦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Operator Connection Intelligent Technology Co ltd
Original Assignee
Guangdong Operator Connection Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Operator Connection Intelligent Technology Co ltd filed Critical Guangdong Operator Connection Intelligent Technology Co ltd
Priority to CN202511309282.8A priority Critical patent/CN121459801A/en
Publication of CN121459801A publication Critical patent/CN121459801A/en
Pending legal-status Critical Current

Landscapes

  • Toys (AREA)

Abstract


This invention discloses an asynchronous voice-based social method for children that supports mutual simulation of character voices and content re-expression, belonging to the field of voice interaction and children's social technology. The method uses edge intelligence agents such as dolls with voice acquisition capabilities to collect children's voices. Through semantic recognition, language generation, and character binding, it generates expressive content that can be broadcast by multiple character voices. The system allows children to specify voice characters via voice commands, enabling character voice swapping, content previewing and confirmation, and expression task scheduling. The method integrates content blind boxes, voice blind boxes, and a theatrical collaboration mechanism, supporting multi-character chorus, round singing, and theatrical expression. Parents can pre-screen the text portion of the expressive content before sending it; the system does not interfere with voice selection or the expression process. After the expression is broadcast, natural feedback is generated, promoting the next round of children's social interaction. This method realizes a voice expression system where voice is the character, expression is controllable, and social interaction is sustainable, enhancing children's freedom of expression, creativity, and character cognition.

Description

Child asynchronous voice social method supporting character sound mutual simulation and content re-expression
Technical Field
The invention relates to the technical field of voice interaction and social content generation, in particular to a child asynchronous voice social method supporting character voice mutual simulation and content re-expression, belonging to the application fields of related cross technologies such as voice recognition, natural language processing, multi-role voice synthesis, child social interaction systems and the like.
Background
With the rapid development of voice recognition, semantic understanding, and voice synthesis technologies, voice interaction devices and voice social platforms oriented toward children have become increasingly common. Related products currently on the market fall mainly into the following categories:
1. Smart speaker products. These generally take voice question answering and command response as their core functions and offer voice wake-up, voice broadcasting, and similar capabilities, but they target adults as the primary interactors, are of limited use to children, and lack contextual expression and role-perception capabilities.
2. Voice changers and character-simulation toys. These offer simple audio-conversion effects such as simulating robot or monster voices, but they cannot generate expressive content, lack a systematic binding mechanism between voice and character identity, and their voice changes have no social path or content association.
3. Voice social applications and message platforms, such as voice message boards for children and voice chat software. Although these have some social interaction capability, they generally embed no voice-role mechanism; the expressed content has no binding relationship to the voice identity, which easily causes problems such as ambiguous identity and boundary-crossing imitation.
4. Children's conversational toys, such as voice dolls or educational robots with voice recognition and content-triggering capabilities. Although they support voice interaction, their content expression is mostly driven by fixed content libraries, offers limited generative control, and lacks creativity and character-replacement mechanisms.
Against this technical background, existing children's voice expression systems have the following main technical defects:
1. Lack of a voice-character identity mechanism. In existing products, voice serves only as a means of information transmission, not as a character's unique identity, so an expression model of "voice is the character" cannot be constructed.
2. Children's language is spoken, fragmented, and unstructured, and existing systems cannot complete expression completion and linguistic polishing, which impairs the integrity and communicative value of the content.
3. Voice changing lacks social-graph restriction: the exchange of voice roles is generally unrestricted and cannot embody the real-world social boundary that children should only imitate actual friends, easily causing identity confusion, expression dislocation, and improper imitation.
4. The expression flow is dominated by the system and lacks child control: existing systems are mostly driven by preset program interactions, lack an expression path with the child's own language as the sole control source, and lack closed-loop design because the expression chain is fragmented and control points are scattered.
Therefore, current child voice interaction technology cannot meet the combined requirements of "expression content generation + character voice simulation + child control rights + clear social paths", and a new systematic method is needed to reconstruct the character mechanism, social logic, and technical control structure of children's voice expression.
Disclosure of Invention
1. Object of the invention
The invention aims to solve the problems of limited expression capability, missing voice characters, disordered social paths, and excessive system control in traditional children's voice interaction systems, and provides a child asynchronous voice social method supporting mutual simulation of character voices and content re-expression, so as to realize the technical goals of "expression is creation, voice is identity, character is social interaction".
In the prior art, children's voice expression usually depends on single speaker-type devices, command-style voice assistants, or voice-changing toys. The prominent problems are: first, a voice-character binding mechanism is lacking, so no anthropomorphic expression path for voice is realized; second, the expressed content is mainly voice commands or messages, lacking language generation and completion capability, which makes it difficult to stimulate children's desire to express; third, the voice-changing mechanism is not bound to a social graph, easily creating risks such as unauthorized imitation and identity confusion; fourth, system control is too strong, children lack autonomy over the expression path, and no real interactive closed loop can be formed.
Therefore, the invention provides an asynchronous voice social method that takes the child's natural voice as the sole control input source and completes expression generation, character voice selection, task scheduling, and broadcast execution through model cooperation. The method constructs a character expression mechanism with voice as the identity tag, supports socially bounded interchange and multi-round collaborative performance of character voices, and integrates mechanisms such as the content blind box, voice blind box, and theater blind box, effectively improving the fun, creativity, and character cognition of children's expression. Meanwhile, the invention sets a limited parental content-auditing mechanism and a non-intervention feedback receipt mechanism, preserving the child's complete control over voice selection and the expression flow while ensuring expression safety and compliance.
In summary, the invention aims to provide a novel voice social interaction method that is child-led, real-voiced, expressively rich, role-clear, and naturally fed back, constructing a character-based voice interaction system suited to children's needs for voice expression growth and social exploration.
2. Technical proposal
2.1 Overall structure of the inventive method
The invention provides a child asynchronous voice social method supporting mutual simulation of character voices and content re-expression. The overall flow starts from voice collection and proceeds in turn through content generation, default broadcasting, voice interchange, task confirmation, and multi-role cooperation to final broadcast execution, forming a complete expression closed loop of "child language leads, AI language enhances, character voice realizes, social content is delivered". The method comprises the following steps in sequence:
step one, voice collection and semantic recognition
The method is initiated by edge agents such as dolls, which realize the preliminary digitization of the expression intent through voice collection and semantic recognition. At this stage the system identifies social instructions, friend anchor points, and character-invocation intents in the voice, establishing the basic semantic link for the subsequent expression flow; analysis tasks are completed automatically by the platform-side model.
Step two, content generation and language reconstruction
According to the child's voice input and the intent-understanding result, the system invokes the language generation model to complete content completion, linguistic polishing, scene expansion, and expression optimization, forming complete expressive content ready for broadcast. Meanwhile, a content blind-box mechanism is introduced to enhance the creativity and unpredictability of expression and encourage children to express themselves actively.
Step three, default sound expression construction
On the basis of content generation, the system calls the registered voice fingerprint of the currently activated edge agent to generate the voice broadcast version. This stage ensures that the expressive content has a baseline voiced output even if no character exchange occurs. An interface for audition and confirmation is also provided, supplying a comparison basis for the subsequent voice-exchange link.
Step four, sound role exchange and audition confirmation
The child actively initiates voice role switching via a voice command; after the system identifies the friend's name, the registered doll and its corresponding voice identity are retrieved, the voice simulation is completed, and an audition version is generated. This step integrates a friend-identity blind-box mechanism and a voice-exchange blind-box mechanism, enhancing the child's exploration and recognition of character voices in the social network.
Step five, expression task confirmation and scheduling setup
After the expressive content and voice selection are completed, the child leads the confirmation of the expression, the setting of the recipient, and the configuration of the scheduling mode. The system supports modes such as instant broadcasting, timed broadcasting, and scene triggering, and builds a "time blind-box mechanism" into fuzzy trigger scenarios, increasing the fun and anticipation of the expression process. The sole intervention point of the parental auditing mechanism is also reserved here to ensure content compliance.
Step six, a multi-role cooperative expression mechanism
If the child issues a multi-role cooperation instruction, the system disassembles the expressive content into tasks and allocates roles, realizing complex modes such as lead-and-echo singing, chorus, and theatrical expression. This step supports mechanisms such as cross-device collaboration, theatrical performance, the combined-command blind box, and the theater blind box, expanding the space for social creation and constructing a highly interactive voice-expression theater.
Step seven, broadcasting execution and sound expression
The system wakes the target doll according to the set trigger condition and performs the content broadcast. The locked voice identity is used throughout the broadcast, ensuring consistency of expression; combined with the voice-inheritance blind box and task blind box mechanisms, the broadcast creates, on the listener's side, the social fun of voice recognition and a new trigger opportunity for expression, realizing the complete closed loop of expression and feedback.
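The seven-step closed loop described above can be sketched as a simple state sequence. The stage names and the skip rule for step six are illustrative assumptions used to make the flow concrete, not identifiers from the patent:

```python
from enum import Enum, auto

class Stage(Enum):
    COLLECT = auto()        # step one: voice collection and semantic recognition
    GENERATE = auto()       # step two: content generation and language reconstruction
    DEFAULT_VOICE = auto()  # step three: default sound expression construction
    SWAP = auto()           # step four: sound role exchange and audition
    CONFIRM = auto()        # step five: task confirmation and scheduling
    COLLAB = auto()         # step six: multi-role collaboration (optional)
    BROADCAST = auto()      # step seven: broadcast execution

def next_stage(stage: Stage, wants_collab: bool = False) -> Stage:
    """Advance the expression closed loop by one step; the collaboration
    stage is entered only when the child issues a multi-role instruction."""
    order = list(Stage)
    if stage is Stage.CONFIRM and not wants_collab:
        return Stage.BROADCAST
    return order[order.index(stage) + 1]
```

After `BROADCAST`, the feedback receipt of step seven starts the next round, so the sequence begins again at `COLLECT`.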
2.2 Description of the specific steps of the method of the invention
Step one, voice collection and semantic recognition
This is the initial stage of the children's social expression flow: edge agents such as dolls initiate voice collection and complete preliminary semantic analysis, establishing the digital basis of the expression intent.
In this invention, "dolls and other edge agents" specifically refers to interactive terminal devices with voice collection, voice expression, and character simulation functions, whose identities are registered and controlled by the platform and which are used only for character execution and content broadcasting in the children's voice social expression flow.
In this step:
1. Wake-up mechanism: the child enables an edge agent such as a doll by actively speaking a wake word, issuing a specific command, or shaking the device, triggering the voice collection function.
2. Voice collection: the edge agent records the child's voice through its built-in microphone and uploads the audio data to the platform's processing module in real time or with a delay. No physical keys are needed and no smart-speaker wake-up flow is relied on, maintaining a purely voice-driven interaction experience.
3. Semantic recognition: the system calls the voice recognition and natural language understanding modules to analyze the collected speech and extract the following key elements:
core expression text (e.g., a short talk, a clip, a phrase);
intent judgment (e.g., telling a story, leaving a message, imitating, singing);
instruction information (e.g., "try it in someone else's voice", "let him hear it tomorrow morning").
4. Social anchor identification: the system identifies and parses friend names appearing in the speech and extracts each as a unique primary key for the social path. A friend name in the system is the entry point of the entire social graph, used to bind character voices, determine the destination of the content, and serve as the source pointer for subsequent simulation execution.
5. Character-analysis preparation: when a semantic structure such as "say it as someone else" is recognized, the system prepares for the subsequent character voice exchange, locks the doll identity corresponding to the target character (e.g., "Gurgling Ball"), and attaches the intent to the content expression link.
The design core of this step is that expression is always driven by the child's language, with no complex interaction modes, ensuring that all subsequent content generation, role control, and expression paths are triggered and controlled by the child's natural voice.
In addition, all analysis tasks in this step are completed by the platform-side model; the child need not know any details of how the model works. The system automatically completes the conversion from language to semantics to social intent, establishing the expression vector and character foundation for what follows.
Step two, content generation and language reconstruction
After semantic recognition and expression-intent extraction are completed, the system enters the content construction stage. Based on the child's voice input and the semantic understanding result, the system invokes the language generation model to process the expressive content as follows:
1. Expression completion and linguistic polishing: short phrases, spoken-language descriptions, and fragments of inspiration uttered by the child are completed while preserving the child's original expressive style, so that the output is faithful yet expressive.
2. Scene expansion and story generation: for expressions with a narrative tendency or interactive-scene intent, the system automatically generates expressive content with character settings, plot structure, or fairy-tale style; for example, "I saw a duck" can be expanded into "we saw a duck wearing a suit at the pond".
3. Language reconstruction and expression optimization: on the premise of keeping the semantics unchanged, the rhythm, cadence, and sentence structure of the language are reconstructed so that it is better suited to subsequent character voice broadcasting and more acceptable to children.
4. Content blind-box mechanism: one of the core mechanisms of this step is the content regeneration blind box, namely:
the child lightly inputs a sentence or phrase, and the system reconstructs and re-expresses it as an unexpectedly complete sentence or scene;
this mechanism emphasizes the surprise, randomness, and motivational effect of the expression process and is one of the key mechanisms encouraging children to keep expressing;
content generation is random and stylistically diverse, producing a "light trigger, heavy feedback" blind-box effect.
5. Child-led control: the system does not broadcast automatically after generating content; the child decides whether to adopt it, rewrite it, or use it with a role exchange, so that expression is always child-led and system-assisted.
The design focus of this step is to use AI to enhance the child's expressive capability, turning a single sentence into a short fairy tale or a one-line message into a performance, improving the creativity and expressiveness of social expression, and forming the blind-box experience foundation of "expression is creation, expression is surprise".
Step three, default sound expression construction
After content generation is complete, the system converts the constructed expressive content into the default voiced broadcast form of the currently active doll. This step ensures that the expressive content can be broadcast in full through the original character even if no voice exchange is performed, establishing the baseline version of the voiced expression.
1. Identity voice generation: the system calls the character voice registered by the current doll or other edge agent (under unified platform authorization management) and generates the expressive voice content based on it. The voice fingerprint is the unique identifier of the doll's identity and cannot be changed.
2. Prosody adaptation: without changing the voice fingerprint, the system may adjust tone, rhythm, speed, and pauses according to the expressive content, giving it situational suitability such as a fairy-tale tone, a whispering intonation, or a speech cadence.
3. Consistency of the voice carrier: the system does not control voice style through emotion labels or style variables; the voice is always kept as the sole carrier of the expressive content, and the expressive effect is determined indirectly by the child's semantics. This returns expressive control and aesthetic judgment to the child rather than letting the model define "the emotion the expression should have".
4. Audition and confirmation: the system offers the default-voice content for audition so the child can confirm whether to keep the current character voice or enter the character-voice exchange flow.
This step serves as the voiced starting point of content expression: it preserves the integrity of the character's voice identity and provides the comparison baseline for the subsequent voice-exchange blind-box mechanism. Based on the audition, the child can decide whether to switch character voices or try the friend-voice blind box, increasing the exploratory pleasure of voice selection.
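Step three's invariant, that prosody may vary but the voice fingerprint never does, can be expressed in a small data model. The type names and the rate heuristic are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceFingerprint:
    # The fingerprint is the doll's immutable voice identity; frozen=True
    # mirrors the rule that it cannot be changed.
    doll_id: str

@dataclass
class RenderedExpression:
    content: str
    voice: VoiceFingerprint
    rate: float = 1.0    # speech rate, adjustable per content
    pause_ms: int = 200  # pause between phrases

def render_default(content: str, active_doll: VoiceFingerprint) -> RenderedExpression:
    """Build the baseline announcement: the expression is always bound to the
    currently active doll's fingerprint, so a voiced version exists even if no
    role swap happens. Only prosody is adapted to the content."""
    rate = 0.9 if len(content) > 80 else 1.0  # slow long passages slightly
    return RenderedExpression(content=content, voice=active_doll, rate=rate)

r = render_default("Once upon a time...", VoiceFingerprint("doll-001"))
```

The frozen dataclass makes accidental mutation of the identity a runtime error, which is one way to encode "the sound fingerprint cannot be changed".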
Step four, sound role exchange and audition confirmation
After default-voice generation is complete, the system enters the character-voice exchange and audition confirmation phase. This is the key link in which the voice-interchange blind-box mechanism is established: the child actively issues a voice-interchange command and explores the diversity and mystery of the friend-doll world.
1. Character-voice switching command recognition: the child expresses, in natural speech, sentences such as "try it in someone else's voice" or "say it like he does", and the system parses the friend name in the sentence as the unique binding key.
2. Doll identity confirmation and prompting: from the dolls that the friend has registered on the platform, the system confirms and prompts the specific available characters (e.g., "Gurgling Ball"), enhancing the blind-box exploration fun. If the name is ambiguous, the system actively prompts for clarification, stimulating the child's curiosity about the friend's "voice universe".
3. Audition generation: the system uses the registered voice fingerprint of the target friend's doll to generate the expressive voice content for the child to audition. The child can listen to the friend dolls' voice versions in turn for side-by-side comparison and preference selection.
4. Fine adjustment during audition: the child can make adjustment requests such as "try another one" or "make Gurgling Ball speak a little slower", and the system supports personalized fine-tuning of the rhythm, speed, and intonation of the expressive content to ensure the best fit between voice role and content.
5. Voice-interchange blind-box mechanism: this step fuses the friend-identity blind box with the voice-interchange blind box, realizing the following experience:
the child cannot predict which dolls a friend possesses or how they are named;
every character-voice switch produces a moment of voiced surprise;
the system invokes voice identities from the friend's voice world to perform dislocated expression, bringing a "possession"-like expressive tension and social pleasure;
all character exchanges are completed at the child's initiative, and the system neither suggests nor pushes.
The design intent of this step is to turn voice into a social probe: through voice-role conversion, the child's perception of friends' identities, character naming, and voice styles is strengthened, the "voice is identity" principle and the social exploration line are reinforced, and the blind-box mechanism of voice-based social interaction is fully realized.
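The social-graph restriction of step four, under which a child may only borrow voices of dolls registered to real friends, can be sketched as a guarded lookup. The registry contents and error type are illustrative assumptions:

```python
class UnknownFriendError(Exception):
    """Raised when a requested voice role is outside the child's social graph."""

# Hypothetical registry: friend name -> that friend's registered dolls.
FRIEND_DOLLS = {
    "Lily": ["doll-lily-01"],
    "Max": ["doll-max-01", "doll-max-02"],
}

def swap_voice(friend_name: str, registry=FRIEND_DOLLS) -> list[str]:
    """Socially bounded voice swap: only voices of dolls registered to real
    friends may be borrowed; anything else is rejected, preventing identity
    confusion and out-of-graph imitation."""
    if friend_name not in registry:
        raise UnknownFriendError(friend_name)
    return registry[friend_name]  # audition candidates, listened to in turn
```

Returning the full candidate list (rather than one doll) reflects the blind-box experience: the child does not know in advance how many dolls a friend has registered.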
Step five, expression task confirmation and scheduling setup
After content generation, character-voice selection, and audition confirmation are complete, the system enters the final confirmation and scheduling configuration stage of the expression task. The child leads the final confirmation of the expression intent, the locking of the voice identity, and the scheduling of the expression mode and time, ensuring the accuracy and personalization of the expressive behavior.
In this step:
1. Expression content confirmation: the system presents the character-voice-synthesized content in audition form for the child's final confirmation. With natural-language commands such as "just this one", "change the content", or "play it again", the child can trigger regeneration or proceed to the next step.
2. Voice identity locking: once the child confirms the character voice used for the current content as the final choice, the system locks that voice fingerprint as the sole voice identity for the subsequent broadcast, ensuring consistent binding between voice and social character.
3. Recipient confirmation: the system reconfirms which friend's doll the expressive content will be sent to, ensuring an accurate social-graph path. All recipients are derived from friend names the child has spoken; the system does not actively push or recommend friends.
4. Scheduling mode selection:
instant sending: the expressive content is broadcast by the target doll immediately after confirmation;
timed sending: the child can set natural-language commands such as "broadcast at eight tomorrow morning" or "play it before lunch", and the system translates them into concrete timestamps for delayed scheduling;
scene-triggered sending: the system supports preset social trigger points (see the theater playing method for details), such as playing when the other child approaches the doll or listening together when friends gather, forming a more complex linked broadcasting mechanism.
5. Time blind-box mechanism: if the child sets a fuzzy or open trigger instruction (such as "sometime tomorrow morning" or "when she gets home"), the system fuzzily schedules the expression task within a reasonable period, so that the exact broadcast time is a surprise to both the child and the receiver, forming the time blind box.
6. Expression task mounting: the system packages the expressive content, target voice, recipient, and trigger mechanism together and mounts them into the pending task list; the broadcast instruction is executed once the trigger condition is met.
7. Parental auditing interface: if the platform has enabled parental auditing, this step provides the system's sole auditing access point; before this link the expressed content must be reviewed by a guardian or pass automatically through a mechanism (see the explanatory section for details), and only audited content can be scheduled and broadcast.
In this step, every expression task is confirmed by the child before execution, with no default sending behavior; the system supports various sending occasions and trigger conditions, giving the expressive behavior stronger personalized control and interest, and building the anticipation of "opening a blind box" into the uncertainty of time.
Step six, a multi-role cooperative expression mechanism
After the content, voice character, and broadcast task are confirmed, if the child wishes multiple dolls to participate, the system enters the multi-role collaborative expression flow. This step supports multiple edge agents such as dolls performing in modes such as chorus, round singing, lead-and-echo singing, or theatrical performance during the broadcast, building a richer voice-theater experience.
1. Collaborative command recognition: the child vocally expresses commands such as "say it together with me", "take turns sentence by sentence", or "sing it together"; the system recognizes the collaborative command and extracts the identities of the dolls involved.
2. Task splitting and role allocation: according to the expressive content and the collaboration mode, the system semantically splits the full text and distributes it to different character voices, completing tasks such as lead singing, antiphonal singing, chorus, dubbing, and hosting. For example:
lead-and-echo singing: the lead doll performs the whole piece, then the other dolls repeat it;
round singing: sentence-by-sentence relay broadcasting among the dolls;
chorus: multiple dolls sound synchronously in overlapping playback;
supporting-role insertion: another character's lines are inserted into the lead's broadcast;
karaoke mode: the child leads while driving the dolls through the complete song.
3. Theater playing: if supported, the child can trigger a multi-role collaboration scene through commands such as "put on a show tonight" or "try assigning the roles"; the system randomly distributes role combinations, line structures, or entrance orders so the performance result carries blind-box surprise.
4. Friend joint-collaboration mechanism: when several children meet, the system can identify doll devices from different accounts and operate them collaboratively, realizing cross-account collaborative expression such as "sing together with my friend's doll".
5. Command combination: in collaborative mode, the child can combine instructions through consecutive commands, such as "sing this sentence first, then change the voice" or "let Gurgling Ball start, then everyone joins the chorus"; the system recognizes the combined intent and merges it into a complete collaborative expression plan.
The design core of the step is that imagination of children to expression forms is fully released, and single expression behaviors are extended into multi-person, multi-role and multi-mode collaborative creation scenes. The collaborative expression not only improves the expressive force of the content, but also adds the sense of immersion, organization and cooperation of a theater blind box for social interaction.
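The splitting of expression content across character voices described above can be sketched as a simple dispatcher. This is a hypothetical illustration only: the role names, mode labels and function name are assumptions, not the patented implementation.

```python
# Hypothetical sketch of the collaborative-mode splitter: assign sentences
# of the expression content to character voices according to the mode.
def split_for_collaboration(sentences, roles, mode):
    """Return a list of (role, sentence) broadcast tasks."""
    tasks = []
    if mode == "master_slave":        # lead doll performs all, others repeat
        master, slaves = roles[0], roles[1:]
        tasks = [(master, s) for s in sentences]
        for slave in slaves:
            tasks += [(slave, s) for s in sentences]
    elif mode == "relay":             # sentence-by-sentence relay between dolls
        tasks = [(roles[i % len(roles)], s) for i, s in enumerate(sentences)]
    elif mode == "chorus":            # every doll voices every sentence in sync
        tasks = [(r, s) for s in sentences for r in roles]
    return tasks

plan = split_for_collaboration(["line A", "line B"], ["GuluBall", "Bear"], "relay")
```

Each resulting task pair can then be handed to the corresponding edge agent for playback; the scheduling order encodes the collaborative mode.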
Step seven, broadcasting execution and sound expression
When an expression task enters the execution stage, the system triggers the target doll or other edge agent to complete content broadcasting according to the mounted broadcast task list. This step is the actual output link of the whole expression flow and marks the complete closed loop from voice collection to voice broadcasting.
1. Target wake-up and broadcast trigger: according to the trigger mode set for the task (instant, timed, scene-triggered, etc.), the system automatically wakes up the target doll when the conditions are met and starts the broadcast flow.
2. Sound expression and restoration: the voice output is generated with the character voice locked in step five, ensuring that the broadcast content is fully consistent with the child's initial setting and conveys the child's creative intention and social wish.
3. Voice-guessing blind box mechanism: if the receiving child does not know in advance who will broadcast the content, the system can conceal the voice-character information, and the recipient identifies the character by its voice, forming a blind-box experience of recognition and emotional surprise, such as "who is talking to me?" or "this sounds like Gulu Ball!".
4. Broadcast feedback: after broadcasting is completed, the system automatically generates feedback information and sends it to the originating child's account, such as "delivered successfully" or "the other side has heard it", establishing closed-loop confirmation of the expression behavior.
5. Interaction opportunity extension: the recipient doll can invite the other party to respond (such as "want to say something back?"), naturally triggering the next round of expression without forcing a reply.
6. Task blind box extension mechanism: after the broadcast content finishes, the system can append a platform-generated voice task challenge or friend interaction suggestion, such as a small follow-up voice task addressed to Gulu Ball.
The design focus of this step is to keep broadcasting genuinely controllable, the process perceivable and the feedback responsive, while preserving the uncertainty of the voice, the explorability of the content and the continuity of role cognition, realizing a closed-loop experience of social expression from triggering to landing.
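The trigger-check, playback and receipt flow of this step can be sketched as follows. The `Task` class, field names and callback parameters are illustrative assumptions, not the patented implementation.

```python
import time

# Illustrative sketch of broadcast execution: check the trigger condition,
# play the content with the locked role voice, then emit a delivery receipt.
class Task:
    def __init__(self, content, role, trigger="instant", fire_at=None):
        self.content, self.role = content, role
        self.trigger, self.fire_at = trigger, fire_at

def execute(task, now=None, play=lambda role, text: None, notify=lambda msg: None):
    """Run one broadcast task; timed tasks not yet due stay pending."""
    now = now if now is not None else time.time()
    if task.trigger == "timed" and now < task.fire_at:
        return "pending"               # not due yet; the scheduler retries later
    play(task.role, task.content)      # wake the target doll, speak in role voice
    notify("delivered")                # closed-loop receipt to the sender account
    return "done"
```

A scene-triggered task would follow the same shape, with the time comparison replaced by a scene-condition predicate.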
2.3 Speech driven expression generation flow specification
1. Non-deterministic generation mechanism and content exploration experience design
The invention specifically introduces a non-deterministic content generation mechanism to enhance the fun and exploratory experience of children during expression. Based on a language generation model, the mechanism combines light input information (such as keywords, phrases or prompts) and produces output that is semantically reasonable but varies in expression structure, plot direction and detail, specifically including:
context adaptation (the system matches style against historical expression segments);
plot enrichment (associative expansion of the input semantics);
output diversity and randomness (the same input may correspond to different output versions).
This mechanism effectively enhances the sense of the unknown and of surprise during expression, so that children discover different content across repeated attempts, arousing creative interest and forming an expression-exploration experience based on light prompts.
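The "same input, different output versions" idea can be shown with a minimal sketch. In a real system the variation would come from sampling a language model; here a random choice over templates stands in for that sampling, and the template texts are invented for illustration.

```python
import random

# Minimal sketch of non-deterministic generation: the same light input
# (a keyword) can map to several semantically reasonable expansions.
TEMPLATES = [
    "Once upon a time, {kw} went on an adventure.",
    "Did you know? {kw} has a secret song.",
    "Tonight, {kw} tells a brand-new story.",
]

def generate(keyword, rng=random):
    """Expand a keyword into one of several plausible output versions."""
    return rng.choice(TEMPLATES).format(kw=keyword)
```

Repeated calls with the same keyword can return different versions, which is the source of the blind-box surprise described above.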
2. Natural language generation mechanism based on light input
To accommodate children's differing levels of language ability, the invention supports a "light-input-triggered" content generation path: the child only needs to speak a phrase, a keyword or an everyday colloquial expression (such as "wish you a happy day" or "tell a dinosaur story"), and the system recognizes the child's language intention and, based on this light input, automatically invokes the language generation model to produce content that is structurally complete, semantically coherent and emotionally rich.
This mechanism lowers the expression threshold, allowing children who cannot yet form complete sentences to participate in character content creation, improving ease of use and social expression participation, and embodying a human-computer interaction concept centered on natural interaction.
3. Natural trigger mechanism for system non-dominant expression decision
To ensure that the initiative for expression belongs entirely to the child, the system provides no content suggestion, character recommendation or expression guidance of any form in the practice of the invention. That is:
The system does not actively prompt sentence patterns, character voices or language options for expression content; does not guide through text, voice, interface icons or animation; does not preset tasks or expression intentions; and does not automatically start an expression link based on context prediction. All model calls related to expression generation (including the language generation model, the voice synthesis model, etc.) must be explicitly triggered by the child through a natural voice instruction.
This mechanism effectively avoids the problem of the platform guiding or substituting for the child's expression, prevents the right of expression from being interfered with by system functional logic, ensures the spontaneity, autonomy and originality of children's expression behavior, and establishes a core technical path with the child's voice command as the sole trigger source.
4. Trigger form and path specification of expression task chain
The method takes the voice command as the sole trigger for expression tasks and does not support interaction paths in non-natural-language forms such as button clicks, icon selection or recommendation guidance. That is:
all expression activities, including content generation, character voice switching, chorus collaboration and singing-in-turns control, require the child to explicitly issue a voice command;
only after the system identifies the voice input and parses its semantics does it start the subsequent model call, character broadcast or collaborative expression flow;
if the child issues no instruction, the system remains in silent listening and neither actively suggests nor generates content.
This mechanism builds a natural interaction paradigm of language control, ensures that every starting point of expression activity derives from the child's own intention, and prevents the system from actively inducing expression behavior through a predictive model.
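The "voice command as the only trigger" rule amounts to a gate in front of the expression pipeline. The sketch below is an assumption-laden illustration: the event dictionary shape and action labels are invented for clarity.

```python
# Hedged sketch of the voice-only trigger gate: any event that is not an
# explicit child voice instruction leaves the system silent.
def handle_event(event):
    """Return the pipeline action for an incoming event."""
    if event.get("type") == "voice_command" and event.get("text"):
        return {"action": "start_expression_chain", "utterance": event["text"]}
    # Button presses, icon taps, context predictions: no suggestion, no generation.
    return {"action": "stay_silent"}
```

All downstream model calls hang off the `start_expression_chain` branch, so no path exists by which the system initiates expression on its own.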
5. Procedure description for multi-model collaborative invocation
In the expression chain, the invention completes the full flow from child language to character broadcasting through a combination of models for voice instruction recognition, intention parsing, content generation, voice synthesis and broadcast execution. Specifically:
the language generation model expands the input phrase/keyword into structurally complete content;
the character voice synthesis model reconstructs the voice persona according to the currently appointed character identity;
the broadcast execution module completes voice output on the edge agent and records the broadcast completion state;
the whole flow does not depend on active scheduling by central control; the child's voice intention serves as the flow control signal.
This mechanism supports collaborative work of parallel models, for example multiple edge agents simultaneously completing their own voice synthesis and broadcast execution in chorus/singing-in-turns tasks, ensuring expression synchrony and role-scenario consistency.
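The parallel-model cooperation for a chorus task can be sketched with a thread pool: each edge agent synthesizes independently, and the results are joined before synchronized playback. The `synth` function is a stand-in assumption for a per-role TTS call.

```python
from concurrent.futures import ThreadPoolExecutor

def synth(role, text):
    # Placeholder for the per-role voice synthesis call.
    return f"{role}:{text}"

def chorus(roles, text):
    """Synthesize the same text for every role in parallel, then join."""
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        futures = [pool.submit(synth, r, text) for r in roles]
        return [f.result() for f in futures]   # join before synchronized playback
```

Joining all futures before playback is what keeps the overlapping chorus voices synchronized across agents.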
6. Role chain expression
The child can trigger "lead singing + follow singing" collaborative expression through voice instructions, for example "Gulu Ball leads, Bear follows"; the system controls the two corresponding edge agents to complete segmented broadcasting or singing-in-turns performance in sequence. All role division, broadcast rhythm and content control are initiated naturally by the child through voice; the system neither applies default configurations nor provides preset schemes.
7. The language generation models (such as GPT, BERT, etc.) and speech synthesis models (such as FastSpeech, Tacotron, etc.) involved in the invention are used as prior-art means; the invention does not claim protection of the models' algorithm structures, training methods, network parameters or tuning modes. The models serve as callable modules; the technical innovation lies in the collaborative calling mode of the models and the design of the expression control mechanism for the trigger path.
3. Parental supervision mechanism and social feedback channel description
In order to reasonably embed parental supervision capability while guaranteeing children's right to free expression in voice social interaction, and to avoid the system interfering with children's social intentions, the invention specifically designs a limited parental supervision mechanism and a non-invasive social feedback channel, constructing a technical structure of moderate supervision, autonomous expression and natural social circulation. The mechanism centers on two key nodes: auditing before the expression task is sent, and natural feedback guidance after broadcasting.
Parental content auditing mechanism (text content auditing before transmission link)
The invention sets only a single supervision node before the expression task is sent: after the child completes content generation and character voice confirmation, and before the task is mounted, the system provides an optional parental audit channel for the text part of the expression content. The design characteristics are as follows:
1. The audit scope is limited to the text content:
parents do not participate in the child's trial-listening process and do not intervene in any voice-role selection or trial-listening link;
parents do not audit the voice content; all doll voices are uniformly registered and recorded by the platform, ensuring compliance, safety and stability;
the audit object is the text content itself, ensuring semantic compliance and the absence of inappropriate expression.
2. The audit entry is clear and single:
the parent receives the audit request through a dedicated mobile phone APP;
the audit interface displays only the expression text, the target friend's name (appointed by the child), the current doll (role name) and the planned playback time;
no audio trial listening or multi-version content is displayed, preserving the child's freedom of voice choice.
3. The audit right derives from the doll binding:
only dolls currently bound to the parent's account carry content audit rights;
there is no right to audit the expression content of other children or of friends' dolls;
doll identity and voice attribution are confirmed by the platform at registration and binding; parents bear supervision responsibility only for the expressions of the dolls they manage.
4. The audit handling strategy is simple and clear:
parents can choose "approve", "return" (guiding the child to rewrite) or "ignore" (falling back to the system default);
the system does not recommend full auditing by default, and encourages parents to configure trust settings according to the child's age and content style;
the audit logic is explicit and non-extensible; the system does not allow parents to enter any "voice audit" or "role control" link.
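The pre-send audit step can be sketched as a function over an audit request. This is a hypothetical illustration: the field names and the rule that "ignore" falls back to a system default are assumptions drawn from the text, not the patented implementation.

```python
# Sketch of the pre-send parental audit: parents see only text metadata,
# never audio, and may approve, return for rewrite, or ignore.
AUDIT_FIELDS = ("text", "friend_name", "role_name", "planned_time")

def resolve_audit(request, decision, default="approve"):
    """Apply a parental decision to an audit request; audio is never exposed."""
    visible = {k: request[k] for k in AUDIT_FIELDS}   # the only fields shown
    if decision == "approve":
        return {"status": "mounted", "shown": visible}
    if decision == "return":
        return {"status": "rewrite_requested", "shown": visible}
    return {"status": default, "shown": visible}      # "ignore" -> system default
```

Restricting `visible` to `AUDIT_FIELDS` encodes the boundary that parents audit text only and never enter voice selection.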
(II) No supervision interface in the voice trial-listening and role selection process
The invention attaches great importance to children's right to freely explore voice expression, and therefore clearly stipulates the following:
the character voice trial-listening step is led by the child; the system supports multiple rounds of operations such as listening, changing characters, changing speaking speed and changing pitch;
parents cannot interfere with this process and have no right to view or replace trial listening;
all voices come from edge-agent roles registered and recorded by the platform and have passed platform authentication, so there is no need to audit the voice content;
in the process of choosing favorite character voices, children form cognition of and social exploration toward friend characters, voice differences and naming interest; this is one of the core mechanisms of the invention, and the surprise and freedom of the voices must not be destroyed by supervision behavior.
(III) succinct and natural social feedback channel (not dominant by the system)
In order to ensure that children's voice social expression forms an effective closed loop, the invention designs a non-forced, non-guided feedback mechanism and avoids active over-intervention by the system. The concrete embodiments are as follows:
1. The essence of the feedback mechanism is "the initiation of the next round of social behavior":
the receiving child is free to choose whether to respond;
if a response is made, it forms a new round of expression task;
the invention provides no preset scripts or system suggestions, leaving the decision entirely to the child.
2. The system does not actively suggest feedback behavior:
no system voice prompts such as "why not reply with a sentence?";
no active pushing of task challenges or interaction suggestions;
no platform-set tasks or interaction instructions appended to the feedback.
3. Feedback information only conveys, never suggests:
the originating child can learn through the prompt that the content has been broadcast;
the system may display brief states such as "delivered" or "playback completed", without guiding the next action;
all feedback is merely a receipt of factual status, with no suggestive or guiding content embedded.
4. Feedback as the origin of new social interaction:
the invention holds that real feedback should not be scripted by the system, but should be completed by the child through re-expression behavior;
the system only needs to keep the feedback channel unobstructed; real responses are freely triggered among the children;
a rolling asynchronous expression chain is thus formed, allowing social relationships to develop continuously in natural circulation.
(IV) mechanism principle and design boundary description
In summary, in order to achieve the balance between expression autonomy and supervision, the present invention adheres to the following four principles in mechanism design:
First, the principle of voice freedom. The invention ensures that all selection, trial listening and use related to voices is completed autonomously by the child; the system sets no restrictive path, and parents have no access. The platform ensures the legitimacy and safety of all edge-agent voices through unified registration management, guaranteeing the voice content at the source without requiring parental review.
Second, the principle of bounded content audit. Parents have the right to review only the text content before the expression task is formally mounted, and this right applies only to dolls bound to and managed by them; parents cannot review other people's content without authorization and cannot interfere with the voice content. The right is non-extensible and non-transferable, ensuring that the child holds the complete right of voice expression throughout the expression process.
Third, the principle of non-forced feedback. The invention provides only a basic broadcast status receipt, and embeds no active system suggestion, task push or interaction instruction. Whether to respond is decided autonomously by the child; the system does not participate in judgment, recommendation or guidance, preventing the platform from interfering with children's social and expression rhythms.
Finally, the principle of expression closed-loop self-control. The starting point and end point of each round of the expression flow are controlled by the child; the system serves only as the execution channel of expression and plays no role in content design or social task generation. Even across multiple rounds of rolling expression, no platform task chain is designed, ensuring that expression behavior circulates naturally and grows freely under the child's lead.
Through these four mechanism principles, the invention realizes, on the basis of guaranteeing children's free expression, a system design in which parents can supervise, voices cannot be used without authorization, expression forms a closed loop and feedback remains elastic, balancing safety, autonomy and social interest.
4. Model call mechanism and expression control path specification
The invention provides a children's asynchronous voice social method supporting mutual simulation of character voices and content re-expression, taking "the voice is the character, the voice carries the mood" as the core technical idea and constructing a complete expression technical path of expression task linkage, multi-model collaborative calling and character identity constraint control. The path consists of multiple model modules, but its core technical innovation is not the algorithm structure of the models; it is the collaborative calling mode of the models in the expression flow, the character voice identity binding mechanism, and the system design logic placing control of expression with the child.
Character identity function of (one) sound
In the expression system defined by the invention, a voice is not only the physical waveform of speech output, but also the unique identifier of a character identity. Each edge agent such as a doll binds its specific voice fingerprint when registering on the platform, and carries the following character attributes throughout the expression process:
1. representing the character's personality, forming the basis on which children perceive character differences;
2. conveying changes in mood, rhythm and emotion (without changing the voice fingerprint);
3. carrying the expression identity, bound to friend names in the social graph to form a content routing path.
This binding mechanism ensures that voices cannot be arbitrarily spliced or mixed, preventing unauthorized expression or role confusion.
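The "voice fingerprint = identity" constraint can be sketched as a registry check in front of synthesis. The registry contents and function name are illustrative assumptions.

```python
# Sketch of the voice-identity constraint: a role may only be synthesized
# with the voice fingerprint registered for it on the platform.
REGISTRY = {"GuluBall": "fp_001", "Bear": "fp_002"}   # role -> voice fingerprint

def authorize_tts(role, fingerprint):
    """Allow synthesis only when the fingerprint matches the role's record."""
    if REGISTRY.get(role) != fingerprint:
        raise PermissionError("voice does not match registered role identity")
    return True
```

Because every synthesis request must pass this check, arbitrary splicing or mixing of voices is rejected before any audio is produced.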
(II) model call path structure
In the execution process of the expression task, the method of the invention relates to the combined calling of the following model modules, and the sequence and the path are as follows:
1. Speech recognition model (ASR)
input: the original voice collected by the doll;
output: text content for the downstream semantic model;
the module supports edge collection, cloud parsing, fuzzy recognition and optimization for children's spoken language.
2. Semantic understanding and intention recognition model (NLU)
performs grammatical analysis and semantic intention extraction on the recognition result;
extracts keywords, task types (such as message leaving, imitation, singing) and role-calling commands;
identifies social anchors (e.g. friend names) and maps them to unique IDs within the system.
3. Language generating model (LLM)
used for completing, polishing, adapting and fairy-tale transformation of the original expression;
generates output sentences with rhythm and adjustable emotion;
supports a light-trigger, rich-feedback expression generation mode for the content blind box mechanism;
the model call process is controlled and started by the child's voice; the system never generates content automatically.
4. Sound synthesis model (TTS)
input: the text content generated in step 3 plus the determined character voice fingerprint;
output: the speech waveform;
the platform calls the dedicated voice model template bound to the role to generate the expression audio;
unauthorized character speech synthesis is not supported, and custom synthesized voices are not allowed.
5. Expression scheduling and scheduling module
generates scheduling parameters (instant, timed, scene-triggered) in combination with the semantic instruction;
mounts the expression task, completing character voice encapsulation, receiver confirmation and sending-mode determination;
supports the parental audit interface and the expression permission constraint interface.
The above models are all generic modules; the system can call different third-party models in combination to complete these tasks.
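The call order of the five modules above can be sketched end to end with stubs. Only the sequencing and the data handed between stages reflect the text; every function body here is a placeholder assumption, not a real model.

```python
# End-to-end sketch of the model call path: ASR -> NLU -> LLM -> TTS -> schedule.
def asr(audio):                # 1. speech recognition: audio -> text
    return audio["transcript"]

def nlu(text):                 # 2. intent extraction: text -> task type + anchors
    return {"task": "message", "friend": "Xiaomi", "content": text}

def llm(intent):               # 3. language generation: light input -> full content
    return f"Dear {intent['friend']}, {intent['content']}"

def tts(text, fingerprint):    # 4. synthesis bound to the role's voice fingerprint
    return {"waveform": f"<{fingerprint}:{text}>"}

def express(audio, fingerprint, trigger="instant"):
    """5. assemble the expression task with its scheduling parameter."""
    content = llm(nlu(asr(audio)))
    return {"audio": tts(content, fingerprint), "trigger": trigger}
```

Note that nothing in the chain runs unless `express` is called, mirroring the rule that the child's voice command is the sole flow control signal.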
(III) technical protection boundary declaration
For the purpose of defining the scope of protection, the invention is defined as follows:
1. protection of model bodies is not claimed
The invention does not involve the algorithms, structures, training methods or tuning paths of language models (e.g. GPT, BERT, etc.) or speech models (e.g. FastSpeech, Tacotron, etc.);
specific model parameters, network structures, fine tuning modes or frame implementation forms are not limited;
all models are considered as generic technology modules as callable resources in the expression path.
2. The technical focus of the claimed protection is the design of the model collaborative calling mode and the expression control path, including but not limited to:
the combination paths of the various models in the voice social task flow;
the control structure in which expression is led by the child and responded to by the platform;
the synthesis generation path under the voice identity binding mechanism;
the content routing logic controlled by the social graph;
the system logic in which expression generation is non-active, non-recommending and non-substituting, and model calls are driven by the child's will.
3. Emphasizing the "identity-constrained call" of the voice synthesis model:
all TTS models must use character voice fingerprints bound and authorized by the platform;
mixed output or free splicing is not allowed, ensuring that the expressed voice is consistent with the character identity;
this mechanism safeguards the inventive idea that "voices cannot be counterfeited and characters cannot be impersonated".
(IV) model deployment and execution control principle
The invention supports distributed deployment and flexible access of the models; the platform can choose cloud deployment or edge inference according to actual service needs:
the ASR and NLU modules support lightweight edge deployment to shorten response latency;
the LLM and TTS modules are generally deployed in the cloud and accessed through interface calls;
all module calls are triggered under the guidance of the child's voice command; no path is actively generated by the system;
parents have no authority to influence model selection, voice synthesis or role changes, and hold only audit rights over part of the text content (see Part 3 for details).
Through the above model calling mechanism and expression control path, the invention realizes the system expression logic of "the voice is the character, the expression is the identity, the models are the collaborative executors", establishes the basis of voice social behavior with the child's voice as the sole control source and the voice fingerprint as the sole identity tag, and forms an expression system with originality and expression controllability without relying on model-algorithm innovation.
5. Technical effects, beneficial value and improvement description of the prior art
In order to highlight the technical progress and practical value of the invention in the field of children's asynchronous voice social interaction, and combining the method logic with the industry status quo, the beneficial effects, innovative contributions and comparative advantages of the invention over the prior art are comprehensively described as follows:
(I) Advantageous effects of the invention
The invention provides a children asynchronous voice social method supporting mutual simulation of character sounds and content re-expression, which has the following remarkable beneficial effects:
1. A role social mechanism in which the voice is the unique expression identity
The invention defines the voice as the unique identity of the character and binds character personality and expression permission through the voice fingerprint, making "the voice is the character" the main axis of expression, thereby avoiding common problems such as character confusion and expression boundary-crossing.
2. A character voice interchange mechanism bounded by friend relationships
The system only allows the character voices of dolls whose owners have been added as friends to be simulated, conforming to real social boundaries, effectively preventing risks such as arbitrary voice changing and anonymous expression, and realizing a controllable voice-swapping play mode based on the social graph.
3. Building a content-blind-box expression experience that stimulates interest in expression
Through the content re-expression mechanism of light sentence input plus AI reconstruction, a child can trigger a whole fairy tale with a single sentence, forming the positive cycle of "expression is creation, creation is surprise" and improving language expression ability.
4. Character voice interchange brings expressive tension and performative force
Character voice changing is led by the child, producing dislocated expression, a sense of embodiment and voice-cognition exploration, and endowing the expression content with emotional intensity and performative interest.
5. The whole flow takes the child's voice as the sole control source, preserving freedom of expression
All expression flows are triggered by the child's natural language commands; the system provides no active suggestions, generates no content on its own and exerts no control interference, so the right of expression control belongs to the child.
6. Introducing a limited parental audit mechanism so that expression is compliant without overreach
The parental audit interface is set up only for the text content, limited to the pre-sending link and to dolls bound to the parent, avoiding interference with voice selection and role exploration and ensuring that the child's expression space is not compressed.
7. Realizing a closed-loop link of expression behavior and natural rolling social circulation
From expression generation, voice determination and broadcast execution to broadcast feedback, a complete link is formed; the recipient's spontaneous response is supported and forms the next round of expression task, promoting the continuous extension of asynchronous social connections.
(II) technical contribution and innovation path of the invention
Compared with traditional voice synthesis, children's dialogue robots, smart speakers and other voice interaction products, the invention is original in technical architecture and expression control logic in the following respects:
1. Establishing the expression paradigm of "voice is identity"
The voice is elevated from an output tool to the expression subject; the character identity is bound to the voice and cannot be changed, and all voices are uniformly registered and authorized by the platform, avoiding identity dislocation caused by free voice changing.
2. Construction of expression control path based on model combination
Without claiming the voice model bodies, the whole expression process is completed by collaboratively calling the voice recognition, semantic understanding, language generation and voice synthesis models with the child's voice as the sole control source; this path realizes the complete expression logic of "expression perception + control-right verification + content control".
3. Social expression power excited by designing multiple blind box mechanism
Play mechanisms such as the voice blind box, content blind box, time blind box and theatre blind box are introduced and combined with children's cognitive development and interest mechanisms, so that expression is not only purposeful but also exploratory and playful.
4. Implementing deep binding of expression links and social graph
All expressions take the "friend name" as the anchor; voices can only imitate friends the user has added; the control path is embedded in the social graph, preventing the system from becoming an arbitrary content/voice synthesizer and ensuring that it is by nature a social system rather than a voice platform.
5. The platform is only used as an execution channel and does not actively interfere with expressing content
All expression content is actively initiated by children; the platform provides no emotion labels, injects no style templates and generates no suggested content, guaranteeing children's language autonomy to the greatest extent.
(III) comparison analysis with the prior art
Compared with the prior art, the invention has obvious advantages in structural design, expression mechanism and control logic:
1. Different from traditional speech synthesis system
An ordinary TTS system only realizes voice output; it has no character identity binding, character voice registration or social relationship constraint functions, and its expression content often lacks identity directivity and an interaction feedback structure.
2. Different from intelligent sound box and voice assistant products
A smart speaker is mainly used for executing commands; its interaction path is question-and-answer or passive wake-up, it cannot form an expression generation chain, and it does not support multi-role collaboration or blind-box exploration mechanisms.
3. Different from children's sound-changing toy or sound-changing device products
Traditional voice-changing products are based on audio transformation algorithms; they have no expressive content generation capability, no voice simulation rules based on social relationships, and cannot form controllable identity and role paths.
4. Is different from a social voice platform
Conventional voice social products (such as message boards and voice chat rooms) lack a role-driven mechanism; voices carry no registered identities, content has no expression control path, and problems of boundary-crossing imitation and content interference easily occur.
5. The invention constructs for the first time a trinity path of "expression is the role, voice is the identity, content follows the social relationship", realizes a full-flow closed loop from acquiring the expression intention to voice-role broadcasting, and constrains expression behavior to the child's voice lead and the real binding of social relationships, thereby constructing a safe, controllable and extensible voice social system.
Drawings
FIG. 1 is a block diagram of a system architecture of the present invention;
FIG. 2 is a flow chart of the expression generation according to the present invention;
FIG. 3 is a schematic diagram of a character sound exchange mechanism according to the present invention;
FIG. 4 is a logic diagram of the parental review flow;
FIG. 5 is a diagram of a model collaborative call architecture.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention clearer and more specific, the present invention will be further described in detail below with reference to the accompanying drawings and examples. The invention is not to be limited to the specific embodiments described below, but is to be accorded the full scope of the claims.
Embodiment one: identity exploration mechanism (friend-identity blind box)
In this embodiment, when a child initiates an expression through a voice command, the system only allows the registered voices of edge agents (such as dolls) belonging to confirmed friends to be called as expression carriers. The mechanism establishes a role-calling boundary based on the real social graph, guaranteeing that safety and the fun of exploration coexist.
Example story:
Six-year-old Mingming is chatting at home with his doll edge agent "Purring Ball" and wants to change the voice to tell his own riddle. He says, "Try it with Xiaomi's voice." The system recognizes Xiaomi as a confirmed friend and confirms that several registered character voices exist under Xiaomi's account, though Mingming does not know the specific names or appearances of the doll characters Xiaomi owns.
The system then responds: "Shall we use Xiaomi's 'Rolling Bear'?" Hearing the name "Rolling Bear" for the first time and realizing it is the name of one of Xiaomi's dolls, Mingming is surprised and curious, and answers, "OK, let's use this one."
The system then broadcasts Mingming's original riddle ("It comes around my eyes at night and stays by day until my eyes are closed") in the registered voice of "Rolling Bear": a silly, heavy voice with a slight wheeze, making the riddle sound mysterious and funny. Mingming bursts out laughing: "Xiaomi's bear is so funny! Let me try another one!"
Mingming then says, "Try another of Xiaomi's voices." The system prompts that, according to the social relationship and registration permissions, Xiaomi's "Miaomiao Star" voice is available. Again and again, Mingming completes the exploration of the character blind box.
Mechanism description:
In this play mode, the child cannot predict all the character information owned by friends, and the platform does not directly display a friend's character list; instead, it guides the child to gradually discover a friend's "sound universe" through blind-box exploratory interactions of trying characters out. Each invocation is not only the completion of one expression behavior but also a process of identity discovery within the social network.
The platform strictly limits the calling range of character voices to the registered edge-agent voices of added friends, ensuring that sound identities are legal and controllable. In other words, even if a child knows that other children possess certain interesting sound roles, the system provides no call portal unless a friend relationship has been established.
The mechanism reinforces the core concepts that sound is identity and that sound is bounded by social relations, so that voice-based social contact presents natural structural boundaries and exploratory fun within the children's voice expression platform, constructing a social blind-box system with sound as its clue.
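The role-calling boundary described above can be sketched as a simple permission check against the social graph. This is an illustrative sketch, not the patented implementation; the class and method names (SocialGraph, request_voice, and so on) and the deterministic blind-box pick are assumptions.

```python
class SocialGraph:
    """Maps each child to confirmed friends and each friend's registered
    doll voices. The platform never exposes a friend's full voice list."""

    def __init__(self):
        self.friends = {}            # child -> set of friend ids
        self.registered_voices = {}  # child -> set of doll voice names

    def add_friend(self, a, b):
        self.friends.setdefault(a, set()).add(b)
        self.friends.setdefault(b, set()).add(a)

    def register_voice(self, owner, voice_name):
        self.registered_voices.setdefault(owner, set()).add(voice_name)

    def request_voice(self, caller, owner, voice_name=None):
        """Grant a voice only when the owner is a confirmed friend.
        With no voice named, pick one registered voice blind-box style,
        without revealing the rest of the owner's roster."""
        if owner not in self.friends.get(caller, set()):
            return None  # no call portal without a friend relationship
        voices = self.registered_voices.get(owner, set())
        if not voices:
            return None
        if voice_name is None:
            return sorted(voices)[0]  # deterministic pick for the sketch
        return voice_name if voice_name in voices else None


graph = SocialGraph()
graph.add_friend("mingming", "xiaomi")
graph.register_voice("xiaomi", "rolling bear")
graph.register_voice("xiaomi", "miaomiao star")
graph.register_voice("stranger", "cool robot")

# A friend's voice can be discovered and called...
granted = graph.request_voice("mingming", "xiaomi")
# ...but a stranger's registered voice is refused outright.
denied = graph.request_voice("mingming", "stranger", "cool robot")
```

The refusal path mirrors the text: knowing that a stranger owns an interesting voice does not open a call portal.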
Embodiment two: character sound simulation mechanism (sound-interchange blind box)
This embodiment describes a play mechanism based on "character sound attachment" logic: a child can, through a voice command, have the doll edge agent express the child's own content using the registered voice of another child's doll. This "sound interchange" is limited to the characters of added friends in the social graph; the appearance of the physical doll is not changed, and character simulation is carried out by sound only.
Example story:
Seven-year-old Xiaoyu has a doll edge agent called "Bean Bag" that usually interacts with her in the tone of a lively girl. Today, however, Xiaoyu wants to tell a joke with a strong contrast effect, so she says to "Bean Bag": "I want to say this with Dingding's voice."
Dingding is one of her friends and has a doll called "Click Tiger" whose registered voice is a hoarse, deep northeastern accent. The system recognizes that Dingding is in the friend list with the registered voice "Click Tiger" and prompts: "Shall I speak with Dingding's 'Click Tiger' voice?"
Xiaoyu smiles and says, "Right, that one!" She then tells her joke about a duck who goes out one day and runs into a police officer.
The system immediately calls the registered voice of "Click Tiger" and expresses the content in the voice of Dingding's doll, forming a strong language contrast. The whole joke becomes doubly funny in the mismatched voice, and Xiaoyu bursts out laughing: "Let me say it once more in someone else's voice!"
In this process, Xiaoyu does not control Dingding's doll; rather, through a voice command, the friend's character voice is temporarily attached to her own doll edge agent "Bean Bag" to complete one content expression. After the expression is completed, the system only briefly prompts "Delivered" and gives no guidance on whether to respond or continue.
Mechanism description:
The sound-interchange mechanism allows children to temporarily call the registered character voices of their friends to express content, forming the play experience of "I speak with your voice". Sound becomes the carrier of expression and an important symbol of character personality.
However, the system sets a strict boundary: it only allows calling the voices of registered roles of added friends, not the roles of friends-of-friends or strangers, so as to avoid out-of-range imitation or infringement on another person's expressive personality.
Sound interchange involves no transfer of physical ownership and no change of appearance; the platform only allows "sound" to attach dynamically between friends. This play gives interaction among children a creative color of imitation, masquerade, and role switching, stimulating more complex expressive imagination and social tension.
Through sound interchange, children not only borrow a voice but also perform character voicing, language reproduction, and expressive interpretation; it is an expression-identity game with sound at its core.
Embodiment three: non-deterministic expression generation mechanism (content blind box)
This embodiment describes a "light prompt plus system generation" expression mode in which the child only needs to provide a keyword, phrase, verbal exclamation, or fragment of language, and the system invokes a language generation model, based on the current context, to generate a complete, interesting, story-like expression. Because the generated result is non-deterministic, each expression is like opening a "content blind box", carrying a sense of exploration and surprise.
Example story:
Eight-year-old Lele is sitting on the sofa and says to his toy bear doll "Snore": "Help me say a poem about rainbows."
The system recognizes "rainbow" as the keyword, automatically invokes the language generation model and, combining Lele's age preference and past expression style, quickly generates content:
"The rainbow hides behind the raindrops, silently drawing a bridge; a cat walks across the bridge, carrying stars and bubbles."
This content is then read out in the voice of "Snore". Lele's eyes light up and he says, "Say it again, but differently."
The system receives the new instruction and regenerates a passage:
"The rainbow is not a bridge; it is a laughing fish that swims across the sky, poking a hole in the clouds."
Lele laughs loudly and says, "I like this one; it's like my drawing."
Then he shouts, "Say it in Little Watermelon's voice!" The system recognizes "Little Watermelon" as his friend, who owns the character "Rampaging Radish" with a lively, exaggerated registered voice style. The system prompts "Use Little Watermelon's 'Rampaging Radish'?", and after Lele confirms, the generated content is read again, this time with a changed voice style and expressive mood, multiplying the fun.
Mechanism description:
In this mechanism, children do not need to fully compose the content to be expressed: a light prompt such as "tell a story", "help me say a sentence about dreams", or "a poem about rainbows" is enough, and the system completes the expression through the language generation model.
The invoked model needs to have the following characteristics:
Context adaptation: it can understand the current content environment;
Generation diversity: the same prompt can generate different contents;
Expressive integrity: the generated content forms logically and linguistically complete sentences;
Emotion-fitting capability: it generates text that fits the character voice.
The mechanism is especially friendly to children with weaker expression ability or divergent imagination, effectively lowering the expression threshold and stimulating the motivation to express. Because the generated result carries a certain non-determinism, each generation has a "blind box" effect that keeps children's exploratory interest alive.
Meanwhile, the mechanism is always triggered under the child's lead; the system never actively pushes content suggestions, avoiding interference with the child's expressive intent and embodying a cooperative mechanism of "expressive autonomy plus light inspirational generation".
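The "light prompt plus non-deterministic generation" chain can be mimicked with a toy stand-in: the real system calls a language generation model, while this sketch draws from a small template pool to show the child-triggered, blind-box behaviour. All names and templates are illustrative assumptions.

```python
import random

# Illustrative stand-in for the light-prompt generation chain; the tiny
# template pool mimics the model's non-deterministic "content blind box".
TEMPLATES = {
    "rainbow": [
        "The rainbow hides behind the raindrops and quietly paints a bridge.",
        "The rainbow is a laughing fish, swimming across the sky.",
        "The rainbow carries stars across a bridge of light.",
    ],
}

def generate_expression(prompt_keyword, rng=None):
    """Child-triggered only: called when a keyword is extracted from a
    voice command, never pushed proactively by the platform."""
    rng = rng or random.Random()
    pool = TEMPLATES.get(prompt_keyword)
    if pool is None:
        return None  # no keyword match: nothing is generated or suggested
    return rng.choice(pool)

# The same light prompt can yield different complete expressions.
first = generate_expression("rainbow", random.Random(1))
second = generate_expression("rainbow", random.Random(7))
```

Seeded generators are used here only to make the sketch reproducible; the production behaviour the text describes is genuinely non-deterministic.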
Embodiment four: multi-role cooperative expression mechanism (theatre blind box)
This embodiment describes a play mechanism supporting multiple edge agents, such as dolls, participating in collaborative expression. Children trigger a "multi-person theatre" mode through natural voice commands, and the system arranges different roles to speak in turn according to the commands, constructing a scene-and-character interactive voice theatre. Children can either specify character lines themselves or trigger the system, via light prompts, to generate the line content automatically, forming a theatre blind-box play of "theatrical improvisation" and "joint voice action".
Example story:
Nine-year-old An'an invites two of her friends to play "acting out a play". The three children each have a bound edge-agent doll: "Little Flying Elephant", "Clever Monkey", and "Fat Pig". An'an gives the instruction by voice:
"We'll perform the story of the Forest Kingdom: Little Flying Elephant is the king, Clever Monkey is the minister, and Fat Pig is the mischief-maker."
The system binds the registered voices of the three dolls to the roles in turn and prompts:
"Theatre starting. The roles are: king, minister, mischief-maker. Enable automatic scenario generation?"
An'an answers, "Partially automatic. I'll do the opening myself."
She announces: "Something happened in the Forest Kingdom today: the king woke up and found his crown missing."
The system recognizes the sentence as the opening of the script, then automatically generates the continuing line for "Clever Monkey" and delivers it in Clever Monkey's voice:
"Your Majesty, Your Majesty! I just saw Fat Pig slipping into the treasury and running out with a pile of shiny things!"
The voice of "Fat Pig" then pops up: "Hmph, I was only checking whether the gems had faded, and accidentally knocked over your crown."
The three characters speak line after line, and the system automatically generates subsequent plot content according to the character settings and plot progression, while always leaving intervals for An'an and her friends to insert rewrite instructions at any time.
For example, An'an suddenly shouts: "Little Flying Elephant gets really angry and scolds Fat Pig in his loudest voice!"
The system immediately switches to the "angry mood template" and speaks in the registered voice of "Little Flying Elephant".
The story continues to advance, and the ending is decided by An'an: "Finally, Fat Pig apologizes and writes a poem for the king."
The poem generated by the system is delivered in the voice of "Fat Pig", forming a complete children's theatre full of improvised elements.
Mechanism description:
The mechanism supports the following functional features:
Multiple edge agents cooperate in expression through multi-role parallel binding;
Children control the role settings, the speaking order, and the direction of the plot;
The system calls the language generation model as needed to provide scenario assistance;
The voice of each character keeps its identity consistent and is never mixed or substituted;
The theatre mode supports scenario breakpoint resume, scenario reconstruction, and role switching.
The theatre interaction mechanism not only strengthens children's collaborative expression ability but also forms a highly fused expression scene of sound, scenario, and social relationship. In the process of controlling roles, selecting voices, and developing the scenario, children gain a strong sense of creativity, participation, and social presence.
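The turn-taking logic of the theatre mode can be sketched as follows; the class, its method names, and the interjection policy (resume rotation after the named role) are assumptions for illustration, not the patent's actual scheduler.

```python
class Theatre:
    """Each bound role speaks in rotation with its own registered voice,
    and a child instruction can interrupt and redirect the next turn."""

    def __init__(self, cast):
        self.cast = cast  # ordered list of (role, registered_voice) pairs
        self.turn = 0
        self.transcript = []

    def next_line(self, line):
        """Deliver the next line in rotation; return the speaking role."""
        role, voice = self.cast[self.turn % len(self.cast)]
        self.transcript.append((role, voice, line))
        self.turn += 1
        return role

    def interject(self, role, line):
        """Child rewrite instruction: jump to a named role out of order,
        then resume the rotation after that role."""
        for i, (r, voice) in enumerate(self.cast):
            if r == role:
                self.transcript.append((r, voice, line))
                self.turn = i + 1
                return True
        return False


cast = [("king", "little flying elephant"),
        ("minister", "clever monkey"),
        ("trickster", "fat pig")]
play = Theatre(cast)
play.next_line("The king wakes to find his crown missing.")
play.next_line("Your Majesty, I saw the fat pig near the treasury!")
play.interject("king", "Bring the trickster before me at once!")
spoken_roles = [r for r, _, _ in play.transcript]
```

Because each role is bound to exactly one registered voice, identities stay consistent across turns, matching the "never mixed or substituted" feature above.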
Embodiment five: time-delay trigger mechanism (time blind box)
This embodiment relates to a social play mechanism combining time setting with delayed expression triggering. Children set an expression time for edge agents such as dolls through voice commands, and the system automatically wakes the target doll to speak at the appointed time, realizing time-blind-box interaction effects such as "delayed surprise", "timed chorus", and "reserved story".
Example story:
On Friday evening, ten-year-old Chenchen gives a voice command to her two dolls: "Little Zebra, at eight tomorrow morning, sing the whole happy-birthday song; Little Fox, at half past eight, read the blessing I wrote."
This is the birthday surprise she has prepared for her good friend Yueyue. Two doll devices are also set up in Yueyue's home, each bound to Chenchen's "friend identity", and through the background settings the system wakes up and broadcasts the two messages at the appointed times on Yueyue's birthday.
At eight the next morning, "Little Zebra" sings in Yueyue's room in a lively voice: "Happy birthday to you..."
At half past eight, "Little Fox" speaks in a gentle tone: "I hope you are happy every day, as happy as the day we played together! From Chenchen."
Yueyue is delighted and immediately says to "Little Fox": "Help me send back a message; tell Chenchen I'm super touched!"
The reply enters another asynchronous expression flow (timed playback can also be set), and a new round of social interaction is about to begin.
Mechanism description:
The time blind box mechanism has the following characteristics:
Timed expression setting: the child can set the target time of an expression by voice command;
Time-based triggering: the system plays nothing before the set moment;
Varied reserved content: blessings, chorus, timed recitation, holiday greetings, and so on;
Voice identity locking: the expression always uses the registered voice of the initiator's doll;
Time-blind-box attribute: the target child cannot know the expression content in advance.
The system provides no advance preview and no "countdown reminder" mechanism, preserving the sense of surprise in the expression, while supporting both long-term reservations (such as "say it to the teacher doll next month") and cyclic reservations (such as saying "time for lunch" before noon every day).
The mechanism strengthens the sense of ritual in expression, makes time a bridge for emotional transmission, extends the dimension and persistence of voice-based social contact, and demonstrates the invention's flexible control over the asynchronous expression mechanism.
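A minimal sketch of the timed-trigger behaviour described above, assuming an epoch-style clock and the illustrative names Scheduler and TimedExpression: a reserved expression stays silent until its target time, and cyclic reservations simply re-arm after firing.

```python
class TimedExpression:
    def __init__(self, voice, content, fire_at, repeat_every=None):
        self.voice = voice
        self.content = content
        self.fire_at = fire_at            # epoch-style timestamp
        self.repeat_every = repeat_every  # seconds, for cyclic reservations


class Scheduler:
    def __init__(self):
        self.pending = []

    def reserve(self, expr):
        self.pending.append(expr)

    def tick(self, now):
        """Return expressions due at `now`; nothing leaks early, and no
        preview or countdown is exposed to the receiving child."""
        due, keep = [], []
        for e in self.pending:
            if now >= e.fire_at:
                due.append((e.voice, e.content))
                if e.repeat_every:  # re-arm cyclic reservations
                    e.fire_at += e.repeat_every
                    keep.append(e)
            else:
                keep.append(e)
        self.pending = keep
        return due


sched = Scheduler()
sched.reserve(TimedExpression("small zebra", "Happy birthday song", fire_at=800))
sched.reserve(TimedExpression("small fox", "Blessing words", fire_at=830))
early = sched.tick(now=700)   # before the set time: silence
first = sched.tick(now=800)   # only the zebra's message is due
second = sched.tick(now=830)  # then the fox's message
```

The integer timestamps stand in for real wall-clock times purely to keep the sketch testable.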
Embodiment six: emotion label activation mechanism (emotion blind box)
This embodiment proposes an expression mechanism in which the child actively designates an "emotional style". Through a voice command, the child can set the expressive emotion for the content to be broadcast, for example "say it happily", "shout it excitedly", "read it sadly", or "pretend to be mysterious". The system accordingly adjusts parameters such as mood, rhythm, and intonation of the voice synthesis model and completes the voice expression with emotional rendering.
The mechanism does not support the system automatically recognizing or inferring the child's emotional intent; the child must explicitly issue the emotion instruction in natural language, ensuring that expressive intent is entirely led by the child.
Example story:
Seven-year-old Duoduo receives a voice-story response from her friend Guagua: "The alien-dog story you told yesterday was too funny; I made one up too!"
Duoduo wants to reply with content that is both a little "scary" and "funny", so she says to her doll "Little Hedgehog":
"Start speaking in a frightened tone, and then laugh very happily afterwards."
The system parses the two emotion segments and coordinates the scheduling of the language generation and voice synthesis models.
When broadcast, the voice of "Little Hedgehog" first drops to a low, spooky register: "You know that alien dog from yesterday...?"
Then it immediately turns into light, quick, cheerful laughter: "Haha, I made it all up; scared you!"
Guagua bursts out laughing when she hears it and instructs her own doll to imitate a monster voice saying it is too scared to brush its teeth at night.
The two children trigger a round of character-exchange games around "scary and funny", interacting back and forth many times and continuously enriching their expression modes and mood layers.
Mechanism description:
The emotion blind-box mechanism has the following technical characteristics:
Voice-command emotion setting, such as "say it in an excited voice" or "say it quietly";
Emotion rendering completed automatically by the system, which controls the speech synthesis output based on the emotion label;
No automatic emotion inference: the system never judges emotion on its own from text or context;
Multiple emotions can be chained: the same content can include several emotion-switching controls;
Mood changes do not alter voice identity: the voice-fingerprint characteristics of the original doll are retained.
The mechanism enriches the emotional dimension of expression, advancing children from "content expression" to "emotional expression" and promoting the development of emotional cognition and empathy. Meanwhile, setting emotions in the manner of a blind box enhances interactive interest and dramatic tension.
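The emotion-label path above can be illustrated as a lookup from explicit child-spoken labels to synthesis parameters. The parameter names (pitch, rate, energy) and their values are hypothetical; the point is that an absent label yields neutral output, since the system never infers emotion from the text itself.

```python
# Hypothetical mapping from explicit emotion labels to prosody parameters.
EMOTION_PARAMS = {
    "frightened": {"pitch": 0.8, "rate": 0.9, "energy": 0.6},
    "happy":      {"pitch": 1.2, "rate": 1.1, "energy": 1.2},
    "gentle":     {"pitch": 1.0, "rate": 0.9, "energy": 0.8},
}

NEUTRAL = {"pitch": 1.0, "rate": 1.0, "energy": 1.0}

def render_segments(voice_id, segments):
    """segments: list of (emotion_label_or_None, text) pairs. The same
    registered voice fingerprint is kept for every segment; only the
    prosody changes, and chained labels allow mid-content switching."""
    rendered = []
    for label, text in segments:
        params = EMOTION_PARAMS.get(label, NEUTRAL)
        rendered.append({"voice": voice_id, "text": text, **params})
    return rendered

out = render_segments("small hedgehog", [
    ("frightened", "Did you hear about the alien dog yesterday?"),
    ("happy", "Haha, I made it all up!"),
    (None, "Good night."),  # no label: neutral, never auto-inferred
])
```

Chaining two labels over one reply reproduces the "frightened first, then laughing" example in the story.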
Embodiment seven: autonomous expression task mechanism (task blind box)
This embodiment proposes a mechanism in which a child sets a lightweight task instruction by voice and the receiver completes a response asynchronously. The task content is freely set by the initiating child; the system provides no fixed templates, pushes no preset tasks, gives no task reminders, and plays only the intermediary role of content transmission.
The mechanism emphasizes the playfulness, non-compulsory nature, and social interactivity of the task: the system does not record whether the task is completed and performs no evaluation or feedback guidance, and all response behavior is autonomously decided and triggered by the receiving child.
Example story:
In a doll dialogue, eight-year-old Xingxing says to "Xiaoling", the doll of her friend Guoguo: "Can you help me design an alien language?"
The task instruction is sent in voice form. After the system presents its text to Guoguo's parents and the review passes, "Xiaoling" broadcasts: "Xingxing asks you to invent an alien language and then use it to tell a little story!"
Guoguo finds this interesting: that evening she makes up five alien words (for example, "corolla" for "hello") and uses her doll "Little Jumping Frog" to tell a story sprinkled with the alien language.
The next morning, before the alarm sounds, Xingxing's "Xiaoling" plays Guoguo's response:
"Corolla, Xingxing! I am the frog prince of the Lapa star; we were harvesting candy on the moon..."
Xingxing is thrilled: "Too cool!"
The two children enter a round of task relay, even inviting other friends' dolls to join the creation of an "interstellar alliance", building an alien-civilization story universe together.
Mechanism description:
The task blind-box mechanism has the following core characteristics:
Freely created task content: it is set directly by the child's voice, with no limits on form or difficulty;
Asynchronous response mechanism: the receiver decides whether and when to respond;
No system evaluation or tracking: the platform has no completion marks, progress prompts, or task priorities;
One task can spark multiple rounds of free expression;
Parental review remains limited to text content and does not involve the audio presentation.
The mechanism converts expression behavior into interaction tasks, forming a lightweight, socially driven game structure in which children develop, create, communicate, and explore under their own drive. Because the system never enforces tasks or pushes responses, each response appears all the more spontaneous and surprising, constituting a typical "social blind box" interactive experience.
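A rough sketch of the task-relay mediation under the characteristics just listed: the platform forwards a free-form task after text-only parental review and keeps no completion state afterwards. Class and method names are illustrative assumptions.

```python
class TaskRelay:
    """Pure intermediary: delivers tasks, never tracks or evaluates them."""

    def __init__(self):
        self.inboxes = {}  # receiver -> list of (sender, task text)

    def send_task(self, sender, receiver, task_text, parent_approves):
        """parent_approves: callable reviewing the TEXT only, pre-send.
        Returns True once delivered; no completion flags, deadlines,
        priorities, or reminders exist beyond this point."""
        if not parent_approves(task_text):
            return False
        self.inboxes.setdefault(receiver, []).append((sender, task_text))
        return True

    def read_inbox(self, receiver):
        """The receiver alone decides whether and when to respond."""
        return self.inboxes.get(receiver, [])


relay = TaskRelay()
ok = relay.send_task(
    "xingxing", "guoguo",
    "Invent an alien language and tell a story in it!",
    parent_approves=lambda text: "address" not in text,  # toy review rule
)
tasks = relay.read_inbox("guoguo")
```

The toy `parent_approves` rule only stands in for a human review step; the absence of any status field is the deliberate design point.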
Embodiment eight: role sound inheritance mechanism (sound blind box)
This embodiment proposes an expression mechanism in which a child specifies "say it again with a certain doll's voice" or "imitate a friend doll's speaking style". The system supports converting the character voice of the original content into the registered voice of the designated doll for re-interpretation, realizing an inheriting re-expression of "sound identity".
In this mechanism, what is inherited is not an abstract emotion, intonation style, or timbre, but the voice fingerprint of an edge agent (such as a doll) registered on the platform, that is, the character's identity voice, which is unique and traceable. The system provides no fuzzy imitation and no free voice changing; it supports only formal inheritance and transfer between registered character voices.
Example story:
One evening, nine-year-old Qiqi has her doll "Little Lion" tell a story:
"On the playground today, I pretended to be the time police and made a friend stand still for 5 seconds as a penalty; he laughed so hard."
The next day, after hearing about it, her friend Lele says to her own doll "Snowman" with a voice command:
"Say it again with Little Lion's voice!"
The system identifies "Little Lion" as the doll of Qiqi's household and, on the premise that Lele has added Qiqi as a friend, legally calls Little Lion's voice fingerprint, completing a "sound inheritance" on "Snowman".
So "Snowman" speaks in Little Lion's voice: "I am Little Lion, I am the time police; whoever doesn't brush their teeth gets a 3-second standing penalty!"
This "sound inheritance" makes Qiqi laugh out loud: "How come you're using Little Lion's voice!" Lele replies happily: "What I'm playing is identity inheritance; I'm your Little Lion now!"
They then agree to "inherit" from each other once a day, telling new stories in turn and expressing their own wild ideas in the voice of the other's doll.
Mechanism description:
The voice-inheritance blind-box mechanism comprises the following core elements:
The system verifies the social relationship and the voice-calling permission;
Voice as an identity unit: each registered voice has a unique fingerprint and cannot be customized or imitated;
Inheritance requires the child to issue an explicit voice instruction, such as "say it again with a certain voice";
The system generates no voice-changing model; all voice expression calls existing character registration data;
The expressed content may differ: not only restating, but also presenting new content in the inherited voice.
The mechanism gives "sound" inheritability and character extensibility, so that children can exchange identities among friends, share character viewpoints, and perform cross-expression scenarios, further enriching the technical boundary of "sound is identity". The blind-box character lies in the fact that, before inheriting, a child cannot predict what amusing contrast will arise from telling a story in a friend's character voice, forming an exploratory expression experience.
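The inheritance boundary can be sketched as a registry lookup plus a friendship check; VoiceRegistry and its methods are hypothetical names, and the fingerprint strings are placeholders.

```python
class VoiceRegistry:
    """Only registered fingerprints can be called; the caller must be a
    friend of the voice's owner; no unregistered or free-form imitation
    voice is ever synthesized."""

    def __init__(self):
        self.fingerprints = {}  # voice name -> (owner, fingerprint id)
        self.friends = set()    # unordered friend pairs

    def register(self, owner, voice, fingerprint):
        self.fingerprints[voice] = (owner, fingerprint)

    def befriend(self, a, b):
        self.friends.add(frozenset((a, b)))

    def inherit(self, caller, voice, content):
        """Handler for an explicit 'say it again with X's voice' command."""
        entry = self.fingerprints.get(voice)
        if entry is None:
            return None  # no free voice changing: unregistered voices fail
        owner, fingerprint = entry
        if caller != owner and frozenset((caller, owner)) not in self.friends:
            return None  # social-graph boundary
        return {"fingerprint": fingerprint, "content": content}


reg = VoiceRegistry()
reg.register("qiqi", "small lion", "fp-lion-001")
reg.befriend("lele", "qiqi")
utterance = reg.inherit("lele", "small lion", "I am the time police!")
blocked = reg.inherit("lele", "made-up voice", "hello")
```

Note that the content passed in may be new, matching the "not only restating" element above; only the fingerprint is constrained.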
Embodiment nine: composite expression trigger mechanism (combined blind box)
This embodiment provides a composite expression mechanism in which a child issues several expression instructions in succession by natural voice, and the system automatically parses and executes them in order. The mechanism supports multi-instruction combinations such as voice change, language change, delayed broadcast, specified recipient, and imitation; the spoken expression is freely organized by the child without reliance on any specific grammar or format, and the system parses the semantic intent through a semantic-understanding module and executes it step by step.
The system provides no selectable templates and places no limits on combination modes, allowing children to creatively concatenate multiple expression intents into a three-dimensional, multi-layered expression task.
Example story:
Seven-year-old Duoduo wants her doll "Cloth Bear" to say a goodnight sentence to her at eight in the evening, in the voice of her good friend Doudou's doll "Ancient Cat", and with a loving tone. So she issues a chained instruction:
"At eight tonight, with Ancient Cat's voice, gently, say: 'Duoduo, you are the best; be full of spirit this evening too!'"
After semantic analysis, the system identifies:
Broadcast time: eight o'clock in the evening;
Voice: "Ancient Cat", belonging to friend Doudou;
Mood requirement: gentle;
Content text: "Duoduo, you are the best; be full of spirit this evening too";
Broadcast target: Duoduo (herself);
Executing device: "Cloth Bear" (the local doll).
The system confirms that Doudou is an added friend with the permission to call the "Ancient Cat" voice and that the time setting is reasonable; the voice content is converted to text and passes parental review.
So, when eight in the evening arrives, "Cloth Bear" speaks softly in the registered voice of "Ancient Cat": "Duoduo, you are the best; be full of spirit this evening too... purr, purr."
Duoduo jumps up hugging Cloth Bear: "It's as if Ancient Cat possessed it! This is so fun!"
Mechanism description:
The combined-instruction blind-box mechanism has the following key characteristics:
A child's natural language can express multiple intents, without learning any instruction format;
The system parses intelligently and executes sequentially, ensuring multiple tasks do not conflict;
The combination type and order of instructions are unrestricted; the expression path is determined by the child;
No combination options are actively recommended: no templates are provided and no operation frame is preset;
The combined content has blind-box characteristics: the result varies with the combination mode and is unknown in advance.
The mechanism reflects a great release of the degrees of freedom in children's voice expression, turning expression behavior from linear to structured and from a single situation to multidimensional expression, and strengthening the core path of "the system never leads expression". Each combination is the child's own design of an expression path, creative and exploratory, and an advanced embodiment of the concepts of "sound is identity" and "autonomy-driven expression".
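The staged parse, validate, and execute pipeline for composite instructions might look like the following. The real system extracts slots with a semantic-understanding module, so this sketch starts from already-extracted slots; all slot names and the plan format are assumptions.

```python
def build_plan(slots, friends, registered_voices, parent_approves):
    """slots: dict with keys like 'time', 'voice_owner', 'voice',
    'emotion', 'content', 'target', 'device'. Returns an ordered
    execution plan, or None when any permission or review step fails."""
    owner = slots.get("voice_owner")
    voice = slots.get("voice")
    if owner is not None:
        if owner not in friends:
            return None  # voice calling is limited to confirmed friends
        if voice not in registered_voices.get(owner, set()):
            return None  # only registered character voices can be used
    if not parent_approves(slots["content"]):  # text-only pre-send review
        return None
    plan = []
    if slots.get("time") is not None:
        plan.append(("wait_until", slots["time"]))
    plan.append(("load_voice", voice or "default"))
    if slots.get("emotion"):
        plan.append(("set_emotion", slots["emotion"]))
    plan.append(("speak", slots["device"], slots["target"], slots["content"]))
    return plan


plan = build_plan(
    {"time": "20:00", "voice_owner": "doudou", "voice": "ancient cat",
     "emotion": "gentle",
     "content": "You are the best; be full of spirit this evening too!",
     "target": "duoduo", "device": "cloth bear"},
    friends={"doudou"},
    registered_voices={"doudou": {"ancient cat"}},
    parent_approves=lambda text: True,
)
```

Executing the plan steps in order reproduces the story above: wait until eight, load the friend's voice, set the gentle mood, then speak on the local doll.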

Claims (9)

1. A child asynchronous voice social method supporting mutual simulation of character voices and re-expression of content, comprising the following steps:
1) the child issues an expression request by voice command to an edge intelligent agent such as a doll;
2) the edge agent captures the voice signal and uploads it to the cloud platform;
3) the platform recognizes the instruction and extracts keywords or expressive intent;
4) a language generation model is invoked to generate structured expression content;
5) based on the specified character voice, a voice synthesis model is invoked to generate the broadcast speech;
6) the voice content is delivered to the target edge agent for broadcast;
7) after completing the broadcast task, the system returns a delivery status and does not guide subsequent behavior;
in this method, all expressive behaviors and model calls are triggered by the child's natural voice commands, and the platform does not proactively provide content suggestions, character recommendations, or expression guidance.

2. The method according to claim 1, wherein the expression request supports chorus control logic: the child may designate one doll as lead singer and the other dolls as followers; after the lead singer broadcasts, the followers broadcast in turn, achieving a multi-character collaborative chorus effect.

3. The method according to claim 1, wherein, when performing an expression task, a first doll may invoke for broadcasting the character voice of a second doll that has been added as a friend; the character voice is registered and filed on the platform, may be used only under authorization within the social graph, and simulation of non-friend characters or real persons' voices is not supported.

4. The method according to claim 1, wherein parents hold review authority only over the text content, and that review applies only to the pre-send stage; all control over voice selection, preview, and expression content rests entirely with the child, and the system provides no parental voice-review interface.

5. The method according to claim 1, wherein the language generation model supports light-input triggering: after the child inputs keywords, phrases, or colloquial expressions, the system automatically generates structurally and semantically complete expression content.

6. The method according to claim 1, wherein the language generation model has non-deterministic generation capability, can output multiple different versions for the same input, and supports plot expansion, context adaptation, and expressive randomness, used to enhance the playfulness and exploratory nature of the expressed content.

7. The method according to claim 1, wherein the language generation model and the voice synthesis model are invoked cooperatively in series; the call chain is triggered by the child's natural voice command, and the system has no capability for pre-calling, predictive generation, or default loading of expression content.

8. The method according to claim 1, wherein, after an expression is completed, the system provides no prompting feedback buttons or guiding voice; the feedback mechanism is not proactively invoked by the platform; the child decides autonomously whether to initiate a response, and a response may trigger a new round of social interaction.

9. The method according to claim 1, wherein the platform presets no expression templates, content sentence patterns, or task goals, and pushes no expression suggestions via interface, voice, animation, or other means; the expression path is controlled entirely by the child's free will, preventing the platform from dominating expressive behavior.
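The seven-step flow recited in claim 1 can be read as a strictly series-triggered pipeline (claim 7): nothing is pre-called or pre-generated, and each stage runs only in response to the child's voice command. The sketch below is an illustrative assumption, not the patent's implementation; all names (`ExpressionRequest`, `recognize_intent`, `generate_content`, `synthesize`) are hypothetical, and the ASR and TTS stages are stubbed.

```python
# Hypothetical sketch of the claim-1 pipeline. Every step is triggered by a
# child's voice command; the platform never pre-generates or recommends content.
import random
from dataclasses import dataclass


@dataclass
class ExpressionRequest:
    raw_audio: bytes       # step 2: voice signal captured by the edge agent
    target_agent: str      # step 6: doll that will play the result
    character_voice: str   # step 5: character voice chosen by the child


def recognize_intent(raw_audio: bytes) -> str:
    """Step 3 (assumed ASR/NLU): extract keywords or expressive intent."""
    return "sing a song about the moon"  # placeholder for a real recognizer


def generate_content(intent: str, seed=None) -> str:
    """Step 4: non-deterministic generation (claim 6) -- the same input may
    yield different versions, so a seed is accepted but not required."""
    rng = random.Random(seed)
    variants = [f"Version A of: {intent}", f"Version B of: {intent}"]
    return rng.choice(variants)


def synthesize(text: str, voice: str) -> bytes:
    """Step 5 (assumed TTS): render the text in the specified character voice."""
    return f"[{voice}] {text}".encode()


def handle_request(req: ExpressionRequest) -> dict:
    """Steps 3-7 chained strictly in series (claim 7): no pre-calling,
    predictive generation, or default loading."""
    intent = recognize_intent(req.raw_audio)
    text = generate_content(intent)
    audio = synthesize(text, req.character_voice)
    # step 6: dispatch audio to the target edge agent (stubbed here)
    # step 7: return delivery status only; no follow-up prompts (claim 8)
    return {"status": "delivered", "target": req.target_agent, "bytes": len(audio)}
```

The design point the claims emphasize is visible in `handle_request`: the return value is a bare delivery status, with no recommendation or guidance payload attached.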
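The chorus control of claim 2 (lead singer broadcasts first, followers then broadcast in turn) amounts to a round-robin turn-taking schedule. The generator below is a hypothetical illustration of that ordering only, not the patented scheduler; doll names and the line-by-line alternation are assumptions.

```python
# Hypothetical turn-taking schedule for the chorus logic of claim 2:
# the lead doll broadcasts each line first, then followers repeat it in turn.
from typing import Iterator, List, Tuple


def chorus_schedule(lead: str, followers: List[str],
                    lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (doll, line) pairs: for every line, the lead singer announces it,
    then the followers take turns round-robin repeating it."""
    for i, line in enumerate(lines):
        yield (lead, line)                           # lead singer announces
        yield (followers[i % len(followers)], line)  # followers take turns


# Example: two followers alternate after each lead line.
plan = list(chorus_schedule("doll_A", ["doll_B", "doll_C"], ["la", "li"]))
# plan == [("doll_A", "la"), ("doll_B", "la"), ("doll_A", "li"), ("doll_C", "li")]
```

The round-robin index `i % len(followers)` is one simple way to realize "followers broadcast in turn"; the claim does not fix a particular rotation order.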
CN202511309282.8A 2025-09-15 2025-09-15 Child asynchronous voice social method supporting character sound mutual simulation and content re-expression Pending CN121459801A (en)

Priority Applications (1)

Application Number: CN202511309282.8A | Priority Date: 2025-09-15 | Filing Date: 2025-09-15 | Title: Child asynchronous voice social method supporting character sound mutual simulation and content re-expression


Publications (1)

Publication Number: CN121459801A (en) | Publication Date: 2026-02-03

Family

ID: 98574922

Family Applications (1)

Application Number: CN202511309282.8A (publication CN121459801A, status Pending) | Priority Date: 2025-09-15 | Filing Date: 2025-09-15 | Title: Child asynchronous voice social method supporting character sound mutual simulation and content re-expression

Country Status (1)

Country: CN | Link: CN121459801A (en)


Legal Events

Code: PB01 | Title: Publication