US20220394085A1 - Network connection method and device for training participant end of common training model - Google Patents
- Publication number
- US20220394085A1 (application US17/886,771)
- Authority
- US
- United States
- Prior art keywords
- worker
- state information
- communication state
- communication
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/044—Network management architectures or arrangements comprising hierarchical management structures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
- H04L41/0661—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
-
- H04L41/0672—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
- H04L67/1046—Joining mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/142—Managing session states for stateless protocols; Signalling session states; State transitions; Keeping-state mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Definitions
- the embodiments of the present disclosure relate to the field of computer technologies, and in particular to a network connection method and a network connection apparatus for a training participant of a joint training model.
- the Master-Worker mode is generally used, in which the Master is responsible for receiving and distributing tasks (such as training tasks), and the Worker is responsible for processing subtasks.
- a related approach is usually to set a service recovery point in advance, and perform data recovery from the recovery point in the event of a failure.
- a network connection method and a network connection apparatus for a training participant of a joint training model are provided according to the embodiments of the present disclosure.
- a network connection method for a training participant of a joint training model is provided according to an embodiment of the present disclosure.
- the training participant operates in a master-worker mode, and the method includes: acquiring communication state information of a worker, the communication state information indicating a communication connection phase that the worker is in; acquiring communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and resetting, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- the method further includes generating, in response to determining that the communication connection phase that the worker is in is changed, new communication state information indicating the changed communication connection phase that the worker is in.
- the resetting, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in includes: resetting the communication connection phase of the worker to a connection establishment phase before the communication connection phase, in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase.
- a network connection apparatus for a training participant of a joint training model is provided according to an embodiment of the present disclosure.
- the training participant operates in a master-worker mode, and the apparatus includes: a first acquisition unit, configured to acquire communication state information of a worker, the communication state information indicating a communication connection phase that the worker is in; a second acquisition unit, configured to acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and a reset unit, configured to reset, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- the apparatus further includes a generation unit.
- the generation unit is configured to generate, in response to determining that the communication connection phase that the worker is in is changed, new communication state information indicating the changed communication connection phase that the worker is in.
- the reset unit is further configured to: reset the communication connection phase of the worker to a connection establishment phase before the communication connection phase, in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase.
- a network connection system for a training participant of a joint training model includes: a worker, configured to acquire local communication state information of the worker, where the communication state information indicates a communication connection phase that the worker is in; acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; terminate the process in response to determining that the target communication state information does not match the local communication state information of the worker; and set the communication connection phase that the worker is in to a preset phase in response to reception of information instructing restart sent by a master corresponding to the worker, and update the local communication state information of the worker; and the master, configured to send, in response to determining that there is a worker that actively terminated the process, the information instructing restart to the worker that actively terminated the process.
- the system further includes: a parameter server, configured to generate, in response to detecting an operational failure of the parameter server, failure prompt information indicating a failure of the parameter server; and the master is further configured to send a communication termination request to a master of another training participant of the joint training model in response to detection of presence of the failure prompt information indicating the failure of the parameter server; and disconnect a communication connection corresponding to the communication termination request in response to reception of confirmation information corresponding to the communication termination request.
- the communication termination request is further used to instruct to stop the training process of the joint training model; and the master is further configured to: restore the training process of the joint training model from a target checkpoint in response to reception of the confirmation information corresponding to the communication termination request.
- in a fourth aspect, a server includes: one or more processors; and a storage device storing one or more programs.
- the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the implementations of the first aspect.
- a computer-readable medium having a computer program stored thereon is provided according to an embodiment of the present disclosure.
- the program, when executed by a processor, implements the method according to any one of the implementations of the first aspect.
- a network connection method and a network connection apparatus for a training participant of a joint training model are provided according to the embodiments of the present disclosure.
- the training participant operates in a master-worker mode.
- the method includes acquiring communication state information of a worker, where the communication state information indicates a communication connection phase that the worker is in; acquiring communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and resetting, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in. In this way, the data loss caused by a network failure can be reduced to the greatest extent.
- FIG. 1 is a schematic diagram illustrating an exemplary system architecture to which an embodiment of the present disclosure is applicable;
- FIG. 2 is a flowchart of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure;
- FIG. 3 is a schematic diagram of an application scenario of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure;
- FIG. 4 is a schematic diagram of a network connection apparatus for a training participant of a joint training model according to an embodiment of the present disclosure;
- FIG. 5 is a sequence diagram of interactions between various devices in a network connection system for a training participant of a joint training model according to an embodiment of the present disclosure.
- FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
- FIG. 1 illustrates an exemplary architecture 100 to which a network connection method for a training participant of a joint training model or a network connection apparatus for a training participant of a joint training model of the present disclosure is applicable.
- the system architecture 100 may include server clusters 101 and 102 , and a network 103 .
- the network 103 is a medium for providing a communication link between the server clusters 101 and 102 .
- the network 103 may include various types of connection, such as wired connection, wireless communication links, fiber-optics, cables, or the like.
- the server clusters 101 and 102 may be servers that provide various services, such as servers for training models having a distributed or federated learning framework.
- the server cluster 101 may include a master 1011 and workers 1012 and 1013 .
- the server cluster 102 may include a master 1021 and workers 1022 and 1023 .
- the server clusters 101 and 102 may function as different participants in federated learning to jointly train a model using their respective training samples.
- the server may be hardware or software.
- the server may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; and in a case that the server is software, the server may be implemented as multiple software or software modules (for example, software or software modules for providing distributed services), or may be implemented as a single software or software module, which is not limited in the present disclosure.
- the network connection method for a training participant of a joint training model is generally performed by the workers (for example, the servers 1012 and 1013 or the servers 1022 and 1023 ).
- the network connection apparatus for a training participant of a joint training model is generally provided in the workers (for example, the servers 1012 and 1013 or the servers 1022 and 1023 ).
- the number of networks and servers in FIG. 1 is merely illustrative. There may be any number of networks and servers depending on the implementation requirements.
- FIG. 2 illustrates a flow 200 of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure.
- the network connection method for a training participant of a joint training model includes the following steps 201 to 203 .
- in step 201, communication state information of a worker is acquired.
- the execution body (the server 1011 or 1012 shown in FIG. 1 ) of the network connection method for a training participant of a joint training model may acquire local communication state information of the worker through wired connection or wireless connection.
- the communication state information may indicate a communication connection phase that the worker is in.
- the execution body may be the master or the worker in the server cluster.
- the worker may locally acquire the communication state information of the worker.
- the worker may further report the communication state information to a master belonging to the same cluster after acquiring the communication state information of the worker.
- the master of the server cluster may acquire the communication state information of respective workers from the workers in the cluster.
- the communication connection phase may include, for example, a connection establishment phase, a communication phase, and a disconnection phase.
- the communication phase may generally refer to a data transmission phase.
- the connection establishment phase and disconnection phase may generally refer to a preparation phase before data transmission and a disconnection phase after data transmission, respectively.
- the disconnection phase may include operations such as disconnecting a pipeline and removing a socket.
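The three phases named above can be sketched as a small data model. This is only an illustrative sketch: the names `Phase`, `CommunicationState`, and the identifier `worker-1012` are hypothetical and do not appear in the disclosure, which does not prescribe any particular representation of the communication state information.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Communication connection phases named in the text."""
    CONNECTION_ESTABLISHMENT = "connection_establishment"  # preparation before data transmission
    COMMUNICATION = "communication"                        # data transmission
    DISCONNECTION = "disconnection"                        # teardown after data transmission


@dataclass(frozen=True)
class CommunicationState:
    """Hypothetical record standing in for 'communication state information'."""
    worker_id: str
    phase: Phase


# A worker that is currently transmitting data.
state = CommunicationState(worker_id="worker-1012", phase=Phase.COMMUNICATION)
```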
- in step 202, communication state information of a target worker is acquired as target communication state information.
- the execution body may acquire the communication state information of the target worker through wired connection or wireless connection.
- the target worker may include a peer node corresponding to the execution body in the joint training model.
- the peer node may belong to a different training participant of the joint training model.
- the joint training model may include various machine learning models that are trained using a distributed or Federated Learning (FL) framework.
- Peer nodes may be, for example, nodes paired between different participants in a decentralized joint training model.
- the worker may acquire the communication state information of the peer node serving as the target worker as the target communication state information through peer-to-peer connection. After acquiring the target communication state information, the worker may further report the target communication state information to the master belonging to the same cluster. In this way, the master of the server cluster may acquire the communication state information of the peer nodes corresponding to respective workers from the workers in the cluster.
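The reporting path described above, in which each worker forwards both its own state and its peer's state to the master of its cluster, might look roughly like the following. The class `Master`, its methods, and the use of plain strings for phases are assumptions made for illustration, as is treating "match" as simple equality of phases.

```python
from typing import Dict, List, Tuple


class Master:
    """Hypothetical master that stores reports from workers in its own cluster."""

    def __init__(self) -> None:
        # worker id -> (local phase, phase reported for the worker's peer node)
        self.reports: Dict[str, Tuple[str, str]] = {}

    def report(self, worker_id: str, local_phase: str, peer_phase: str) -> None:
        """Record the latest state information reported by a worker."""
        self.reports[worker_id] = (local_phase, peer_phase)

    def mismatched_workers(self) -> List[str]:
        """Workers whose phase differs from that of their peer (assumed meaning of 'mismatch')."""
        return sorted(w for w, (local, peer) in self.reports.items() if local != peer)


master = Master()
master.report("worker-1012", "communication", "connection_establishment")
master.report("worker-1013", "communication", "communication")
mismatched = master.mismatched_workers()
```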
- in step 203, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in is reset.
- the execution body may reset the communication connection phase that the worker is in through various ways.
- the execution body may generally reset the communication connection phase that the worker is in to be consistent with the communication connection phase that the node indicated by the target communication state information is in.
- in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase, the execution body may reset the communication connection phase of the worker to the connection establishment phase before the communication connection phase.
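A minimal sketch of the comparison and reset in step 203 follows, assuming that a mismatch means the two phases differ and that the reset makes the worker consistent with the target, which is one of the "various ways" the text allows. The function name `reset_phase` and the enum are hypothetical.

```python
from enum import Enum
from typing import Tuple


class Phase(Enum):
    CONNECTION_ESTABLISHMENT = 1
    COMMUNICATION = 2
    DISCONNECTION = 3


def reset_phase(own: Phase, target: Phase) -> Tuple[Phase, bool]:
    """Return (phase after the check, whether a reset happened)."""
    if own == target:
        # The states match, so nothing is reset.
        return own, False
    # Mismatch: become consistent with the phase of the target worker.
    # This covers the case in the text where the target is in the connection
    # establishment phase while the worker is still in the communication phase.
    return target, True


new_phase, was_reset = reset_phase(Phase.COMMUNICATION, Phase.CONNECTION_ESTABLISHMENT)
```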
- the process of the communication requester for example, the worker in the server cluster serving as the first training participant
- the server sets a new worker in the server cluster serving as the first training participant and sets the communication connection phase of the new worker to be the connection establishment phase.
- the peer node that is, the worker in the server cluster serving as the second training participant
- the peer node that restores the connection to the worker that has exited due to the communication failure is still in the communication phase as the communication receiver of the previous communication.
- the peer node interrupts the process.
- the master corresponding to the peer node may reset the peer node and set the peer node to be in the connection establishment phase, such that the peer node and the newly set worker perform connection and pairing.
- the master in the server cluster serving as the training participant may delegate codes for determining the local communication state and interrupting the process due to an exception to workers for execution, thereby reducing the policy complexity of the upper-layer master, which improves the efficiency of execution.
- FIG. 3 is a schematic diagram of an application scenario of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure.
- a user 301 and a user 301 ′ perform machine learning training by using the federated learning framework through a terminal 302 and a terminal 302 ′, respectively.
- a server 3031 serving as the master in the above server cluster 303 is paired with a server 3031 ′ serving as the master in the server cluster 303 ′.
- a server 3032 serving as the worker in the server cluster 303 is paired with a server 3032 ′ serving as a worker in the server cluster 303 ′.
- a server 3033 serving as the worker in the server cluster 303 is paired with a server 3033 ′ serving as the worker in the server cluster 303 ′.
- the above servers 3032 and 3033 and servers 3032 ′ and 3033 ′ may be used as trainers.
- the server 3031 serving as the master resets the server 3032 and sets the server 3032 to be in the “connection establishment phase”.
- the node server 3032 ′ corresponding to the server 3032 locally acquires the communication state information indicating that the server 3032 ′ is in the “communication phase”.
- the node server 3032 ′ may further acquire the communication state information of the target node server 3032 , which indicates that the target node server is in the “connection establishment phase”. In response to determining that the two pieces of communication state information do not match, the node server 3032 ′ disconnects the connection. The server 3031 ′ serving as the master then resets the server 3032 ′ and sets the server 3032 ′ to be in the “connection establishment phase”.
- the communication connection phase of the mismatched worker is reset, based on the communication state of the worker, so that it matches the communication connection phase of the target worker. Since data loss occurs only in a relatively short time period after the communication failure, data loss caused by network failures can be reduced to the greatest extent compared with data recovery from the preset recovery point.
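The FIG. 3 scenario can be traced with a few lines of state bookkeeping. This is a toy walk-through, not the disclosed implementation: the dictionary, the string phase names, and the variable names are assumptions, and the masters' actions are represented only by the assignments they cause.

```python
# Both paired workers start in the communication phase.
phases = {"3032": "communication", "3032'": "communication"}

# Server 3032 exits due to a communication failure; master 3031 resets it.
phases["3032"] = "connection_establishment"

# Server 3032' compares its local state with the state of its peer 3032.
mismatch = phases["3032'"] != phases["3032"]
if mismatch:
    # 3032' disconnects, then master 3031' resets it to the same phase.
    phases["3032'"] = "connection_establishment"

# Both sides are now in the connection establishment phase and can pair again.
ready_to_pair = phases["3032"] == phases["3032'"] == "connection_establishment"
```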
- referring to FIG. 4, as an implementation of the method shown in the above drawings, a network connection apparatus for a training participant of a joint training model is provided according to an embodiment of the present disclosure.
- the apparatus embodiment corresponds to the above method embodiment shown in FIG. 2 , where the training participant adopts the master-worker mode.
- the apparatus is applicable to various electronic devices.
- a network connection apparatus 400 for a training participant of a joint training model includes a first acquisition unit 401 , a second acquisition unit 402 , and a reset unit 403 .
- the first acquisition unit 401 is configured to acquire communication state information of a worker, where the communication state information indicates a communication connection phase that the worker is in.
- the second acquisition unit 402 is configured to acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model.
- the reset unit 403 is configured to reset, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- the apparatus 400 for controlling network connection between nodes may further include a generation unit (not shown in the drawings).
- the generation unit may be configured to generate, in response to determining that the communication connection phase that the worker is in is changed, new communication state information indicating the changed communication connection phase that the worker is in.
- the reset unit 403 may be further configured to: reset the communication connection phase of the worker to a connection establishment phase before the communication connection phase in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase.
- the first acquisition unit 401 acquires communication state information of a worker, where the communication state information indicates a communication connection phase that the worker is in; then, the second acquisition unit 402 acquires communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and finally, the reset unit 403 resets, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in. In this way, the data loss caused by a network failure can be reduced to the greatest extent.
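One way to picture how the three units of the apparatus 400 cooperate is as methods on a single object that share a state store. The class name, the dictionary used as the store, and the equality-based mismatch test are all assumptions for illustration; the disclosure does not specify how the units are realized.

```python
from typing import Dict


class NetworkConnectionApparatus:
    """Hypothetical grouping of the first/second acquisition units and the reset unit."""

    def __init__(self, store: Dict[str, str], worker: str, target: str) -> None:
        self.store = store      # node id -> current communication connection phase
        self.worker = worker    # id of the local worker
        self.target = target    # id of the peer node (target worker)

    def first_acquisition_unit(self) -> str:
        """Acquire the communication state information of the worker."""
        return self.store[self.worker]

    def second_acquisition_unit(self) -> str:
        """Acquire the communication state information of the target worker."""
        return self.store[self.target]

    def reset_unit(self) -> bool:
        """Reset the worker's phase on mismatch; return True if a reset happened."""
        local = self.first_acquisition_unit()
        target = self.second_acquisition_unit()
        if local == target:
            return False
        self.store[self.worker] = target
        return True


store = {"worker": "communication", "peer": "connection_establishment"}
apparatus = NetworkConnectionApparatus(store, "worker", "peer")
was_reset = apparatus.reset_unit()
```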
- FIG. 5 illustrates a sequence 500 of interactions between various devices in a network connection system for a training participant of a joint training model according to an embodiment of the present disclosure.
- the network connection system for a training participant of a joint training model may include workers (for example, 1012 , 1013 , 1022 , and 1023 shown in FIG. 1 ) and masters (for example, 1011 and 1021 shown in FIG. 1 ).
- the worker may be configured to acquire local communication state information of the worker, where the communication state information indicates a communication connection phase that the worker is in; acquire communication state information of a target worker as target communication state information, where the target worker may include a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; terminate the process in response to determining that the target communication state information does not match the local communication state information of the worker; set the communication connection phase that the worker is in to a preset phase in response to reception of information instructing restart sent by a master corresponding to the worker, and update the local communication state information of the worker.
- the master may be configured to send, in response to determining that there is a worker that actively terminated the process, information instructing restart to the worker that actively terminated the process.
- the system may further include: a parameter server.
- the parameter server may be configured to generate failure prompt information indicating a failure of the parameter server.
- the master may further be configured to send a communication termination request to a master of another training participant of the joint training model in response to detection of presence of the failure prompt information indicating the failure of the parameter server; and disconnect a communication connection corresponding to the communication termination request in response to reception of confirmation information corresponding to the communication termination request.
- the communication termination request may further be used to instruct to stop the training process of the joint training model.
- the master may further be configured to restore the training process of the joint training model from a target checkpoint in response to reception of the confirmation information corresponding to the communication termination request.
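The parameter-server failure handling described above (termination request, confirmation, disconnection, restore from a checkpoint) can be sketched as a direct method call between two master objects. Everything named here is hypothetical: the `Master` class, its method names, the `"confirmation"` reply value, and the checkpoint label `checkpoint-0`. A real system would exchange these messages over the network rather than by direct calls.

```python
class Master:
    """Hypothetical master of one training participant."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connected = True
        self.training = True
        self.restored_from = None

    def handle_termination_request(self) -> str:
        """Receiving side: the request also instructs to stop the training process."""
        self.training = False
        return "confirmation"

    def handle_parameter_server_failure(self, peer: "Master", target_checkpoint: str) -> None:
        """Reacting side: runs once failure prompt information has been detected."""
        # Send the communication termination request to the other participant's master.
        reply = peer.handle_termination_request()
        if reply == "confirmation":
            # Disconnect the corresponding connection, then restore training
            # from the target checkpoint.
            self.connected = False
            self.restored_from = target_checkpoint


master_a = Master("participant-A")
master_b = Master("participant-B")
master_a.handle_parameter_server_failure(master_b, target_checkpoint="checkpoint-0")
```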
- in step 501, a worker acquires local communication state information of the worker.
- in step 502, the worker acquires communication state information of a target node as target communication state information.
- in step 503, in response to determining that the target communication state information does not match the local communication state information of the worker, the worker terminates the process.
- in step 504, in response to determining that there is a worker that actively terminated the process, the master sends information instructing restart to the worker that actively terminated the process.
- in step 505, in response to reception of the information instructing restart sent by the master corresponding to the worker, the worker sets the local communication connection phase of the worker to a preset phase, and updates the local communication state information of the worker.
- the preset phase may be a preset communication connection phase.
- the communication connection phase may be the same as the “communication connection phase” in step 201 of the foregoing embodiment, and is not described in detail here.
- steps 501 to 505 correspond to steps 201 to 203 in the foregoing embodiment (the execution body is the worker) and their optional implementations, and the above descriptions for steps 201 to 203 and their optional implementations are also applicable to steps 501 to 505, which are not described in detail here.
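- As an illustrative sketch only, steps 501 to 505 can be modelled in a few lines of Python. The class names, the `alive` flag, and the choice of the connection establishment phase as the preset phase are assumptions made for this example; the disclosure leaves the transport and the process management open.

```python
from enum import Enum


class Phase(Enum):
    CONNECTION_ESTABLISHMENT = 1
    COMMUNICATION = 2
    DISCONNECTION = 3


# Assumption: the preset phase is the connection establishment phase.
PRESET_PHASE = Phase.CONNECTION_ESTABLISHMENT


class Worker:
    """Hypothetical worker that terminates itself when phases do not match."""

    def __init__(self, name, phase):
        self.name = name
        self.phase = phase
        self.alive = True

    def check_peer(self, peer):
        local_state = self.phase      # step 501: local state information
        target_state = peer.phase     # step 502: target state information
        if target_state is not local_state:
            self.alive = False        # step 503: terminate the process

    def on_restart(self):
        # Step 505: enter the preset phase and update the local state.
        self.phase = PRESET_PHASE
        self.alive = True


class Master:
    """Hypothetical master that restarts workers which terminated themselves."""

    def __init__(self, workers):
        self.workers = workers

    def restart_terminated(self):
        # Step 504: send a restart instruction to each terminated worker.
        restarted = []
        for worker in self.workers:
            if not worker.alive:
                worker.on_restart()
                restarted.append(worker.name)
        return restarted


# One worker is already back in connection establishment; its peer still communicates.
fresh_worker = Worker("worker-a", Phase.CONNECTION_ESTABLISHMENT)
stale_peer = Worker("worker-b", Phase.COMMUNICATION)
peer_master = Master([stale_peer])

stale_peer.check_peer(fresh_worker)
restarted = peer_master.restart_terminated()
print(restarted, stale_peer.phase.name)
```

Keeping the mismatch check inside the worker means the master only has to react to a terminated process, which is the division of labour the steps describe.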
- a parameter server (not shown in the drawings) communicatively connected to the master may generate failure prompt information indicating a failure of the parameter server.
- the parameter server may be various servers for storing parameters of the joint training model.
- the joint training model may be the same as the joint training model described in the above embodiments. Since the parameters stored in the parameter server are updated with the model training process, the parameter server has a state. Therefore, when the parameter server fails, the entire joint training model fails and needs to be reset as a whole.
- in step 506, in response to detection of presence of the failure prompt information indicating the failure of the parameter server, the master may send a communication termination request to the master of another training participant of the joint training model.
- in response to detection of presence of the failure prompt information indicating that the parameter server of the joint training model is in a fault state, the master (for example, the server 1011 shown in FIG. 1) corresponding to the execution body (for example, the server 1012 shown in FIG. 1) of the network connection method for a training participant of the joint training model may send the communication termination request to the master of another participant of the joint training model in various ways.
- the communication termination request may include, for example, a FIN (finish) packet in TCP (Transmission Control Protocol).
- the communication termination request may further be used to instruct to stop the training process of the joint training model.
- the master may send the communication termination request to a peer node of the joint training model (that is, the master of another participant of the joint training model) to stop a trainer functioning as a worker corresponding to the peer node.
- in step 507, the peer node sends confirmation information corresponding to the communication termination request to the master.
- the above peer node may include the master of a participant of the joint training model.
- the confirmation information may include an ACK (acknowledgment) of the TCP.
- in step 508, in response to reception of the confirmation information corresponding to the communication termination request, the master disconnects a communication connection corresponding to the communication termination request.
- the above execution body may disconnect the communication connection corresponding to the communication termination request.
- in response to reception of the confirmation information corresponding to the communication termination request sent by the peer node of the joint training model (that is, the master of another participant of the joint training model), the master may further restore the training process of the joint training model from a target checkpoint.
- the target checkpoint may be the latest checkpoint to minimize data loss.
- training can be restored from the target checkpoint when the parameter server fails, so that a restore method with the least data loss can be selected according to different failure states.
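- The handshake in steps 506 to 508 and the checkpoint restore can be sketched as follows. This is a simplified in-process model rather than a network implementation: the `Master` class, its method names, the log strings, and the use of integer training steps as checkpoint identifiers are all hypothetical, and the FIN/ACK labels only echo the TCP example given above.

```python
class Master:
    """Hypothetical in-process stand-in for the master of one training participant."""

    def __init__(self, name, checkpoints):
        self.name = name
        self.checkpoints = list(checkpoints)  # assumed: saved training step numbers
        self.peer = None
        self.connected = False
        self.training = False
        self.resumed_from = None
        self.log = []

    def connect(self, peer):
        """Pair two masters and mark joint training as running on both."""
        self.peer, peer.peer = peer, self
        self.connected = peer.connected = True
        self.training = peer.training = True

    def on_failure_prompt(self):
        """Step 506: the parameter server failed, so request termination."""
        self.log.append("send FIN")
        if self.peer.on_termination_request():
            self.on_confirmation()

    def on_termination_request(self):
        """Step 507: stop local training and confirm the request."""
        self.training = False
        self.log.append("send ACK")
        return True

    def on_confirmation(self):
        """Step 508: disconnect, then restore from the latest checkpoint."""
        self.connected = self.peer.connected = False
        self.training = False
        self.log.append("disconnect")
        self.resumed_from = max(self.checkpoints)  # assumes at least one checkpoint
        self.log.append(f"restore from checkpoint {self.resumed_from}")


master_a = Master("participant-A", [100, 300, 200])
master_b = Master("participant-B", [100, 200, 300])
master_a.connect(master_b)
master_a.on_failure_prompt()
print(master_a.log + master_b.log)
```

Choosing `max(self.checkpoints)` reflects the statement that the target checkpoint may be the latest one; a real system would also need to agree on the checkpoint between participants, which this sketch does not attempt.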
- the network connection system for a training participant of a joint training model is provided according to the above embodiments of the present disclosure.
- the worker acquires the local communication state information of the worker, where the communication state information indicates the communication connection phase that the worker is in. Then, the worker acquires the communication state information of the target worker as the target communication state information, where the target worker includes the peer node corresponding to the worker, the peer node belongs to a different training participant of the joint training model.
- in response to determining that the target communication state information does not match the local communication state information of the worker, the worker terminates the process.
- the master sends information instructing restart to the worker that actively terminated the process.
- the worker sets the communication connection phase of the worker to the preset phase, and updates the local communication state information of the worker. Therefore, from one aspect, the data loss caused by network failure is reduced to the greatest extent, and from another aspect, the policy complexity of the upper-layer master is reduced, and the execution efficiency is improved.
- FIG. 6 is a schematic structural diagram of an electronic device 600 (for example, the server in FIG. 1) for implementing the embodiments of the present disclosure.
- the electronic device shown in FIG. 6 is only an example, and should not bring any limitation to the function and scope of the embodiments of the present disclosure.
- the electronic device 600 may include a processing apparatus 601, such as a central processing unit or a graphics processor, which can execute various appropriate actions and processes based on a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage apparatus 608 into a Random Access Memory (RAM) 603.
- various programs and data required by the electronic device 600 for operation are further stored.
- the processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604.
- the following may be connected to the I/O interface 605: an input apparatus 606 such as a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, or a gyroscope; an output apparatus 607 such as a Liquid Crystal Display (LCD), a speaker, or a vibrator; a storage apparatus 608 such as a magnetic tape or a hard disk; and a communication apparatus 609.
- the electronic device 600 may communicate with other devices through wired or wireless communication to exchange data.
- although FIG. 6 shows the electronic device 600 including various apparatuses, it should be understood that not all shown apparatuses are required to be implemented or included. The shown apparatuses may be replaced by other apparatuses, or more or fewer apparatuses may be included. Each block in FIG. 6 may represent one or multiple devices as required.
- the processes described with reference to flow charts may be implemented as a computer software program according to an embodiment of the present disclosure.
- a computer program product is provided according to an embodiment of the present disclosure. The computer program product includes a computer program embodied on a computer readable medium.
- the computer program includes program codes for performing the method shown in the flowchart.
- the computer program may be downloaded and installed from the network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602.
- the computer program, when executed by the processing apparatus 601, performs functions defined in the method according to the embodiments of the present disclosure.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
- the computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing.
- the computer readable storage medium may include, but not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM or a flash memory), an optical fiber, a portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- the computer readable storage medium may be any tangible medium containing or storing a program, where the program may be used by an instruction execution system, apparatus or device or used in combination therewith.
- the computer readable signal medium may include a data signal transmitted in a baseband or transmitted as a part of a carrier wave.
- the data signal carries computer readable program codes.
- the transmitted data signal may have a variety of forms including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above.
- the computer readable signal medium may also be any other computer readable medium except for the computer readable storage medium.
- the computer readable signal medium may send, transmit or transfer programs used by an instruction execution system, apparatus or device or used in combination therewith.
- the program codes included in the computer readable medium may be transferred through any proper medium including, but not limited to, an electric wire, an optical cable, RF (Radio Frequency), and the like, or any suitable combination of the foregoing.
- the above-mentioned computer-readable medium may be included in the above server; or may exist alone without being assembled into the server.
- the computer-readable medium carries one or more programs, which, when executed by the server, cause the server to: acquire communication state information of a worker, the communication state information indicating a communication connection phase that the worker is in; acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and reset, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- the computer program codes for performing the operations according to the embodiments of the present disclosure may be written in one or more programming languages or a combination of the one or more programming languages.
- the programming languages include, but are not limited to, an object oriented programming language such as Java, Smalltalk, C++ and a conventional procedural programming language such as “C” programming language or a programming language similar to “C” programming language.
- the program codes may be completely executed on a user computer, partially executed on the user computer, executed as a standalone software package, partially executed on the user computer and partially executed on a remote computer, completely executed on the remote computer or a server.
- the remote computer may be connected to the user computer via any kind of networks including Local Area Network (LAN) or Wide Area Network (WAN), or the remote computer may be connected to an external computer (for example, via Internet provided by an Internet service provider).
- each block in the flowcharts or block diagrams may represent a module, program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may occur in an order other than the order shown in the drawings. For example, two blocks shown in succession may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations may be implemented in dedicated hardware-based systems that perform the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner.
- the described units may also be provided in a processor, which may be described, for example, as: a processor including a first acquisition unit, a second acquisition unit, and a reset unit.
- the names of these units do not constitute a limitation on the units themselves under certain circumstances.
- the first acquisition unit may also be described as “a unit for acquiring local communication state information of the worker, where the communication state information indicates the communication connection phase the worker is in”.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Computer And Data Communications (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
- The present application is a continuation application of International Application No. PCT/CN2021/080881, titled “NETWORK CONNECTION METHOD AND DEVICE FOR TRAINING PARTICIPANT END OF COMMON TRAINING MODEL”, filed on Mar. 15, 2021, which claims priority to Chinese Patent Application No. 202010270128.5, titled “NETWORK CONNECTION METHOD AND DEVICE FOR TRAINING PARTICIPANT END OF COMMON TRAINING MODEL”, filed on Apr. 8, 2020, both of which are incorporated herein by reference in their entireties.
- The embodiments of the present disclosure relate to the field of computer technologies, and in particular to a network connection method and a network connection apparatus for a training participant of a joint training model.
- The rapid development of artificial intelligence technology greatly increases the scale of model training and popularizes parallel model training. In parallel design, the Master-Worker mode is generally used, in which the Master is responsible for receiving and distributing tasks (such as training tasks), and the Worker is responsible for processing subtasks.
- In the process of jointly training a model by multiple training participants, various failures, such as network failures, trainer failures, and parameter server failures, are inevitable. A related approach is usually to set a service recovery point in advance, and perform data recovery from the recovery point in the event of a failure.
- A network connection method and a network connection apparatus for a training participant of a joint training model are provided according to the embodiments of the present disclosure.
- In a first aspect, a network connection method for a training participant of a joint training model is provided according to an embodiment of the present disclosure. The training participant operates in a master-worker mode, and the method includes: acquiring communication state information of a worker, the communication state information indicating a communication connection phase that the worker is in; acquiring communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and resetting, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- In some embodiments, the method further includes generating, in response to determining that the communication connection phase that the worker is in is changed, new communication state information indicating the changed communication connection phase that the worker is in.
- In some embodiments, the resetting, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in includes: resetting the communication connection phase of the worker to a connection establishment phase before the communication connection phase, in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase.
- In a second aspect, a network connection apparatus for a training participant of a joint training model is provided according to an embodiment of the present disclosure. The training participant operates in a master-worker mode, and the apparatus includes: a first acquisition unit, configured to acquire communication state information of a worker, the communication state information indicating a communication connection phase that the worker is in; a second acquisition unit, configured to acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and a reset unit, configured to reset, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- In some embodiments, the apparatus further includes a generation unit. The generation unit is configured to generate, in response to determining that the communication connection phase that the worker is in is changed, new communication state information indicating the changed communication connection phase that the worker is in.
- In some embodiments, the reset unit is further configured to: reset the communication connection phase of the worker to a connection establishment phase before the communication connection phase, in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase.
- In a third aspect, a network connection system for a training participant of a joint training model is provided according to an embodiment of the present disclosure. The system includes: a worker, configured to acquire local communication state information of the worker, where the communication state information indicates a communication connection phase that the worker is in; acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; terminate the process in response to determining that the target communication state information does not match the local communication state information of the worker; and set the communication connection phase that the worker is in to a preset phase in response to reception of information instructing restart sent by a master corresponding to the worker, and update the local communication state information of the worker; and the master, configured to send, in response to determining that there is a worker that actively terminated the process, the information instructing restart to the worker that actively terminated the process.
- In some embodiments, the system further includes: a parameter server, configured to generate, in response to detecting an operational failure of the parameter server, failure prompt information indicating a failure of the parameter server; and the master is further configured to send a communication termination request to a master of another training participant of the joint training model in response to detection of presence of the failure prompt information indicating the failure of the parameter server; and disconnect a communication connection corresponding to the communication termination request in response to reception of confirmation information corresponding to the communication termination request.
- In some embodiments, the communication termination request is further used to instruct to stop the training process of the joint training model; and the master is further configured to: restore the training process of the joint training model from a target checkpoint in response to reception of the confirmation information corresponding to the communication termination request.
- In a fourth aspect, a server is provided according to an embodiment of the present disclosure, the server includes: one or more processors; and a storage device storing one or more programs. The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the implementations of the first aspect.
- In a fifth aspect, a computer-readable medium having a computer program stored thereon is provided according to an embodiment of the present disclosure. The program, when executed by a processor, implements the method according to any one of the implementations of the first aspect.
- A network connection method and a network connection apparatus for a training participant of a joint training model are provided according to the embodiments of the present disclosure. The training participant operates in a master-worker mode. The method includes acquiring communication state information of a worker, where the communication state information indicates a communication connection phase that the worker is in; acquiring communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and resetting the communication connection phase that the worker is in, in response to determining that the target communication state information does not match the communication state information of the worker. In this way, the data loss caused by the network failure can be reduced to the greatest extent.
- Other features, objects and advantages of the present disclosure will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:
- FIG. 1 is a schematic diagram illustrating an exemplary system architecture to which an embodiment of the present disclosure is applicable;
- FIG. 2 is a flowchart of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure;
- FIG. 3 is a schematic diagram of an application scenario of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure;
- FIG. 4 is a schematic diagram of a network connection apparatus for a training participant of a joint training model according to an embodiment of the present disclosure;
- FIG. 5 is a sequence diagram of interactions between various devices in a network connection system for a training participant of a joint training model according to an embodiment of the present disclosure; and
- FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
- The present disclosure will be described in more detail below in conjunction with the embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only used to explain rather than limiting the present disclosure. In addition, it is to be noted that, for the convenience of description, only the parts related to the present disclosure are shown in the drawings.
- It is to be noted that the embodiments in the present disclosure and the features of the embodiments may be combined with each other in the case of no conflict. The present disclosure will be described in detail below in conjunction with the embodiments with reference to the accompanying drawings.
- FIG. 1 illustrates an exemplary architecture 100 to which a network connection method for a training participant of a joint training model or a network connection apparatus for a training participant of a joint training model of the present disclosure is applicable.
- As shown in FIG. 1, the system architecture 100 may include server clusters 101 and 102, and a network 103. The network 103 is a medium for providing a communication link between the server clusters 101 and 102. The network 103 may include various types of connection, such as wired connection, wireless communication links, fiber-optics, cables, or the like.
- The server clusters 101 and 102 may be servers that provide various services, such as servers for training models having a distributed or federated learning framework. The server cluster 101 may include a master 1011 and workers 1012 and 1013. The server cluster 102 may include a master 1021 and workers 1022 and 1023. The server clusters 101 and 102 may function as different participants in federated learning to jointly train a model using their respective training samples.
- It should be noted that the server may be hardware or software. In a case that the server is hardware, the server may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; and in a case that the server is software, the server may be implemented as multiple software or software modules (for example, software or software modules for providing distributed services), or may be implemented as a single software or software module, which is not limited in the present disclosure.
- It is to be noted that the network connection method for a training participant of a joint training model according to the embodiments of the present disclosure is generally performed by the workers (for example, the servers 1012 and 1013 or the servers 1022 and 1023). Correspondingly, the network connection apparatus for a training participant of a joint training model is generally provided in the workers (for example, the servers 1012 and 1013 or the servers 1022 and 1023).
- It should be understood that the number of networks and servers in FIG. 1 is merely illustrative. There may be any number of networks and servers depending on the implementation requirements.
- Reference is made to FIG. 2, which illustrates a flow 200 of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure. The network connection method for a training participant of a joint training model includes the following steps 201 to 203.
- In step 201, communication state information of a worker is acquired.
- In this embodiment, the execution body (the server 1011 or 1012 shown in FIG. 1) of the network connection method for a training participant of a joint training model may acquire local communication state information of the worker through wired connection or wireless connection. The communication state information may indicate a communication connection phase that the worker is in. The execution body may be the master or the worker in the server cluster. Generally, the worker may locally acquire the communication state information of the worker. Then, the worker may further report the communication state information to a master belonging to the same cluster after acquiring the communication state information of the worker. In this way, the master of the server cluster may acquire the communication state information of respective workers from the workers in the cluster.
- In this embodiment, the communication connection phase may include, for example, a connection establishment phase, a communication phase, and a disconnection phase. The communication phase may generally refer to a data transmission phase. The connection establishment phase and disconnection phase may generally refer to a preparation phase before data transmission and a disconnection phase after data transmission, respectively. The disconnection phase may include operations such as disconnecting a pipeline and removing a socket.
- In step 202, communication state information of a target worker is acquired as target communication state information.
- In this embodiment, the execution body may acquire the communication state information of the target worker through wired connection or wireless connection. The target worker may include a peer node corresponding to the execution body in the joint training model. The peer node may belong to a different training participant of the joint training model. The joint training model may include various machine learning models that are trained using a distributed or Federated Learning (FL) framework. Peer nodes may be, for example, nodes paired between different participants in a decentralized joint training model.
- In this embodiment, the worker may acquire the communication state information of the peer node serving as the target worker as the target communication state information through peer-to-peer connection. After acquiring the target communication state information, the worker may further report the target communication state information to the master belonging to the same cluster. In this way, the master of the server cluster may acquire the communication state information of the peer nodes corresponding to respective workers from the workers in the cluster.
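- A minimal sketch of step 202, assuming the paired workers exchange their state as one newline-terminated JSON record over an already connected socket. The wire format, the helper names `send_state` and `recv_state`, and the report dictionary are assumptions for illustration; `socket.socketpair()` merely stands in for the peer-to-peer link.

```python
import json
import socket


def send_state(sock, worker_id, phase):
    """Send one newline-terminated JSON record describing the sender's state."""
    record = json.dumps({"worker": worker_id, "phase": phase}) + "\n"
    sock.sendall(record.encode("utf-8"))


def recv_state(sock):
    """Read one newline-terminated JSON record from the peer."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return json.loads(buf.decode("utf-8"))


# The two ends of the pair play the local worker and its peer node.
local_end, peer_end = socket.socketpair()
try:
    send_state(peer_end, "1022", "connection_establishment")
    target_state = recv_state(local_end)
finally:
    local_end.close()
    peer_end.close()

# The worker can then report both values to the master of its own cluster.
report_to_master = {
    "worker": "1012",
    "local_phase": "communication",
    "target_phase": target_state["phase"],
}
print(report_to_master)
```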
- In
step 203, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in is reset. - In this embodiment, in response to determining that the target communication state information acquired in
step 202 does not match the communication state information of the worker, the execution body may reset the communication connection phase that the worker is in through various ways. As an example, the execution body may generally reset the communication connection phase that the worker is in to be consistent with the communication connection phase that the node indicated by the target communication state information is in. - In some optional implementations of this embodiment, in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase, the execution body may reset the communication connection phase of the worker to the connection establishment phase before the communication connection phase. As an example, when the number of communication timeouts between a worker in the server cluster serving as a first training participant of the joint training model and a worker in the server cluster serving as a second training participant of the joint training model reaches a preset threshold, the process of the communication requester (for example, the worker in the server cluster serving as the first training participant) exits due to a communication failure. When the master in the server cluster serving as the first training participant is informed of the communication failure of the worker, the server sets a new worker in the server cluster serving as the first training participant and set the communication connection phase of the new worker to be the connection establishment phase. 
In this case, when the network communication is restored, the peer node (that is, the worker in the server cluster serving as the second training participant) that restores the connection to the worker that has exited due to the communication failure is still in the communication phase as the communication receiver of the previous communication. In response to determining that the communication phases of the two workers do not match, the peer node interrupts its process. Then, the master corresponding to the peer node may reset the peer node and set the peer node to be in the connection establishment phase, such that the peer node and the newly set worker perform connection and pairing.
- Based on the above optional implementation, the master in the server cluster serving as the training participant may delegate the code for determining the local communication state and for interrupting the process upon an exception to the workers for execution, thereby reducing the policy complexity of the upper-layer master and improving execution efficiency.
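The mismatch check and reset described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the `Phase` enum, the `Worker` dataclass, and the `reset_on_mismatch` function are hypothetical names, and only two phases are modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Hypothetical communication connection phases (names are assumptions)."""
    CONNECTION_ESTABLISHMENT = 1
    COMMUNICATION = 2


@dataclass
class Worker:
    name: str
    phase: Phase  # local communication state information


def reset_on_mismatch(worker: Worker, target_phase: Phase) -> bool:
    """Compare the local phase with the target worker's phase.

    On a mismatch, reset the local phase so that it is consistent with
    the phase reported by the target worker. Returns True if a reset
    was performed.
    """
    if worker.phase == target_phase:
        return False
    worker.phase = target_phase
    return True


# Example: the peer is still communicating while the restarted target
# worker is back in connection establishment.
peer = Worker("worker-b", Phase.COMMUNICATION)
was_reset = reset_on_mismatch(peer, Phase.CONNECTION_ESTABLISHMENT)
```

In this sketch the check runs on the worker itself, which mirrors the idea of delegating the state comparison to the workers rather than to the master.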
- Reference is made to
FIG. 3, which is a schematic diagram of an application scenario of a network connection method for a training participant of a joint training model according to an embodiment of the present disclosure. In the application scenario of FIG. 3, a user 301 and a user 301′ perform machine learning training by using the federated learning framework through a terminal 302 and a terminal 302′, respectively. A server 3031 serving as the master in the server cluster 303 is paired with a server 3031′ serving as the master in the server cluster 303′. A server 3032 serving as a worker in the server cluster 303 is paired with a server 3032′ serving as a worker in the server cluster 303′. A server 3033 serving as a worker in the server cluster 303 is paired with a server 3033′ serving as a worker in the server cluster 303′. The above servers 3032 and 3033 and servers 3032′ and 3033′ may be used as trainers. When the process of the server 3032 exits abnormally due to a failure, the server 3031 serving as the master resets the server 3032 and sets the server 3032 to be in the “connection establishment phase”. The node server 3032′ corresponding to the server 3032 locally acquires the communication state information indicating that the server 3032′ is in the “communication phase”. Moreover, the node server 3032′ may further acquire the communication state information used by the target node server 3032 for indicating that the target node server is in the “connection establishment phase”. In response to determining that there is a mismatch between the above communication state information, the node server 3032′ disconnects the connection. The node server 3031′ resets the node server 3032′ and sets the server 3032′ to be in the “connection establishment phase”. - Currently, one conventional technology is to determine whether to roll back the service to a preset recovery point only according to the local communication state, resulting in a large data loss during network failure recovery. 
However, in the method according to the above embodiments of the present disclosure, the communication phase of the unmatched worker is reset by matching with the communication connection phase of the target worker based on the communication state of the worker. Since data loss occurs only in a relatively short time period after the communication failure, data loss caused by network failures can be reduced to the greatest extent compared with data recovery from the preset recovery point.
- Reference is made to
FIG. 4. As an implementation of the method shown in the above drawings, a network connection apparatus for a training participant of a joint training model is provided according to an embodiment of the present disclosure. The apparatus embodiment corresponds to the above method embodiment shown in FIG. 2, where the training participant adopts the master-worker mode. The apparatus is applicable to various electronic devices. - As shown in
FIG. 4, a network connection apparatus 400 for a training participant of a joint training model according to this embodiment includes a first acquisition unit 401, a second acquisition unit 402, and a reset unit 403. The first acquisition unit 401 is configured to acquire communication state information of a worker, where the communication state information indicates a communication connection phase that the worker is in. The second acquisition unit 402 is configured to acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model. The reset unit 403 is configured to reset, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in. - In this embodiment, in the
network connection apparatus 400 for a training participant of a joint training model, for the processing of the first acquisition unit 401, the second acquisition unit 402 and the reset unit 403 and the technical effects brought by the processing, reference can be made to the related descriptions of steps 201, 202 and 203 in the corresponding embodiment shown in FIG. 2, and the details are not repeated here. - In some optional implementations of this embodiment, the
apparatus 400 for controlling network connection between nodes may further include a generation unit (not shown in the drawings). The generation unit may be configured to generate, in response to determining that the communication connection phase that the worker is in is changed, new communication state information indicating the changed communication connection phase that the worker is in. - In some optional implementations of this embodiment, the
reset unit 403 may be further configured to: reset the communication connection phase of the worker to a connection establishment phase before the communication connection phase in response to determining that the target communication state information indicates that the target worker is in the connection establishment phase before the communication connection phase and the communication state information of the worker indicates that the worker is in the communication connection phase. - In the apparatus according to the embodiments of the present disclosure, the
first acquisition unit 401 acquires communication state information of a worker, where the communication state information indicates a communication connection phase that the worker is in; then, the second acquisition unit 402 acquires communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and finally, the reset unit 403 resets the communication connection phase that the worker is in, in response to determining that the target communication state information does not match the communication state information of the worker. In this way, the data loss caused by the network failure can be reduced to the greatest extent. - Reference is made to
FIG. 5, which illustrates a sequence 500 of interactions between various devices in a network connection system for a training participant of a joint training model according to an embodiment of the present disclosure. The network connection system for a training participant of a joint training model may include workers (for example, servers 1012, 1013, 1022, and 1023 shown in FIG. 1) and masters (for example, servers 1011 and 1021 shown in FIG. 1). The worker may be configured to acquire local communication state information of the worker, where the communication state information indicates a communication connection phase that the worker is in; acquire communication state information of a target worker as target communication state information, where the target worker may include a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; terminate the process in response to determining that the target communication state information does not match the local communication state information of the worker; set the communication connection phase that the worker is in to a preset phase in response to reception of information instructing restart sent by a master corresponding to the worker, and update the local communication state information of the worker. The master may be configured to send, in response to determining that there is a worker that actively terminated the process, information instructing restart to the worker that actively terminated the process. - In some optional implementations of this embodiment, the system may further include: a parameter server. The parameter server may be configured to generate failure prompt information indicating a failure of the parameter server. 
The master may further be configured to send a communication termination request to a master of another training participant of the joint training model in response to detection of presence of the failure prompt information indicating the failure of the parameter server; and disconnect a communication connection corresponding to the communication termination request in response to reception of confirmation information corresponding to the communication termination request.
- In some optional implementations of this embodiment, the communication termination request may further be used to instruct to stop the training process of the joint training model. The master may further be configured to restore the training process of the joint training model from a target checkpoint in response to reception of the confirmation information corresponding to the communication termination request.
- As shown in
FIG. 5, in step 501, a worker acquires local communication state information of the worker. - In
step 502, the worker acquires communication state information of a target node as target communication state information. - In
step 503, in response to determining that the target communication state information does not match the local communication state information of the worker, the worker terminates the process. - In step 504, in response to determining that there is a worker that actively terminated the process, the master sends information instructing restart to the worker that actively terminated the process.
- In step 505, in response to reception of the information instructing restart sent by the master corresponding to the worker, the worker sets the local communication connection phase of the worker to a preset phase, and updates the local communication state information of the worker.
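Steps 501 to 505 can be sketched as a small in-memory simulation. This is a hedged sketch, not the disclosed implementation: the `Worker` and `Master` classes, the `running` flag standing in for a real process, the string phase names, and the choice of `PRESET_PHASE` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed preset phase; the text only says it is a preset communication
# connection phase.
PRESET_PHASE = "connection_establishment"


@dataclass
class Worker:
    name: str
    phase: str            # local communication state information
    running: bool = True  # stands in for a live worker process

    def check_peer(self, target_phase: str) -> None:
        # Steps 501-503: compare local and target state information and
        # terminate the process on a mismatch.
        if self.phase != target_phase:
            self.running = False

    def on_restart(self) -> None:
        # Step 505: set the preset phase and update local state.
        self.phase = PRESET_PHASE
        self.running = True


@dataclass
class Master:
    workers: List[Worker] = field(default_factory=list)

    def supervise(self) -> List[str]:
        # Step 504: instruct every self-terminated worker to restart.
        restarted = []
        for worker in self.workers:
            if not worker.running:
                worker.on_restart()
                restarted.append(worker.name)
        return restarted


worker = Worker("worker-1", "communication")
master = Master([worker])
worker.check_peer("connection_establishment")  # mismatch, worker exits
restarted = master.supervise()                 # master restarts it
```

Keeping the comparison inside `Worker.check_peer` leaves the master with a single rule (restart whatever has exited), which is the reduction in master-side policy complexity that the text describes.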
- In this embodiment, the preset phase may be a preset communication connection phase. The communication connection phase may be the same as the “communication connection phase” in
step 201 of the foregoing embodiment, and is not described in detail here. - The
above steps 501 to 505 correspond to steps 201 to 203 in the foregoing embodiment (the execution body is the worker) and their optional implementations, and the above descriptions for steps 201 to 203 and their optional implementations are also applicable to steps 501-505, which are not described in detail here. - In some optional implementations of this embodiment, in response to detecting an operational failure, a parameter server (not shown in the drawings) communicatively connected to the master may generate failure prompt information indicating a failure of the parameter server.
- In these implementations, the parameter server may be various servers for storing parameters of the joint training model. The joint training model may be the same as the joint training model described in the above embodiments. Since the parameters stored in the parameter server are updated with the model training process, the parameter server has a state. Therefore, when the parameter server fails, the entire joint training model fails and needs to be reset as a whole.
- Based on the above optional implementation, in step 506, in response to detection of presence of the failure prompt information indicating the failure of the parameter server, the master may send a communication termination request to the master of another training participant of the joint training model.
- In these implementations, in response to detection of presence of the failure prompt information indicating that the parameter server of the joint training model is in a fault state, the master (for example, the
server 1011 shown in FIG. 1) corresponding to the execution body (for example, the server 1012 shown in FIG. 1) of the network connection method for a training participant of the joint training model may send the communication termination request to the master of another participant of the joint training model in various ways. The communication termination request may include, for example, a FIN (finish) packet in TCP (Transmission Control Protocol). - Based on the foregoing optional implementation, optionally, the communication termination request may further be used to instruct to stop the training process of the joint training model. The master may send the communication termination request to a peer node of the joint training model (that is, the master of another participant of the joint training model) to stop a trainer functioning as a worker corresponding to the peer node.
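As one possible illustration of a FIN-based termination request, the sketch below half-closes a loopback TCP connection with Python's standard `socket` module. The function name `half_close_demo` is hypothetical, and using `shutdown(SHUT_WR)` as the termination request is an assumption for illustration; the disclosure does not specify an API.

```python
import socket


def half_close_demo() -> bytes:
    """Open a loopback TCP connection, half-close it on the requester
    side, and return what the peer reads afterwards."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    requester = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    peer.settimeout(5.0)  # avoid blocking forever if something goes wrong
    try:
        # Shutting down the write side makes the TCP stack send a FIN.
        requester.shutdown(socket.SHUT_WR)
        # The peer observes the FIN as end-of-stream: an empty read.
        return peer.recv(1024)
    finally:
        requester.close()
        peer.close()
        listener.close()


received = half_close_demo()
```

The acknowledgement of the FIN is handled by the operating system's TCP stack, so the application on the peer side only sees the empty read.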
- Based on the above optional implementation, in
step 507, the peer node sends confirmation information corresponding to the communication termination request to the master. - In these implementations, the above peer node may include the master of another participant of the joint training model. The confirmation information may include an ACK (acknowledgement) packet in TCP.
- Based on the above optional implementation, in
step 508, in response to reception of the confirmation information corresponding to the communication termination request, the master disconnects a communication connection corresponding to the communication termination request. - In these implementations, in response to reception of the confirmation information corresponding to the communication termination request, the above execution body may disconnect the communication connection corresponding to the communication termination request.
- In some optional implementations of this embodiment, in step 509, in response to reception of the confirmation information corresponding to the communication termination request sent by the peer node of the joint training model (that is, the master of another participant of the joint training model), the master may further restore the training process of the joint training model from a target checkpoint. The target checkpoint may be the latest checkpoint to minimize data loss.
- Based on the above optional implementation, training can be restored from the target checkpoint when the parameter server fails, so that a restore method with the least data loss can be selected according to different failure states.
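The parameter server failure path in steps 506 to 509 can be sketched as below. This is a minimal sketch under assumptions: `PeerMaster`, `Master`, the `"ACK"` string reply, and the representation of checkpoints as integer training steps are hypothetical, and the target checkpoint is taken to be the latest one, as the text suggests.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeerMaster:
    training: bool = True

    def on_termination_request(self) -> str:
        # Step 507: stop the local training process and confirm.
        self.training = False
        return "ACK"


@dataclass
class Master:
    peer: PeerMaster
    # Training steps at which checkpoints were saved (assumed format).
    checkpoints: List[int] = field(default_factory=list)
    connected: bool = True
    restored_from: Optional[int] = None

    def on_parameter_server_failure(self) -> None:
        # Step 506: send the communication termination request.
        reply = self.peer.on_termination_request()
        if reply != "ACK":
            return  # no confirmation received, keep the connection
        # Step 508: disconnect after the confirmation arrives.
        self.connected = False
        # Step 509: restore from the latest checkpoint, if any exists.
        if self.checkpoints:
            self.restored_from = max(self.checkpoints)


peer_master = PeerMaster()
master = Master(peer_master, checkpoints=[100, 300, 200])
master.on_parameter_server_failure()
```

Because the parameter server is stateful, this path restores the whole training job from a checkpoint, whereas a worker-level failure only resets the affected workers' connection phases.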
- The network connection system for a training participant of a joint training model is provided according to the above embodiments of the present disclosure. First, the worker acquires the local communication state information of the worker, where the communication state information indicates the communication connection phase that the worker is in. Then, the worker acquires the communication state information of the target worker as the target communication state information, where the target worker includes the peer node corresponding to the worker, the peer node belongs to a different training participant of the joint training model. Next, in response to determining that the target communication state information does not match the local communication state information of the worker, the worker terminates the process. Next, in response to determining that there is a worker that actively terminated the process, the master sends information instructing restart to the worker that actively terminated the process. Finally, in response to reception of the information instructing restart sent by the master corresponding to the worker, the worker sets the communication connection phase of the worker to the preset phase, and updates the local communication state information of the worker. Therefore, from one aspect, the data loss caused by network failure is reduced to the greatest extent, and from another aspect, the policy complexity of the upper-layer master is reduced, and the execution efficiency is improved.
- Reference is made to
FIG. 6, which is a schematic structural diagram of an electronic device 600 (for example, the server in FIG. 1) for implementing the embodiments of the present disclosure. The electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the function and scope of the embodiments of the present disclosure. - As shown in
FIG. 6, the electronic device 600 may include a processing apparatus 601, such as a central processing unit or a graphics processor, which can execute various appropriate actions and processes based on a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage apparatus 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required by the electronic device 600 for operation are further stored. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604. - Generally, the following may be connected to the I/O interface 605: an
input apparatus 606 such as a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, or a gyroscope; an output apparatus 607 such as a Liquid Crystal Display (LCD), a speaker, or a vibrator; a storage apparatus 608 such as a magnetic tape or a hard disk; and a communication apparatus 609. Based on the communication apparatus 609, the electronic device 600 may communicate with other devices through wired or wireless communication to exchange data. Although FIG. 6 shows the electronic device 600 including various apparatuses, it should be understood that not all shown apparatuses are required to be implemented or included. The shown apparatuses may be replaced by other apparatuses, or more or fewer apparatuses may be included. Each block in FIG. 6 may represent one or multiple devices as required. - Specifically, the processes described with reference to flow charts may be implemented as a computer software program according to an embodiment of the present disclosure. For example, a computer program product is provided according to an embodiment of the present disclosure, and the computer program product includes a computer program embodied on a computer readable medium. The computer program includes program codes for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from the network through the
communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. The computer program, when being executed by the processing apparatus 601, performs functions defined in the method according to the embodiments of the present disclosure. - It should be noted that the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More particularly, the computer readable storage medium may include, but not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM or a flash memory), an optical fiber, a portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, the computer readable storage medium may be any tangible medium containing or storing a program, where the program may be used by an instruction execution system, apparatus or device or used in combination therewith. In the present disclosure, the computer readable signal medium may include a data signal transmitted in a baseband or transmitted as a part of a carrier wave. The data signal carries computer readable program codes. The transmitted data signal may have a variety of forms including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer readable signal medium may also be any other computer readable medium except for the computer readable storage medium. 
The computer readable signal medium may send, transmit or transfer programs used by an instruction execution system, apparatus or device or used in combination therewith. The program codes included in the computer readable medium may be transferred through any proper medium including, but not limited to, an electric wire, an optical cable, RF (Radio Frequency), and the like, or any suitable combination of the foregoing.
- The above-mentioned computer-readable medium may be included in the above server; or may exist alone without being assembled into the server. The computer-readable medium carries one or more programs, which, when executed by the server, cause the server to: acquire communication state information of a worker, the communication state information indicating a communication connection phase that the worker is in; acquire communication state information of a target worker as target communication state information, where the target worker includes a peer node corresponding to the worker, and the peer node belongs to a different training participant of the joint training model; and reset, in response to determining that the target communication state information does not match the communication state information of the worker, the communication connection phase that the worker is in.
- The computer program codes for performing the operations according to the embodiments of the present disclosure may be written in one or more programming languages or a combination of the one or more programming languages. The programming languages include, but are not limited to, an object oriented programming language such as Java, Smalltalk, C++ and a conventional procedural programming language such as “C” programming language or a programming language similar to “C” programming language. The program codes may be completely executed on a user computer, partially executed on the user computer, executed as a standalone software package, partially executed on the user computer and partially executed on a remote computer, completely executed on the remote computer or a server. In the cases relating to the remote computer, the remote computer may be connected to the user computer via any kind of networks including Local Area Network (LAN) or Wide Area Network (WAN), or the remote computer may be connected to an external computer (for example, via Internet provided by an Internet service provider).
- The flowchart and block diagrams in the drawings illustrate the architecture, functionality, and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order other than the order shown in the drawings. For example, two blocks shown in succession may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, may be implemented in dedicated hardware-based systems that perform the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
- The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a first acquisition unit, a second acquisition unit, and a reset unit. The names of these units do not constitute a limitation on the units themselves under certain circumstances. For example, the first acquisition unit may also be described as “a unit for acquiring local communication state information of the worker, where the communication state information indicates the communication connection phase the worker is in”.
- The above are only preferred embodiments of the present disclosure and are illustrative of the technical principles applied in the present disclosure. It should be understood by those skilled in the art that the scope of the present disclosure is not limited to the above technical solutions formed by a specific combination of technical features, and also encompasses other technical solutions formed by any combination of the above technical features or equivalent features thereof, without departing from the inventive concept of the present disclosure, for example, technical solutions formed by replacing the above features and the technical features disclosed in present disclosure (but not limited to) with similar functions.
Claims (10)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010270128.5 | 2020-04-08 | ||
| CN202010270128.5A CN111510327B (en) | 2020-04-08 | 2020-04-08 | Network connection method, device, system and server for training participants of co-training model |
| PCT/CN2021/080881 WO2021203920A1 (en) | 2020-04-08 | 2021-03-15 | Network connection method and device for training participant end of common training model |
Related Parent Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/080881 Continuation WO2021203920A1 (en) | 2020-04-08 | 2021-03-15 | Network connection method and device for training participant end of common training model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220394085A1 (en) | 2022-12-08 |
| US11811864B2 (en) | 2023-11-07 |
Family
ID=71870822
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/886,771 Active US11811864B2 (en) | 2020-04-08 | 2022-08-12 | Network connection method and device for training participant end of common training model |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US11811864B2 (en) |
| EP (1) | EP4120631B1 (en) |
| JP (1) | JP7454065B2 (en) |
| CN (1) | CN111510327B (en) |
| WO (1) | WO2021203920A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111510327B (en) | 2020-04-08 | 2022-01-21 | 北京字节跳动网络技术有限公司 | Network connection method, device, system and server for training participants of co-training model |
| CN112801815B (en) * | 2020-12-30 | 2024-03-29 | 国网江苏省电力公司信息通信分公司 | Power communication network fault early warning method based on federal learning |
| CN116669085A (en) * | 2022-02-18 | 2023-08-29 | 展讯通信(上海)有限公司 | Communication method, device, equipment and storage medium |
| WO2025159318A1 (en) * | 2024-01-24 | 2025-07-31 | 삼성전자주식회사 | Electronic device and method for delegating training of artificial intelligence model, and non-transitory computer-readable recording medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160050262A1 (en) * | 2014-08-13 | 2016-02-18 | Microsoft Corporation | Scalable fault resilient communications within distributed clusters |
| US9357003B1 (en) * | 2010-03-30 | 2016-05-31 | Chelsio Communications, Inc. | Failover and migration for full-offload network interface devices |
| US20170339005A1 (en) * | 2015-02-10 | 2017-11-23 | Huawei Technologies Co., Ltd. | Method and Device for Processing Failure in at Least One Distributed Cluster, and System |
| US10498817B1 (en) * | 2017-03-21 | 2019-12-03 | Amazon Technologies, Inc. | Performance tuning in distributed computing systems |
| US20200394552A1 (en) * | 2019-06-12 | 2020-12-17 | International Business Machines Corporation | Aggregated maching learning verification for database |
| US20210097477A1 (en) * | 2019-09-05 | 2021-04-01 | Nanjing University Of Posts And Telecommunications | Container image management system for distributed clusters |
| US11075980B2 (en) * | 2010-12-14 | 2021-07-27 | International Business Machines Corporation | Method for operating a node cluster system in a network and node cluster system |
Family Cites Families (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004320423A (en) * | 2003-04-16 | 2004-11-11 | Yazaki Corp | Master node, master-slave communication system, master-slave communication method, header transmission program, and master-slave communication program |
| CN101022451B (en) * | 2006-02-14 | 2014-07-23 | 杭州华三通信技术有限公司 | Connection state synchronizing method in data communication and applied communication node thereof |
| JP2008052315A (en) * | 2006-08-22 | 2008-03-06 | Yokogawa Electric Corp | Fieldbus communication diagnostic device and fieldbus communication diagnostic method |
| EP2439885B1 (en) * | 2010-10-08 | 2013-06-26 | Honeywell International Inc. | Method for digital communication between a plurality of nodes connected by a serial field bus and corresponding system, in particular a field control system or field surveillance system |
| TWI520629B (en) * | 2011-02-18 | 2016-02-01 | Univ Kyushu Nat Univ Corp | A transmission cycle determining method, a transmission cycle determining means, and a computer-readable recording medium |
| WO2016041153A1 (en) * | 2014-09-17 | 2016-03-24 | Qualcomm Incorporated | Receiver training using predicted data |
| CN106411629B (en) * | 2015-08-03 | 2020-06-30 | 阿里巴巴集团控股有限公司 | Method and equipment for monitoring state of CDN node |
| CN107025205B (en) * | 2016-01-30 | 2021-06-22 | 华为技术有限公司 | Method and equipment for training model in distributed system |
| CN109547388A (en) * | 2017-07-25 | 2019-03-29 | 上海掌门科技有限公司 | Equipment connection method and device |
| CN108229528A (en) * | 2017-08-16 | 2018-06-29 | 北京市商汤科技开发有限公司 | Clustering Model training method and device, electronic equipment, computer storage media |
| CN109729111B (en) * | 2017-10-27 | 2021-10-08 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer program product for managing distributed systems |
| JP6877393B2 (en) * | 2017-12-18 | 2021-05-26 | 株式会社東芝 | Systems, programs and methods |
| US10560313B2 (en) * | 2018-06-26 | 2020-02-11 | Sas Institute Inc. | Pipeline system for time-series data forecasting |
| CN109039733A (en) * | 2018-07-26 | 2018-12-18 | 郑州云海信息技术有限公司 | A kind of alarm method, system and electronic equipment and storage medium |
| CN109981750B (en) * | 2019-03-06 | 2021-09-17 | 北京百度网讯科技有限公司 | Business process system, business data processing method and device |
| CN110598870B (en) * | 2019-09-02 | 2024-04-30 | 深圳前海微众银行股份有限公司 | Federal learning method and device |
| CN110635944A (en) * | 2019-09-03 | 2019-12-31 | 苏州浪潮智能科技有限公司 | A cluster network configuration method, device, electronic device and storage medium |
| CN110688230B (en) * | 2019-10-17 | 2022-06-24 | 广州文远知行科技有限公司 | Synchronous training method and device, computer equipment and storage medium |
| CN111510327B (en) * | 2020-04-08 | 2022-01-21 | 北京字节跳动网络技术有限公司 | Network connection method, device, system and server for training participants of co-training model |

2020
- 2020-04-08: CN application CN202010270128.5A (CN111510327B), Active

2021
- 2021-03-15: JP application JP2022556484A (JP7454065B2), Active
- 2021-03-15: EP application EP21784197.2A (EP4120631B1), Active
- 2021-03-15: WO application PCT/CN2021/080881 (WO2021203920A1), Ceased

2022
- 2022-08-12: US application US17/886,771 (US11811864B2), Active

Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9357003B1 (en) * | 2010-03-30 | 2016-05-31 | Chelsio Communications, Inc. | Failover and migration for full-offload network interface devices |
| US11075980B2 (en) * | 2010-12-14 | 2021-07-27 | International Business Machines Corporation | Method for operating a node cluster system in a network and node cluster system |
| US20160050262A1 (en) * | 2014-08-13 | 2016-02-18 | Microsoft Corporation | Scalable fault resilient communications within distributed clusters |
| US20170339005A1 (en) * | 2015-02-10 | 2017-11-23 | Huawei Technologies Co., Ltd. | Method and Device for Processing Failure in at Least One Distributed Cluster, and System |
| US10498817B1 (en) * | 2017-03-21 | 2019-12-03 | Amazon Technologies, Inc. | Performance tuning in distributed computing systems |
| US20200394552A1 (en) * | 2019-06-12 | 2020-12-17 | International Business Machines Corporation | Aggregated machine learning verification for database |
| US20210097477A1 (en) * | 2019-09-05 | 2021-04-01 | Nanjing University Of Posts And Telecommunications | Container image management system for distributed clusters |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111510327A (en) | 2020-08-07 |
| WO2021203920A1 (en) | 2021-10-14 |
| US11811864B2 (en) | 2023-11-07 |
| JP2023518779A (en) | 2023-05-08 |
| EP4120631A1 (en) | 2023-01-18 |
| CN111510327B (en) | 2022-01-21 |
| JP7454065B2 (en) | 2024-03-21 |
| EP4120631B1 (en) | 2024-11-20 |
| EP4120631A4 (en) | 2023-08-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11811864B2 (en) | | Network connection method and device for training participant end of common training model |
| CN109951331B (en) | | Method, apparatus and computing cluster for sending information |
| CN107357571B (en) | | Maintenance method and system for equipment component program |
| CN107295080A (en) | | Data storage method and server applied to distributed server cluster |
| CN110933137A (en) | | Data synchronization method, system, equipment and readable storage medium |
| CN112448858A (en) | | Network communication control method and device, electronic equipment and readable storage medium |
| CN107483297B (en) | | Active monitoring system and method for quality of service carried on embedded equipment |
| JP6431197B2 (en) | | Snapshot processing methods and associated devices |
| CN112804213A (en) | | Communication disconnection reconnection method, device, system, readable medium and electronic equipment |
| CN114706615A (en) | | Automatic reverse analysis method and device for industrial robot protocol |
| CN117130719A (en) | | Data source switching method, device, electronic equipment and computer readable medium |
| US20220066436A1 (en) | | Industrial field device replacement system |
| CN114553870A (en) | | Data processing method and device based on distributed cluster |
| CN114095343A (en) | | Disaster recovery method, device, equipment and storage medium based on double-active system |
| CN114679472A (en) | | Communication system, method, apparatus, storage medium, and electronic device |
| CN1980232A (en) | | Telnet session maintenance method, telnet proxy and computer network system |
| CN112787868B (en) | | Information synchronization method and device |
| CN116133058B (en) | | Session establishment method, system, device, equipment and storage medium |
| CN116367204B (en) | | User equipment service processing method, electronic equipment, storage medium and system |
| CN112311833A (en) | | Data updating method and device |
| CN117319431A (en) | | A task management and control method, device and task management and control system |
| CN117978849A (en) | | Communication connection construction method, device, electronic device and storage medium |
| CN119728682A (en) | | Method and device for transmitting configuration data among clusters, electronic equipment and storage medium |
| CN120785726A (en) | | Dual-computer hot standby method, system, electronic equipment and storage medium |
| CN120896837A (en) | | Network fault tolerance methods, devices, equipment, media and products |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: SPECIAL NEW |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | AS | Assignment | Owner name: DOUYIN VISION CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BEIJING YOUZHUJU NETWORK TECHNOLOGY CO. LTD.;REEL/FRAME:064977/0650 Effective date: 20230606 Owner name: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO. LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CHANG, LONG;REEL/FRAME:064977/0536 Effective date: 20221124 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
| | AS | Assignment | Owner name: DOUYIN VISION CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, LONGYIJIA;CHEN, CHENG;WU, DI;AND OTHERS;SIGNING DATES FROM 20220526 TO 20221124;REEL/FRAME:065021/0754 Owner name: HANGZHOU OCEAN ENGINE NETWORK TECHNOLOGY CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FANG, CHENLIAOHUI;REEL/FRAME:065003/0251 Effective date: 20221124 Owner name: DOUYIN VISION CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BYTEDANCE INC.;REEL/FRAME:065003/0303 Effective date: 20230606 Owner name: DOUYIN VISION CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BEIJING YOUZHUJU NETWORK TECHNOLOGY CO. LTD.;REEL/FRAME:065003/0300 Effective date: 20230606 Owner name: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO. LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CHANG, LONG;REEL/FRAME:065003/0289 Effective date: 20221124 Owner name: DOUYIN VISION CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BEIJING DOUYIN INFORMATION SERVICE CO., LTD.;REEL/FRAME:065003/0286 Effective date: 20230606 Owner name: BEIJING DOUYIN INFORMATION SERVICE CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WU, LIANGCHAO;REEL/FRAME:065003/0270 Effective date: 20220907 Owner name: DOUYIN VISION CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HANGZHOU OCEAN ENGINE NETWORK TECHNOLOGY CO., LTD.;REEL/FRAME:065003/0267 Effective date: 20230606 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP, ISSUE FEE PAYMENT VERIFIED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |