MX2018000942A - Continuous control with deep reinforcement learning - Google Patents

Continuous control with deep reinforcement learning

Info

Publication number
MX2018000942A
Authority
MX
Mexico
Prior art keywords
neural network
experience
tuple
actor
parameters
Prior art date
Application number
MX2018000942A
Other languages
English (en)
Inventor
Timothy Paul Lillicrap
Alexander Pritzel
Nicolas Manfred Otto Heess
Tom Erez
Daniel Pieter Wierstra
Yuval Tassa
David Silver
Jonathan James Hunt
Original Assignee
Deepmind Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deepmind Tech Ltd
Publication of MX2018000942A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • User Interface Of Digital Computer (AREA)
  • Feedback Control In General (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an actor neural network used to select actions to be performed by an agent that interacts with an environment. One of the methods includes obtaining a minibatch of experience tuples and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch, processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
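
The training step summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the actor and critic here are small hand-written parametric functions rather than neural networks; the `reward` and `next_observation` fields, the discount `gamma`, the learning rates, and the way the target output is formed (reward plus discounted critic value at the next observation) are all assumptions that the abstract does not specify. All names (`Experience`, `train_step`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Experience:
    observation: float       # training observation
    action: float            # training action
    reward: float            # assumed field, not named in the abstract
    next_observation: float  # assumed field, not named in the abstract


def actor(theta: List[float], s: float) -> float:
    """Toy stand-in for the actor network: maps an observation to an action."""
    return theta[0] + theta[1] * s


def features(s: float, a: float) -> List[float]:
    """Fixed features used by the toy critic."""
    return [1.0, s, a, s * a, a * a]


def critic(w: List[float], s: float, a: float) -> float:
    """Toy stand-in for the critic network: scores an (observation, action) pair."""
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def critic_action_grad(w: List[float], s: float, a: float) -> float:
    """Derivative of the toy critic output with respect to the action."""
    return w[2] + w[3] * s + 2.0 * w[4] * a


def train_step(theta: List[float], w: List[float], minibatch: List[Experience],
               gamma: float = 0.99, critic_lr: float = 0.01,
               actor_lr: float = 0.01) -> Tuple[List[float], List[float]]:
    n = len(minibatch)

    # For each tuple: critic output for (observation, action) and a target output.
    outputs, targets = [], []
    for e in minibatch:
        outputs.append(critic(w, e.observation, e.action))
        next_action = actor(theta, e.next_observation)
        # Assumed target: reward + discounted critic value at the next observation.
        targets.append(e.reward + gamma * critic(w, e.next_observation, next_action))

    # Critic update: gradient step that reduces the squared error between
    # the target outputs and the critic outputs, averaged over the minibatch.
    new_w = list(w)
    for e, q, y in zip(minibatch, outputs, targets):
        err = y - q
        for i, f in enumerate(features(e.observation, e.action)):
            new_w[i] += critic_lr * err * f / n

    # Actor update using the critic: move the actor parameters in the direction
    # that increases the critic's output for the actor's own actions.
    # Using the already-updated critic here is an assumption.
    new_theta = list(theta)
    for e in minibatch:
        a = actor(theta, e.observation)
        dq_da = critic_action_grad(new_w, e.observation, a)
        new_theta[0] += actor_lr * dq_da / n
        new_theta[1] += actor_lr * dq_da * e.observation / n

    return new_theta, new_w
```

Calling `train_step([0.0, 0.0], [0.0] * 5, [Experience(1.0, 2.0, 1.0, 0.5)])` performs one critic update followed by one actor update on a single-tuple minibatch and returns the new actor and critic parameters.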
MX2018000942A 2015-07-24 2016-07-22 Continuous control with deep reinforcement learning MX2018000942A (es)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562196854P 2015-07-24 2015-07-24
PCT/US2016/043716 WO2017019555A1 (en) 2015-07-24 2016-07-22 Continuous control with deep reinforcement learning

Publications (1)

Publication Number Publication Date
MX2018000942A (es) 2018-08-09

Family

ID=56555869

Family Applications (1)

Application Number Title Priority Date Filing Date
MX2018000942A MX2018000942A (es) 2015-07-24 2016-07-22 Continuous control with deep reinforcement learning

Country Status (13)

Country Link
US (3) US10776692B2 (es)
EP (1) EP3326114B1 (es)
JP (1) JP6664480B2 (es)
KR (1) KR102165126B1 (es)
CN (2) CN114757333B (es)
AU (1) AU2016297852C1 (es)
CA (1) CA2993551C (es)
DE (1) DE112016003350T5 (es)
GB (1) GB2559491A (es)
IL (1) IL257103B (es)
MX (1) MX2018000942A (es)
RU (1) RU2686030C1 (es)
WO (1) WO2017019555A1 (es)

Also Published As

Publication number Publication date
CA2993551A1 (en) 2017-02-02
BR112018001520A2 (pt) 2019-05-07
US10776692B2 (en) 2020-09-15
IL257103A (en) 2018-03-29
EP3326114B1 (en) 2024-09-04
DE112016003350T5 (de) 2018-04-05
IL257103B (en) 2021-09-30
AU2016297852A1 (en) 2018-02-08
CN108027897A (zh) 2018-05-11
CN114757333A (zh) 2022-07-15
US20170024643A1 (en) 2017-01-26
CA2993551C (en) 2022-10-11
WO2017019555A1 (en) 2017-02-02
AU2016297852C1 (en) 2019-12-05
EP3326114A1 (en) 2018-05-30
KR102165126B1 (ko) 2020-10-13
RU2686030C1 (ru) 2019-04-23
KR20180034553A (ko) 2018-04-04
AU2016297852B2 (en) 2019-08-22
CN114757333B (zh) 2025-12-12
JP2018525759A (ja) 2018-09-06
US20240177002A1 (en) 2024-05-30
GB201802748D0 (en) 2018-04-04
CN108027897B (zh) 2022-04-12
US20200410351A1 (en) 2020-12-31
US11803750B2 (en) 2023-10-31
JP6664480B2 (ja) 2020-03-13
GB2559491A (en) 2018-08-08
