<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Feryal Behbahani on Feryal Behbahani</title>
    <link>https://feryal.github.io/</link>
    <description>Recent content in Feryal Behbahani on Feryal Behbahani</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <copyright>&amp;copy; 2017 Feryal Behbahani</copyright>
    <lastBuildDate>Mon, 17 Jul 2017 00:00:00 +0000</lastBuildDate>
    <atom:link href="/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Acme: A Research Framework for Distributed Reinforcement Learning</title>
      <link>https://feryal.github.io/publication/acme/</link>
      <pubDate>Wed, 15 Jul 2020 22:00:19 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/acme/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Tutorial on RL (EEML 2020)</title>
      <link>https://feryal.github.io/project/eeml/</link>
      <pubDate>Wed, 15 Jul 2020 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/project/eeml/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Privileged Information Dropout in Reinforcement Learning</title>
      <link>https://feryal.github.io/publication/privinfodropout/</link>
      <pubDate>Wed, 15 Jan 2020 22:00:19 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/privinfodropout/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation</title>
      <link>https://feryal.github.io/publication/analysis/</link>
      <pubDate>Fri, 15 Nov 2019 22:00:19 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/analysis/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Modular Meta-Learning with Shrinkage</title>
      <link>https://feryal.github.io/publication/shrinkage/</link>
      <pubDate>Fri, 15 Nov 2019 22:00:19 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/shrinkage/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Learning from Demonstration: Applications and challenges</title>
      <link>https://feryal.github.io/talk/oxford/</link>
      <pubDate>Mon, 26 Nov 2018 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/talk/oxford/</guid>
      <description>&lt;p&gt;Recent advances in deep reinforcement learning have enabled a wide range of capabilities, from learning to play video games to acquiring robotic visuomotor skills. However, there are a wide range of problems where hand-coding behaviour or a reward function is impractical. Learning from demonstration (LfD) serves as an essential tool for learning skills that are difficult to program by hand. These demonstrations provide snapshots of near-optimal behaviours, offering guidance for the learning process and alleviating the need to start from scratch or manually engineering parts of the solution. However, it is often unclear how to acquire these in non-controlled settings, and the new challenges that arise when trying to apply these techniques in the real world.&lt;/p&gt;

&lt;p&gt;In this talk, I will present some of the recent techniques that can help us bridge that gap and learn realistic behaviours from a large source of untapped data already existing “in the wild”. I will cover some of the latest LfD approaches that leverage recent advances in deep learning and generative adversarial methods. I will present our recent work, Video to Behaviour (ViBe), which extracts realistic behaviours from raw unlabelled video data without additional expert knowledge: trajectories are extracted automatically and used to perform LfD through a novel curriculum, coping with multiple agents interacting in complex settings. I will finish with a discussion of open questions and the future research directions required to extend these approaches further.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Learning from Demonstration in the Wild</title>
      <link>https://feryal.github.io/publication/vibe/</link>
      <pubDate>Sat, 15 Sep 2018 22:00:19 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/vibe/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Automated Curriculum Learning for Reinforcement Learning</title>
      <link>https://feryal.github.io/publication/acl/</link>
      <pubDate>Fri, 07 Sep 2018 21:53:04 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/acl/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Extending World Models for Multi-Agent Reinforcement Learning in MALMÖ</title>
      <link>https://feryal.github.io/publication/malmo/</link>
      <pubDate>Fri, 07 Sep 2018 21:53:04 +0100</pubDate>
      
      <guid>https://feryal.github.io/publication/malmo/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Learning from Demonstration in the Wild</title>
      <link>https://feryal.github.io/project/vibe/</link>
      <pubDate>Sat, 25 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/project/vibe/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Automated Curriculum Learning for Reinforcement Learning</title>
      <link>https://feryal.github.io/project/acl/</link>
      <pubDate>Wed, 15 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/project/acl/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Craft Environment</title>
      <link>https://feryal.github.io/project/craftenv/</link>
      <pubDate>Wed, 15 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/project/craftenv/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Automated Curriculum Learning for Reinforcement Learning</title>
      <link>https://feryal.github.io/talk/jeju/</link>
      <pubDate>Sat, 28 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/talk/jeju/</guid>
      <description>

&lt;h2 id=&#34;how-would-you-make-an-agent-capable-of-solving-the-complex-hierarchical-tasks&#34;&gt;How would you make an agent capable of solving complex hierarchical tasks?&lt;/h2&gt;

&lt;p&gt;Imagine a problem that is complex and requires a collection of skills which are extremely hard to learn in one go with sparse rewards (e.g. complex object manipulation in robotics). One might instead learn to generate a curriculum of simpler tasks, so that overall a student network can learn to perform the complex task efficiently. Designing this curriculum by hand is inefficient. In this project, I set out to train an automatic curriculum generator using a teacher network which keeps track of the progress of the student network and proposes new tasks as a function of how well the student is learning. I adapted a state-of-the-art distributed reinforcement learning algorithm for training the student network, while using an adversarial multi-armed bandit algorithm for the teacher network. I also developed an environment, Craft Env, which supports hierarchical task design across a range of complexity and is fast to iterate on. I analysed how different metrics for quantifying student progress affect the curriculum that the teacher learns to propose, and demonstrated that this approach can accelerate learning and make it more interpretable how the agent learns to perform complex tasks.&lt;/p&gt;

&lt;p&gt;As a starting point, I adapted the Craft environment from work by Andreas et al. [1], as it has a nice, simple structure well suited to hierarchical task design. I developed a fixed curriculum of simpler target sub-tasks (in the Craft environment: &amp;ldquo;get wood&amp;rdquo;, &amp;ldquo;get grass&amp;rdquo;, &amp;ldquo;get iron&amp;rdquo;, &amp;ldquo;make cloth&amp;rdquo;, &amp;ldquo;get gold&amp;rdquo;), and will next build a teacher network that proposes tasks for the student to learn. The student could also be kick-started with demonstrations from an expert.&lt;/p&gt;
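&lt;p&gt;As a rough sketch of such a teacher, an EXP3-style adversarial bandit can treat each task as an arm and receive a learning-progress signal as reward. The class and the constant progress signal below are hypothetical illustrations, not the exact algorithm used:&lt;/p&gt;

```python
import math
import random

class Exp3Teacher:
    """Minimal EXP3-style adversarial bandit over tasks.

    Each arm is a task; the reward fed back after a student training
    run is a learning-progress signal (e.g. the change in return).
    """

    def __init__(self, tasks, eta=0.1):
        self.tasks = tasks
        self.eta = eta
        self.log_weights = [0.0] * len(tasks)

    def probs(self):
        # Softmax over log-weights, shifted by the max for stability
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        z = sum(w)
        return [x / z for x in w]

    def propose(self):
        # Sample a task index proportionally to the current weights
        return random.choices(range(len(self.tasks)), weights=self.probs())[0]

    def update(self, arm, reward):
        # Importance-weighted update: only the proposed arm's weight moves
        p = self.probs()
        self.log_weights[arm] += self.eta * reward / p[arm]

teacher = Exp3Teacher(["get wood", "get grass", "get iron"])
random.seed(0)
for _ in range(200):
    arm = teacher.propose()
    progress = 1.0 if arm == 0 else 0.0  # pretend only "get wood" shows progress
    teacher.update(arm, progress)
```

&lt;p&gt;In the real setup the reward would be derived from the student&amp;rsquo;s learning curve; the constant stand-in above only shows that the bandit concentrates on tasks exhibiting progress.&lt;/p&gt;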

&lt;p&gt;I have interfaced IMPALA[2], a GPU-accelerated distributed variant of the A3C architecture that uses multiple actors with V-trace off-policy correction, with my Craft environment to train on all the possible Craft tasks concurrently. This is done by providing a hash of the task name as an instruction to the network (a setup similar to DMLab IMPALA, using an LSTM to process the instruction).&lt;/p&gt;
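&lt;p&gt;As a simplified stand-in for the instruction pipeline (the actual setup processes the instruction with an LSTM), one can hash the task name into an embedding-table index and concatenate the embedding with the observation features. All sizes and names below are hypothetical:&lt;/p&gt;

```python
import hashlib
import numpy as np

VOCAB_SIZE = 64   # hypothetical instruction-embedding table size
EMBED_DIM = 8     # hypothetical embedding width

def task_id(name, vocab=VOCAB_SIZE):
    # Stable hash of the task name into an embedding-table index
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % vocab

rng = np.random.default_rng(0)
embed_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))

def policy_input(obs_features, task_name):
    # Concatenate observation features with the task embedding so a
    # single network can be trained on all Craft tasks concurrently
    return np.concatenate([obs_features, embed_table[task_id(task_name)]])

x = policy_input(np.zeros(16), "get wood")
```

&lt;p&gt;A stable hash keeps the mapping from task names to instructions fixed across actors and learner, which is what lets every task share one conditioned network.&lt;/p&gt;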


&lt;p&gt;Other papers that I am inspired by in this work include [3], [4].&lt;/p&gt;

&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;p&gt;[1] &lt;a href=&#34;https://arxiv.org/abs/1611.01796&#34;&gt;Modular Multitask Reinforcement Learning with Policy Sketches&lt;/a&gt; (Andreas et al., 2016)&lt;/p&gt;

&lt;p&gt;[2] &lt;a href=&#34;https://arxiv.org/abs/1802.01561&#34;&gt;IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures&lt;/a&gt; (Espeholt et al., 2018)&lt;/p&gt;

&lt;p&gt;[3] &lt;a href=&#34;https://arxiv.org/abs/1802.10567&#34;&gt;Learning by Playing - Solving Sparse Reward Tasks from Scratch&lt;/a&gt; (Riedmiller et al., 2018)&lt;/p&gt;

&lt;p&gt;[4] &lt;a href=&#34;https://arxiv.org/abs/1112.5309&#34;&gt;PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem&lt;/a&gt; (Schmidhuber, 2011)&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Successor Representations</title>
      <link>https://feryal.github.io/post/successorrepresentations/</link>
      <pubDate>Fri, 23 Mar 2018 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/post/successorrepresentations/</guid>
      <description>

&lt;p&gt;Successor representations were introduced by &lt;a href=&#34;#references&#34;&gt;Dayan in 1993&lt;/a&gt; as a way to
represent states such that the &amp;ldquo;similarity&amp;rdquo; that matters for TD learning reflects
the temporal sequence of states reachable from a given state.&lt;/p&gt;

&lt;p&gt;Dayan derived it in the tabular case, but let&amp;rsquo;s derive it assuming a feature
vector $\phi$.&lt;/p&gt;

&lt;p&gt;We assume that the reward function can be factorised linearly:
$$r(s) = \phi(s) \cdot w$$&lt;/p&gt;
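&lt;p&gt;A toy numerical check of this assumption (features and rewards invented for illustration): stack the per-state features into a matrix and solve for $w$ by least squares.&lt;/p&gt;

```python
import numpy as np

# Toy check of the linear-reward assumption r(s) = phi(s) . w:
# stack per-state features into Phi and solve Phi w = r for w.
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])      # hypothetical 2-d features for 3 states
r = np.array([0.5, 1.0, 1.5])     # rewards generated by w = (0.5, 1.0)
w, residuals, rank, sv = np.linalg.lstsq(Phi, r, rcond=None)
```

&lt;p&gt;When the assumption holds exactly, the least-squares fit recovers $w$; when it only holds approximately, the residual measures how far the rewards are from linear in the features.&lt;/p&gt;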

&lt;p&gt;This can then be inserted back into the value expectation formula:
$$\begin{array}{rl}
Q^\pi(s, a) &amp;amp;= E \left\lbrack \sum_{t=0}^\infty \gamma^t r(s_t) ~|~ s_0 = s, a_0 = a \right\rbrack  \\&lt;br /&gt;
 &amp;amp; = E\left\lbrack \sum_{t=0}^\infty \gamma^t \phi(s_t) \cdot w ~|~ s_0=s, a_0 = a \right\rbrack \\&lt;br /&gt;
 &amp;amp; = E\left\lbrack \sum_{t=0}^\infty \gamma^t \phi(s_t) ~|~ s_0=s, a_0 = a \right\rbrack \cdot w \\&lt;br /&gt;
 &amp;amp; = \psi^\pi(s, a) \cdot w
 \end{array} $$&lt;/p&gt;

&lt;p&gt;$\psi^\pi$ denotes the successor features, and does not depend on the rewards.
It can be seen as a partial model of the world: the expected discounted occupancy of states under the behaviour policy.&lt;/p&gt;

&lt;p&gt;$w$ represents the rewards or the goal, and can be thought of as preferences about the given features $\phi(s)$ in a
given task.&lt;/p&gt;

&lt;p&gt;Assuming that the features $\phi(s)$ are good and consistent across a domain, we can imagine transferring to new
tasks very quickly, as we simply need to learn a new goal vector $w$.&lt;/p&gt;

&lt;p&gt;In Dayan&amp;rsquo;s case, the features $\phi(s)$ were directly the one-hot occupancy of a tabular state space. This means
that the successor features directly correspond to the expected discounted visitation counts under a policy.
This is not as simple in more complex scenarios or when we want to learn $\phi$.&lt;/p&gt;
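&lt;p&gt;In the tabular case this can be checked directly: for a fixed policy with state-transition matrix $P$, the successor representation has the closed form $M = (I - \gamma P)^{-1}$. A small sketch on a toy chain MDP (invented for illustration):&lt;/p&gt;

```python
import numpy as np

# Closed-form tabular successor representation for a fixed policy:
# M = (I - gamma P)^{-1}, where P is the induced state-transition matrix.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],    # toy 3-state chain; state 2 is absorbing
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
M = np.linalg.inv(np.eye(3) - gamma * P)

# With one-hot features, row s of M is the expected discounted
# visitation count of each state when starting from s.
r = np.array([0.0, 0.0, 1.0])     # reward only in the absorbing state
V = M @ r                          # values follow immediately as V = M r
```

&lt;p&gt;Swapping in a different reward vector $r$ reuses the same $M$, which is exactly the quick transfer to new tasks described above.&lt;/p&gt;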

&lt;p&gt;One important observation, also mentioned in passing by Dayan, is that one can rewrite the definition of the
successor features $\psi$ in the same recursive form as the value function, obtaining an independent Bellman equation for every component of $\phi$:&lt;/p&gt;

&lt;p&gt;$$\psi_i(s, a) = E\left\lbrack \sum_{t=0}^\infty \gamma^t \phi_i(s_t) ~|~ s_0=s, a_0 = a \right\rbrack$$&lt;/p&gt;

&lt;p&gt;$$= \phi_i(s) + \gamma \, E\left\lbrack \psi_i(s_1, a_1) ~|~ s_0=s, a_0 = a \right\rbrack$$&lt;/p&gt;

&lt;p&gt;In other terms, the features $\phi(s)$ behave as the rewards in a Bellman update, where $\psi(s, a)$ takes the role
of the Q-values.&lt;/p&gt;
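&lt;p&gt;This observation suggests learning $\psi$ with per-component TD(0) updates. A minimal tabular sketch on a toy chain with a single action (all details invented for illustration):&lt;/p&gt;

```python
import numpy as np

def td_update(psi, phi, s, a, s_next, a_next, alpha=0.1, gamma=0.9):
    """One TD(0) step per component: phi[s] plays the role of the reward
    vector and psi[s, a] plays the role of the Q-values."""
    target = phi[s] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

# Tabular toy: 3-state chain with one action and one-hot features,
# so psi should converge to the discounted visitation counts.
phi = np.eye(3)
psi = np.zeros((3, 1, 3))           # (state, action, feature) table
transitions = [(2, 2), (1, 2), (0, 1)]  # sweep later states first
for _ in range(2000):
    for s, s_next in transitions:
        td_update(psi, phi, s, 0, s_next, 0)
```

&lt;p&gt;For this chain the learned $\psi$ approaches the discounted visitation counts, matching the one-hot tabular case discussed above.&lt;/p&gt;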


&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Peter Dayan, &amp;ldquo;Improving generalization for temporal difference learning: The successor representation&amp;rdquo;, Neural Computation, 5(4):613–624, 1993. &lt;a href=&#34;http://www.gatsby.ucl.ac.uk/~dayan/papers/d93b.pdf&#34;&gt;pdf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    
    <item>
      <title>What Would it Take to Train an Agent to Play with a Shape-Sorter?</title>
      <link>https://feryal.github.io/talk/rework/</link>
      <pubDate>Fri, 22 Sep 2017 00:00:00 +0000</pubDate>
      
      <guid>https://feryal.github.io/talk/rework/</guid>
      <description>&lt;p&gt;The capabilities of humans to precisely and robustly recognise and manipulate objects has been instrumental in the development of human cognition. However, understanding and replicating this process has proven to be difficult. This is of particular importance when thinking of agents or robots acting in naturalistic environments, solving complex tasks. I will present recent work in this direction, focusing on computational optimality and Deep Reinforcement Learning techniques, to discover how to manipulate objects within a 3D physics simulator from high-dimensional sensory observations.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
