Richard S. Sutton
Person information
- affiliation: DeepMind Alberta, Edmonton, AB, Canada
- affiliation: University of Alberta, Department of Computing Science, Edmonton, AB, Canada
- affiliation (PhD 1984): University of Massachusetts Amherst, MA, USA
Other persons with the same name
- Richard Sutton 0002 — Skyhook Wireless, Boston, MA, USA
2020 – today
- 2024
- [j29] Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton: SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning. RLJ 2: 840-863 (2024)
- [j28] Kris De Asis, Richard S. Sutton: An Idiosyncrasy of Time-discretization in Reinforcement Learning. RLJ 3: 1306-1316 (2024)
- [j27] Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton: Reward Centering. RLJ 4: 1995-2016 (2024)
- [j26] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Qingfeng Lan, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton: Loss of plasticity in deep continual learning. Nat. 632(8026): 768-774 (2024)
- [c87] Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White: Reward-Respecting Subtasks for Model-Based Reinforcement Learning (Abstract Reprint). AAAI 2024: 22713
- [i73] Thomas Degris, Khurram Javed, Arsalan Sharifnassab, Yuxin Liu, Richard S. Sutton: Step-size Optimization for Continual Learning. CoRR abs/2401.17401 (2024)
- [i72] Arsalan Sharifnassab, Saber Salehkaleybar, Richard S. Sutton: MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters. CoRR abs/2402.02342 (2024)
- [i71] Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton: Reward Centering. CoRR abs/2405.09999 (2024)
- [i70] Kris De Asis, Richard S. Sutton: An Idiosyncrasy of Time-discretization in Reinforcement Learning. CoRR abs/2406.14951 (2024)
- [i69] Yi Wan, Huizhen Yu, Richard S. Sutton: On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes. CoRR abs/2408.16262 (2024)
- [i68] Huizhen Yu, Yi Wan, Richard S. Sutton: Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning. CoRR abs/2409.03915 (2024)
- 2023
- [j25] Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard S. Sutton, Elliot A. Ludvig, Adam White: From eye-blinks to state construction: Diagnostic benchmarks for online representation learning. Adapt. Behav. 31(1): 3-19 (2023)
- [j24] Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White: Reward-respecting subtasks for model-based reinforcement learning. Artif. Intell. 324: 104001 (2023)
- [j23] Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White: Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks. J. Mach. Learn. Res. 24: 256:1-256:34 (2023)
- [j22] Kory W. Mathewson, Adam S. R. Parker, Craig Sherstan, Ann L. Edwards, Richard S. Sutton, Patrick M. Pilarski: Communicative capital: a key resource for human-machine shared agency and collaborative capacity. Neural Comput. Appl. 35(23): 16805-16819 (2023)
- [c86] Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard S. Sutton, Jun Luo, Adam White: Auxiliary task discovery through generate-and-test. CoLLAs 2023: 703-714
- [c85] Kristopher De Asis, Eric Graves, Richard S. Sutton: Value-aware Importance Weighting for Off-policy Reinforcement Learning. CoLLAs 2023: 745-763
- [c84] Arsalan Sharifnassab, Richard S. Sutton: Toward Efficient Gradient-Based Value Estimation. ICML 2023: 30827-30849
- [i67] Arsalan Sharifnassab, Richard Sutton: Toward Efficient Gradient-Based Value Estimation. CoRR abs/2301.13757 (2023)
- [i66] Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White: Online Real-Time Recurrent Learning Using Sparse Connections and Selective Learning. CoRR abs/2302.05326 (2023)
- [i65] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, A. Rupam Mahmood: Maintaining Plasticity in Deep Continual Learning. CoRR abs/2306.13812 (2023)
- [i64] Kristopher De Asis, Eric Graves, Richard S. Sutton: Value-aware Importance Weighting for Off-policy Reinforcement Learning. CoRR abs/2306.15625 (2023)
- [i63] Kenny Young, Richard S. Sutton: Iterative Option Discovery for Planning, by Planning. CoRR abs/2310.01569 (2023)
- [i62] Huizhen Yu, Yi Wan, Richard S. Sutton: A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays. CoRR abs/2312.15091 (2023)
- 2022
- [c83] Tian Tian, Kenny Young, Richard S. Sutton: Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions. NeurIPS 2022
- [i61] Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White: Reward-Respecting Subtasks for Model-Based Reinforcement Learning. CoRR abs/2202.03466 (2022)
- [i60] Richard S. Sutton: A History of Meta-gradient: Gradient Methods for Meta-learning. CoRR abs/2202.09701 (2022)
- [i59] Richard S. Sutton: The Quest for a Common Model of the Intelligent Decision Maker. CoRR abs/2202.13252 (2022)
- [i58] Yi Wan, Richard S. Sutton: Toward Discovering Options that Achieve Faster Planning. CoRR abs/2205.12515 (2022)
- [i57] Tian Tian, Kenny Young, Richard S. Sutton: Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions. CoRR abs/2207.01613 (2022)
- [i56] Richard S. Sutton, Michael H. Bowling, Patrick M. Pilarski: The Alberta Plan for AI Research. CoRR abs/2208.11173 (2022)
- [i55] Yi Wan, Richard S. Sutton: On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs. CoRR abs/2209.15141 (2022)
- [i54] Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard S. Sutton, Jun Luo, Adam White: Auxiliary task discovery through generate-and-test. CoRR abs/2210.14361 (2022)
- 2021
- [j21] David Silver, Satinder Singh, Doina Precup, Richard S. Sutton: Reward is enough. Artif. Intell. 299: 103535 (2021)
- [j20] Jae Young Lee, Richard S. Sutton: Policy iterations for reinforcement learning problems in continuous time and space - Fundamental theory and methods. Autom. 126: 109421 (2021)
- [j19] Andrew G. Barto, Richard S. Sutton, Charles W. Anderson: Looking Back on the Actor-Critic Architecture. IEEE Trans. Syst. Man Cybern. Syst. 51(1): 40-50 (2021)
- [c82] Yi Wan, Abhishek Naik, Richard S. Sutton: Learning and Planning in Average-Reward Markov Decision Processes. ICML 2021: 10653-10662
- [c81] Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson: Average-Reward Off-Policy Policy Evaluation with Function Approximation. ICML 2021: 12578-12588
- [c80] Yi Wan, Abhishek Naik, Richard S. Sutton: Average-Reward Learning and Planning with Options. NeurIPS 2021: 22758-22769
- [i53] Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson: Average-Reward Off-Policy Policy Evaluation with Function Approximation. CoRR abs/2101.02808 (2021)
- [i52] Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton: Does Standard Backpropagation Forget Less Catastrophically Than Adam? CoRR abs/2102.07686 (2021)
- [i51] Khurram Javed, Martha White, Richard S. Sutton: Scalable Online Recurrent Learning Using Columnar Neural Networks. CoRR abs/2103.05787 (2021)
- [i50] Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton: Planning with Expectation Models for Control. CoRR abs/2104.08543 (2021)
- [i49] Sina Ghiassian, Richard S. Sutton: An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task. CoRR abs/2106.00922 (2021)
- [i48] Shibhansh Dohare, A. Rupam Mahmood, Richard S. Sutton: Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. CoRR abs/2108.06325 (2021)
- [i47] Sina Ghiassian, Richard S. Sutton: An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment. CoRR abs/2109.05110 (2021)
- [i46] Yi Wan, Abhishek Naik, Richard S. Sutton: Average-Reward Learning and Planning with Options. CoRR abs/2110.13855 (2021)
- [i45] Amir Samani, Richard S. Sutton: Learning Agent State Online with Recurrent Generate-and-Test. CoRR abs/2112.15236 (2021)
- 2020
- [j18] Dagmar Monett, Colin W. P. Lewis, Kristinn R. Thórisson, Joscha Bach, Gianluca Baldassarre, Giovanni Granato, Istvan S. N. Berkeley, François Chollet, Matthew Crosby, Henry Shevlin, John F. Sowa, John E. Laird, Shane Legg, Peter Lindes, Tomás Mikolov, William J. Rapaport, Raúl Rojas, Marek Rosa, Peter Stone, Richard S. Sutton, Roman V. Yampolskiy, Pei Wang, Roger C. Schank, Aaron Sloman, Alan F. T. Winfield: Special Issue "On Defining Artificial Intelligence" - Commentaries and Author's Response. J. Artif. Gen. Intell. 11(2): 1-100 (2020)
- [c79] Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves: Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning. AAAI 2020: 3741-3748
- [c78] Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt: Behaviour Suite for Reinforcement Learning. ICLR 2020
- [i44] Yi Wan, Abhishek Naik, Richard S. Sutton: Learning and Planning in Average-Reward Markov Decision Processes. CoRR abs/2006.16318 (2020)
- [i43] Alan Chan, Kristopher De Asis, Richard S. Sutton: Inverse Policy Evaluation for Value-based Sequential Decision-making. CoRR abs/2008.11329 (2020)
- [i42] Katya Kudashkina, Patrick M. Pilarski, Richard S. Sutton: Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI. CoRR abs/2008.12095 (2020)
- [i41] Kenny Young, Richard S. Sutton: Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning. CoRR abs/2010.15268 (2020)
2010 – 2019
- 2019
- [c77] Banafsheh Rafiee, Sina Ghiassian, Adam White, Richard S. Sutton: Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots. AAMAS 2019: 332-340
- [c76] Tian Tian, Richard S. Sutton: Extending Sliding-Step Importance Weighting from Supervised Learning to Reinforcement Learning. IJCAI 2019: 67-82
- [c75] Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton: Planning with Expectation Models. IJCAI 2019: 3649-3655
- [i40] J. Fernando Hernandez-Garcia, Richard S. Sutton: Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target. CoRR abs/1901.07510 (2019)
- [i39] Xiang Gu, Sina Ghiassian, Richard S. Sutton: Should All Temporal Difference Learning Use Emphasis? CoRR abs/1903.00194 (2019)
- [i38] Alexandra Kearney, Vivek Veeriah, Jaden B. Travnik, Patrick M. Pilarski, Richard S. Sutton: Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning. CoRR abs/1903.03252 (2019)
- [i37] Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton: Planning with Expectation Models. CoRR abs/1904.01191 (2019)
- [i36] Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt: Behaviour Suite for Reinforcement Learning. CoRR abs/1908.03568 (2019)
- [i35] Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves: Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning. CoRR abs/1909.03906 (2019)
- [i34] Abhishek Naik, Roshan Shariff, Niko Yasui, Richard S. Sutton: Discounted Reinforcement Learning is Not an Optimization Problem. CoRR abs/1910.02140 (2019)
- [i33] J. Fernando Hernandez-Garcia, Richard S. Sutton: Learning Sparse Representations Incrementally in Deep Reinforcement Learning. CoRR abs/1912.04002 (2019)
- 2018
- [j17] Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski: Reactive Reinforcement Learning in Asynchronous Environments. Frontiers Robotics AI 5: 79 (2018)
- [j16] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. J. Mach. Learn. Res. 19: 48:1-48:49 (2018)
- [c74] Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton: Multi-Step Reinforcement Learning: A Unifying Algorithm. AAAI 2018: 2902-2909
- [c73] Craig Sherstan, Dylan R. Ashley, Brendan Bennett, Kenny Young, Adam White, Martha White, Richard S. Sutton: Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return. UAI 2018: 63-72
- [c72] Kristopher De Asis, Richard S. Sutton: Per-decision Multi-step Temporal Difference Learning with Control Variates. UAI 2018: 786-794
- [i32] Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton: Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods. CoRR abs/1801.08287 (2018)
- [i31] Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski: Reactive Reinforcement Learning in Asynchronous Environments. CoRR abs/1802.06139 (2018)
- [i30] Alexandra Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski: TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent. CoRR abs/1804.03334 (2018)
- [i29] Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton: Two geometric input transformation methods for fast online reinforcement learning with neural nets. CoRR abs/1805.07476 (2018)
- [i28] Kenny J. Young, Richard S. Sutton, Shuo Yang: Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling. CoRR abs/1806.00540 (2018)
- [i27] Kristopher De Asis, Richard S. Sutton: Per-decision Multi-step Temporal Difference Learning with Control Variates. CoRR abs/1807.01830 (2018)
- [i26] Kristopher De Asis, Brendan Bennett, Richard S. Sutton: Predicting Periodicity with Temporal Difference Learning. CoRR abs/1809.07435 (2018)
- [i25] Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White: Online Off-policy Prediction. CoRR abs/1811.02597 (2018)
- 2017
- [c71] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. Canadian AI 2017: 3-14
- [c70] Vivek Veeriah, Harm van Seijen, Richard S. Sutton: Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning. AAMAS 2017: 556-564
- [c69] Vivek Veeriah, Shangtong Zhang, Richard S. Sutton: Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks. ECML/PKDD (1) 2017: 445-459
- [i24] Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton: Multi-step Off-policy Learning Without Importance Sampling Ratios. CoRR abs/1702.03006 (2017)
- [i23] Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton: Multi-step Reinforcement Learning: A Unifying Algorithm. CoRR abs/1703.01327 (2017)
- [i22] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. CoRR abs/1704.04463 (2017)
- [i21] Jae Young Lee, Richard S. Sutton: Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space. CoRR abs/1705.03520 (2017)
- [i20] Adam White, Richard S. Sutton: GQ(λ) Quick Reference and Implementation Guide. CoRR abs/1705.03967 (2017)
- [i19] Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton: A First Empirical Study of Emphatic Temporal Difference Learning. CoRR abs/1705.04185 (2017)
- [i18] Patrick M. Pilarski, Richard S. Sutton, Kory W. Mathewson, Craig Sherstan, Adam S. R. Parker, Ann L. Edwards: Communicative Capital for Prosthetic Agents. CoRR abs/1711.03676 (2017)
- [i17] Shangtong Zhang, Richard S. Sutton: A Deeper Look at Experience Replay. CoRR abs/1712.01275 (2017)
- 2016
- [j15] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. J. Mach. Learn. Res. 17: 73:1-73:29 (2016)
- [j14] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. J. Mach. Learn. Res. 17: 145:1-145:40 (2016)
- [i16] Vivek Veeriah, Patrick M. Pilarski, Richard S. Sutton: Face valuing: Training user interfaces with facial expressions and reinforcement learning. CoRR abs/1606.02807 (2016)
- [i15] Susan A. Murphy, Yanzhen Deng, Eric B. Laber, Hamid Reza Maei, Richard S. Sutton, Katie Witkiewitz: A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward. CoRR abs/1607.05047 (2016)
- [i14] Richard S. Sutton, Vivek Veeriah: Learning representations through stochastic gradient descent in cross-validation error. CoRR abs/1612.02879 (2016)
- 2015
- [c68] Harm Vanseijen, Richard S. Sutton: A Deeper Look at Planning as Learning from Replay. ICML 2015: 2314-2322
- [c67] Ashique Rupam Mahmood, Richard S. Sutton: Off-policy learning based on weighted importance sampling with linear computational complexity. UAI 2015: 552-561
- [i13] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. CoRR abs/1503.04269 (2015)
- [i12] Richard S. Sutton, Brian Tanner: Temporal-Difference Networks. CoRR abs/1504.05539 (2015)
- [i11] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton: An Empirical Evaluation of True Online TD(λ). CoRR abs/1507.00353 (2015)
- [i10] Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton: Emphatic Temporal-Difference Learning. CoRR abs/1507.01569 (2015)
- [i9] Richard S. Sutton: True Online Emphatic TD(λ): Quick Reference and Implementation Guide. CoRR abs/1507.07147 (2015)
- [i8] Hado van Hasselt, Richard S. Sutton: Learning to Predict Independent of Span. CoRR abs/1508.04582 (2015)
- [i7] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. CoRR abs/1512.04087 (2015)
- 2014
- [j13] Joseph Modayil, Adam White, Richard S. Sutton: Multi-timescale nexting in a reinforcement learning robot. Adapt. Behav. 22(2): 146-160 (2014)
- [c66] Richard S. Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt: A new Q(lambda) with interim forward view and Monte Carlo equivalence. ICML 2014: 568-576
- [c65] Harm van Seijen, Richard S. Sutton: True Online TD(lambda). ICML 2014: 692-700
- [c64] Hengshuai Yao, Csaba Szepesvári, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar: Universal Option Models. NIPS 2014: 990-998
- [c63] Ashique Rupam Mahmood, Hado van Hasselt, Richard S. Sutton: Weighted importance sampling for off-policy learning with linear function approximation. NIPS 2014: 3014-3022
- [c62] Hado van Hasselt, Ashique Rupam Mahmood, Richard S. Sutton: Off-policy TD(λ) with a true online equivalence. UAI 2014: 330-339
- 2013
- [j12] Patrick M. Pilarski, Michael Rory Dawson, Thomas Degris, Jason P. Carey, K. Ming Chan, Jacqueline S. Hebert, Richard S. Sutton: Adaptive Artificial Limbs: A Real-Time Approach to Prediction and Anticipation. IEEE Robotics Autom. Mag. 20(1): 53-64 (2013)
- [c61] Ashique Rupam Mahmood, Richard S. Sutton: Representation Search through Generate and Test. AAAI Workshop: Learning Rich Representations from Low-Level Sensors 2013
- [c60] David Silver, Richard S. Sutton, Martin Müller: Temporal-Difference Search in Computer Go. ICAPS 2013
- [c59] Harm van Seijen, Richard S. Sutton: Planning by Prioritized Sweeping with Small Backups. ICML (3) 2013: 361-369
- [c58] Patrick M. Pilarski, Travis B. Dick, Richard S. Sutton: Real-time prediction learning for the simultaneous actuation of multiple prosthetic joints. ICORR 2013: 1-8
- [c57] Ashique Rupam Mahmood, Richard S. Sutton: Position Paper: Representation Search through Generate and Test. SARA 2013
- [i6] Harm van Seijen, Richard S. Sutton: Planning by Prioritized Sweeping with Small Backups. CoRR abs/1301.2343 (2013)
- [i5] Ann L. Edwards, Alexandra Kearney, Michael Rory Dawson, Richard S. Sutton, Patrick M. Pilarski: Temporal-Difference Learning to Assist Human Decision Making during the Control of an Artificial Limb. CoRR abs/1309.4714 (2013)
- 2012
- [j11] David Silver, Richard S. Sutton, Martin Müller: Temporal-difference search in computer Go. Mach. Learn. 87(2): 183-219 (2012)
- [c56] Patrick M. Pilarski, Richard S. Sutton: Between Instruction and Reward: Human-Prompted Switching. AAAI Fall Symposium: Robots Learning Interactively from Human Teachers 2012
- [c55] Thomas Degris, Patrick M. Pilarski, Richard S. Sutton: Model-Free reinforcement learning with continuous action in practice. ACC 2012: 2177-2182
- [c54] Ashique Rupam Mahmood, Richard S. Sutton, Thomas Degris, Patrick M. Pilarski: Tuning-free step-size adaptation. ICASSP 2012: 2121-2124
- [c53] Adam White, Joseph Modayil, Richard S. Sutton: Scaling life-long off-policy learning. ICDL-EPIROB 2012: 1-6
- [c52] Thomas Degris, Martha White, Richard S. Sutton: Linear Off-Policy Actor-Critic. ICML 2012
- [c51] Joseph Modayil, Adam White, Richard S. Sutton: Multi-timescale Nexting in a Reinforcement Learning Robot. SAB 2012: 299-309
- [c50] Joseph Modayil, Adam White, Patrick M. Pilarski, Richard S. Sutton: Acquiring a broad range of empirical knowledge in real time by temporal-difference learning. SMC 2012: 1903-1910
- [i4] Thomas Degris, Martha White, Richard S. Sutton: Off-Policy Actor-Critic. CoRR abs/1205.4839 (2012)
- [i3] Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael Bowling: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping. CoRR abs/1206.3285 (2012)
- [i2] Adam White, Joseph Modayil, Richard S. Sutton: Scaling Life-long Off-policy Learning. CoRR abs/1206.6262 (2012)
- 2011
- [c49] Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, Doina Precup: Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. AAMAS 2011: 761-768
- [c48] Richard S. Sutton: Beyond Reward: The Problem of Knowledge and Data. ILP 2011: 2-6
- [i1] Joseph Modayil, Adam White, Richard S. Sutton: Multi-timescale Nexting in a Reinforcement Learning Robot. CoRR abs/1112.1133 (2011)
- 2010
- [c47] Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S. Sutton: Toward Off-Policy Learning Control with Function Approximation. ICML 2010: 719-726
2000 – 2009
- 2009
- [j10] Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee: Natural actor-critic algorithms. Autom. 45(11): 2471-2482 (2009)
- [c46] Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora: Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML 2009: 993-1000
- [c45] Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. NIPS 2009: 1204-1212
- [c44] Hengshuai Yao, Richard S. Sutton, Shalabh Bhatnagar, Diao Dongcui, Csaba Szepesvári: Multi-Step Dyna Planning for Policy Evaluation and Control. NIPS 2009: 2187-2195
- 2008
- [j9] Elliot A. Ludvig, Richard S. Sutton, E. James Kehoe: Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Comput. 20(12): 3034-3054 (2008)
- [c43] Maria Cutumisu, Duane Szafron, Michael H. Bowling, Richard S. Sutton: Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games. AIIDE 2008
- [c42] David Silver, Richard S. Sutton, Martin Müller: Sample-based learning and search with permanent and transient memories. ICML 2008: 968-975
- [c41] Elliot A. Ludvig, Richard S. Sutton, Eric Verbeek, E. James Kehoe: A computational model of hippocampal function in trace conditioning. NIPS 2008: 993-1000
- [c40] Richard S. Sutton, Csaba Szepesvári, Hamid Reza Maei: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation. NIPS 2008: 1609-1616
- [c39] Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael H. Bowling: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping. UAI 2008: 528-536
- 2007
- [c38] Richard S. Sutton, Anna Koop, David Silver: On the role of tracking in stationary environments. ICML 2007: 871-878
- [c37] David Silver, Richard S. Sutton, Martin Müller: Reinforcement Learning of Local Shape in the Game of Go. IJCAI 2007: 1053-1058
- [c36] Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee: Incremental Natural Actor-Critic Algorithms. NIPS 2007: 105-112
- 2006
- [c35] Alborz Geramifard, Michael H. Bowling, Richard S. Sutton: Incremental Least-Squares Temporal Difference Learning. AAAI 2006: 356-361
- [c34] Alborz Geramifard, Michael H. Bowling, Martin Zinkevich, Richard S. Sutton: iLSTD: Eligibility Traces and Convergence Analysis. NIPS 2006: 441-448
- 2005
- [j8] Peter Stone, Richard S. Sutton, Gregory Kuhlmann: Reinforcement Learning for RoboCup Soccer Keepaway. Adapt. Behav. 13(3): 165-188 (2005)
- [c33] Brian Tanner, Richard S. Sutton: TD(lambda) networks: temporal-difference networks with eligibility traces. ICML 2005: 888-895
- [c32] Eddie J. Rafols, Mark B. Ring, Richard S. Sutton, Brian Tanner: Using Predictive Representations to Improve Generalization in Reinforcement Learning. IJCAI 2005: 835-840
- [c31] Brian Tanner, Richard S. Sutton: Temporal-Difference Networks with History. IJCAI 2005: 865-870
- [c30] Doina Precup, Richard S. Sutton, Cosmin Paduraru, Anna Koop, Satinder Singh: Off-policy Learning with Options and Recognizers. NIPS 2005: 1097-1104
- [c29] Richard S. Sutton, Eddie J. Rafols, Anna Koop: Temporal Abstraction in Temporal-difference Networks. NIPS 2005: 1313-1320
- 2004
- [c28] Richard S. Sutton, Brian Tanner: Temporal-Difference Networks. NIPS 2004: 1377-1384
- 2002
- [e1] Rina Dechter, Michael J. Kearns, Richard S. Sutton: Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, July 28 - August 1, 2002, Edmonton, Alberta, Canada. AAAI Press / The MIT Press 2002
- 2001
- [c27] Doina Precup, Richard S. Sutton, Sanjoy Dasgupta: Off-Policy Temporal Difference Learning with Function Approximation. ICML 2001: 417-424
- [c26] Peter Stone, Richard S. Sutton: Scaling Reinforcement Learning toward RoboCup Soccer. ICML 2001: 537-544
- [c25] Michael L. Littman, Richard S. Sutton, Satinder Singh: Predictive Representations of State. NIPS 2001: 1555-1561
- [c24] Peter Stone, Richard S. Sutton: Keepaway Soccer: A Machine Learning Testbed. RoboCup 2001: 214-223
- 2000
- [c23] Doina Precup, Richard S. Sutton, Satinder Singh: Eligibility Traces for Off-Policy Policy Evaluation. ICML 2000: 759-766
- [c22] Peter Stone, Richard S. Sutton, Satinder Singh: Reinforcement Learning for 3 vs. 2 Keepaway. RoboCup 2000: 249-258
1990 – 1999
- 1999
- [j7] Richard S. Sutton, Doina Precup, Satinder Singh: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 112(1-2): 181-211 (1999)
- [c21] Richard S. Sutton: Open Theoretical Questions in Reinforcement Learning. EuroCOLT 1999: 11-17
- [c20] Richard S. Sutton, David A. McAllester, Satinder Singh, Yishay Mansour: Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999: 1057-1063
- 1998
- [b1] Richard S. Sutton, Andrew G. Barto: Reinforcement learning - an introduction. Adaptive computation and machine learning, MIT Press 1998, ISBN 978-0-262-19398-6, pp. I-XVIII, 1-322
- [j6] Richard S. Sutton, Andrew G. Barto: Reinforcement Learning: An Introduction. IEEE Trans. Neural Networks 9(5): 1054-1054 (1998)
- [c19] Doina Precup, Richard S. Sutton, Satinder Singh: Theoretical Results on Reinforcement Learning with Temporally Abstract Options. ECML 1998: 382-393
- [c18] Richard S. Sutton, Doina Precup, Satinder Singh: Intra-Option Learning about Temporally Abstract Actions. ICML 1998: 556-564
- [c17] Robert Moll, Andrew G. Barto, Theodore J. Perkins, Richard S. Sutton: Learning Instance-Independent Value Functions to Enhance Local Search. NIPS 1998: 1017-1023
- [c16] Richard S. Sutton, Satinder Singh, Doina Precup, Balaraman Ravindran: Improved Switching among Temporally Abstract Actions. NIPS 1998: 1066-1072
- [c15] Richard S. Sutton: Reinforcement Learning: Past, Present and Future. SEAL 1998: 195-197
- 1997
- [j5] Juan Carlos Santamaría, Richard S. Sutton, Ashwin Ram: Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces. Adapt. Behav. 6(2): 163-217 (1997)
- [c14] Richard S. Sutton: On the Significance of Markov Decision Processes. ICANN 1997: 273-282
- [c13] Doina Precup, Richard S. Sutton: Exponentiated Gradient Methods for Reinforcement Learning. ICML 1997: 272-277
- [c12] Doina Precup, Richard S. Sutton: Multi-time Models for Temporally Abstract Planning. NIPS 1997: 1050-1056
- 1996
- [j4] Satinder P. Singh, Richard S. Sutton: Reinforcement Learning with Replacing Eligibility Traces. Mach. Learn. 22(1-3): 123-158 (1996)
- 1995
- [c11] Richard S. Sutton: TD Models: Modeling the World at a Mixture of Time Scales. ICML 1995: 531-539
- [c10] Richard S. Sutton: Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. NIPS 1995: 1038-1044
- 1993
- [c9] Richard S. Sutton, Steven D. Whitehead: Online Learning with Random Representations. ICML 1993: 314-321
- 1992
- [c8] Richard S. Sutton: Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta. AAAI 1992: 171-176
- 1991
- [j3] Richard S. Sutton: Dyna, an Integrated Architecture for Learning, Planning, and Reacting. SIGART Bull. 2(4): 160-163 (1991)
- [c7] Richard S. Sutton, Christopher J. Matheus: Learning Polynomial Functions by Feature Construction. ML 1991: 208-212
- [c6] Richard S. Sutton: Planning by Incremental Dynamic Programming. ML 1991: 353-357
- [c5] Terence D. Sanger, Richard S. Sutton, Christopher J. Matheus: Iterative Construction of Sparse Polynomial Approximations. NIPS 1991: 1064-1071
- 1990
- [c4] Richard S. Sutton: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. ML 1990: 216-224
- [c3] Richard S. Sutton: Integrated Modeling and Control Based on Reinforcement Learning. NIPS 1990: 471-478
1980 – 1989
- 1989
- [c2] Andrew G. Barto, Richard S. Sutton, Christopher J. C. H. Watkins: Sequential Decision Problems and Neural Networks. NIPS 1989: 686-693
- 1988
- [j2] Richard S. Sutton: Learning to Predict by the Methods of Temporal Differences. Mach. Learn. 3: 9-44 (1988)
- 1985
- [c1] Oliver G. Selfridge, Richard S. Sutton, Andrew G. Barto: Training and Tracking in Robotics. IJCAI 1985: 670-672
- 1983
- [j1] Andrew G. Barto, Richard S. Sutton, Charles W. Anderson: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 13(5): 834-846 (1983)
Coauthor Index
aka: Kris De Asis
aka: Ashique Rupam Mahmood
aka: Satinder P. Singh