The third AI Safety Camp took place in April 2019 in Madrid. Our teams worked on the projects summarized below:
Categorizing Wireheading in Partially Embedded Agents
Team: Embedded agents – Arushi, Davide, Sayan
They presented their work at the AI Safety Workshop at IJCAI 2019.
Read their paper here.
AI Safety Debate and Its Applications
Team: Debate – Vojta Kovarik, Anna Gajdova, David Lindner, Lukas Finnveden, Rajashree Agrawal
Read their blog post here.
See their GitHub here.
Regularization and Visualization of Attention in Reinforcement Learning Agents
Team: RL Attention – Dmitry Nikulin, Sebastian Kosch, Fabian Steuer, Hoagy Cunningham
Read their research report here.
Modelling Cooperation
See a visualisation of their mathematical model here.
Robustness of Multi-Armed Bandits
Team: Bandits – Dominik Fay, Misha Yagudin, Ronak Mehta
Learning Models of Mistakes
Team: Mistakes – Lewis Hammond, Nikolas Bernaola, Saasha Nair
Cooperative Environments with Terminal Consequences
Team: CIRL Environment – Jason Hepburn, Nix Goldowsky-Dill, Pablo Antonio Moreno Casares, Ross Gruetzemacher, Vasilios Mavroudis
Responsible Disclosure in AI Research
Team: AI Governance – Cynthia Yoon, Jordi Bieger, Laszlo Treszkai, Ronja Lutz
Psychological Distance and Group Blindspots
Team: Psychological Distance – Remmelt Ellen