AISC5: Research Summaries

Modularity Loss Function. Team members: Logan Smith, Viktor Rehnberg, Vlado Baca, Philip Blagoveschensky, Viktor Petukhov. External collaborators: Gurkenglas. Making neural networks (NNs) more modular may improve their interpretability. If we cluster neurons or weights together according to their different functions, we can analyze each cluster individually. Once we better understand the clusters that make up a …


AISC4: Research Summaries

The fourth AI Safety Camp took place in May 2020 in Toronto. Due to COVID-19, the camp was held virtually. Six teams participated and worked on the following topics: Survey on AI risk scenarios; Options to defend a vulnerable world; Extraction of human preferences; Transferring reward functions across environments to encourage safety for agents in …


AISC3: Research Summaries

The third AI Safety Camp took place in April 2019 in Madrid. Our teams worked on the projects summarized below. Categorizing Wireheading in Partially Embedded Agents. Team: Embedded agents – Arushi, Davide, Sayan. They presented their work at the AI Safety Workshop in IJCAI 2019. Read their paper here. AI Safety Debate and Its Applications. Team: Debate – …


AISC2: Research Summaries

The second AI Safety Camp took place this October in Prague. Our teams have worked on exciting projects, which are summarized below. AI Governance and the Policymaking Process: Key Considerations for Reducing AI Risk. Team: Policymaking for AI Strategy – Brandon Perry, Risto Uuk. Our project was an attempt to introduce literature from theories on the …


The first AI Safety Camp & onwards

by Remmelt Ellen and Linda Linsefors. Summary: Last month, 5 teams of up-and-coming researchers gathered to solve concrete problems in AI alignment at our 10-day AI safety research camp in Gran Canaria. This post describes: the event format we came up with; our experience & lessons learned in running it in Gran Canaria; how you can contribute …


The participants of the first AI safety camp in Gran Canaria

AISC 1: Research Summaries

The 2018 Gran Canaria AI safety camp teams worked hard both in preparing for the camp and during the 10-day sprint. Each team has written a brief summary of the work they did during the camp. Irrationality. Team: Christopher Galias, Johannes Heidecke, Dmitrii Krasheninnikov, Jan Kulveit, Nandi Schoots. Our team worked on how …
