Senior Program Officer, Open Philanthropy
Ajeya Cotra leads Open Philanthropy’s grantmaking on technical research that could help to clarify and reduce catastrophic risks from advanced AI. As part of this role, she conducts analysis on threat models (ways that advanced AI could cause catastrophic harm) and technical agendas (technical work that may help to address these threat models). She also co-founded, edits, and writes for Planned Obsolescence, a blog about AI futurism and AI alignment. She is interested in advising technical writers working on AI alignment and other aspects of AI safety.
CEO, Redwood Research
Buck Shlegeris is the CEO of Redwood Research, a nonprofit organization that focuses on applied alignment research for artificial intelligence. Previously, he was a researcher at the Machine Intelligence Research Institute. His recent work includes a library for expressing and manipulating tensor computations for neural network interpretability, as well as the papers “Causal Scrubbing,” “Interpretability in the Wild,” “Polysemanticity and Capacity in Neural Networks,” and “Adversarial Training for High-Stakes Reliability.” He is interested in developing evaluation methodologies that AI developers could use to make robust safety cases for their training and deployment plans (examples here), as well as in evaluating and improving safety techniques using these evaluations.
Research Scientist, OpenAI Governance Team
Daniel Kokotajlo has a background in academic philosophy but now works at OpenAI on alignment and governance. In between, he did influential work on AI timelines. He has also worked at the Center on Long-Term Risk (CLR) on reducing s-risk and promoting cooperative AGI.
Research Lead, Anthropic
Ethan Perez is a Research Scientist at Anthropic, where he leads a team working on developing model organisms of misalignment. His recent publications include “Discovering Language Model Behaviors with Model-Written Evaluations” and “Measuring Progress on Scalable Oversight for Large Language Models,” and he co-founded the Inverse Scaling Prize. Ethan’s research interests include robustness, model transparency, and the development of techniques to better understand and control AI systems.
Research Lead, Anthropic
Evan Hubinger is a research scientist at Anthropic, where he leads a team working on model organisms of misalignment. Before joining Anthropic, Evan was a research fellow at the Machine Intelligence Research Institute. His research focuses on inner alignment and deceptive alignment, concepts he and his coauthors coined in the paper “Risks from Learned Optimization in Advanced Machine Learning Systems.” He is interested in work related to model organisms of misalignment, conditioning predictive models, and deceptive alignment.
Member of Technical Staff, Redwood Research
Fabien Roger is a member of technical staff at Redwood Research. He is interested in methods to defend against deceptive models. In particular, he is interested in projects that explore one such defense: for example, improving paraphrasing to prevent steganography, training reliable detectors of suspicious activity, or training translators so that models cannot tell whether there has been a distribution shift.
Member of Technical Staff, METR
Hjalmar Wijk is a member of technical staff at METR (formerly ARC Evals). He is interested in work clarifying threat models and creating model evaluations to prevent harm from these threat models. In particular, he would like to make progress on the following questions: What are the most likely ways AI systems trained in the next year or two could cause catastrophic harm? What capabilities enable these threat models, and how (at a high level) could we evaluate for them? At what levels do these risks clearly demand strong containment measures, such as state-proof security? There is a lot of conceptual work involved in mapping out this space, understanding future AI capabilities, and identifying key considerations, but there is also an opportunity for more concrete work: talking to diverse experts, identifying current bottlenecks to harm, and finding reference classes or historical examples.
Research Analyst, Open Philanthropy
Lukas Finnveden is a member of Open Philanthropy's worldview investigations team. He is interested in issues other than intent alignment that might be important if transformative AI arrives in the next decade or two, such as: how society should treat digital minds as it becomes increasingly plausible that they are sentient, ways to increase the probability that humanity gets on a good deliberative track toward eventually reaching sound empirical and philosophical beliefs, and ways in which evidential cooperation in large worlds could be decision-relevant.
Member of Technical Staff, METR
Megan Kinniment is a member of technical staff at METR (formerly ARC Evals) who leads a variety of research projects. METR’s current research focuses on eliciting and measuring the capabilities of frontier models, as well as threat-modeling work to understand the minimum capabilities needed for autonomous AI to pose a catastrophic risk. They are interested in projects relating to LLM agent development, task creation, threat modeling, and model assessment.
Research Associate, Future of Humanity Institute
Owain Evans leads a research group in Berkeley and has mentored 25+ alignment researchers, primarily at Oxford’s Future of Humanity Institute. He is currently focused on defining and evaluating situational awareness in LLMs (relevant paper); learning how to predict the emergence of other dangerous capabilities empirically (see “out-of-context” reasoning and the Reversal Curse); and honesty, lying, truthfulness, and introspection in LLMs.
Research Scientist, OpenAI Governance Team
Richard Ngo is a member of the OpenAI Governance team. He was previously a research engineer on the AGI safety team at DeepMind and was one of the main developers of the AI Safety Fundamentals curriculum. He is interested in AI threat modeling – understanding both why AIs might be misaligned and what misaligned AIs might do – as well as understanding defensive uses of AI and forecasting the far future.
Rob Long (with Ethan Perez)
Research Associate, Center for AI Safety (and Research Lead, Anthropic)
Rob Long is a Research Associate at the Center for AI Safety, where he works on issues at the intersection of philosophy of mind, cognitive science, and ethics of AI. Ethan Perez is a Research Scientist at Anthropic, where he leads a team working on developing model organisms of misalignment. They are interested in finding methods to determine whether AI systems possess consciousness, desires, or other states of moral significance. Self-reports, or an AI system’s statements about its own internal states, could provide a promising method for investigating this question. They are interested in testing whether current language models can learn to introspect in ways that generalize to other introspective questions, a necessary condition for eliciting reliable AI self-reports.
Senior Research Analyst, Open Philanthropy
Tom Davidson is a senior research analyst at Open Philanthropy. He works on assessing whether transformative AI might be developed relatively soon, how suddenly it might be developed, and what impact it might have; see his recent report on AI takeoff speeds. He is interested in various projects related to this work: in particular, whether the biggest AI algorithmic discoveries of the last ten years would have been possible without fast-growing amounts of available compute, and how important Responsible Scaling Policies might be for reducing risks from accelerating AI progress.
Frontier Model Redteaming, Compute Governance, & Cybersecurity
We've talked to a few potential advisors who might be interested in advising someone if the right person applied, but who are less than 50% likely to take a fellow. You can express interest in these advisors in the application, and we will be in touch if you may be a good fit for one or more of them.
We will provide housing and transportation within Berkeley for the duration of the program. Additionally, we recommended Astra invitees to AI Safety Support (an Australian charity) for independent research grants, and they have decided to provide grants of $15k for 10 weeks of independent research to accepted Astra applicants in support of their AI safety research.
Fellows will conduct research from Constellation’s shared office space, and lunch and dinner will be provided daily. Individual advisors will choose when and how to interact with their fellows, but most advisors will work out of the Constellation office frequently. There will be regular invited talks from senior researchers, social events with Constellation members, and opportunities to receive feedback on research. Before the program begins, we may also provide tutorial support to fellows interested in going through Constellation and Redwood Research’s MLAB curriculum.
We expect to inform all applicants whether they have progressed to the second round within a week of applying, and to make final decisions by December 1, 2023. For more details on the application process, see the FAQ below. The deadline to apply has passed.
"Participating in MLAB [the Machine Learning for Alignment Bootcamp, jointly run by Constellation and Redwood Research] was probably the biggest single direct cause for me to land my current role. The material was hugely helpful, and the Constellation network is awesome for connecting with AI safety organizations."
“Having research chats with people I met at Constellation has given rise to new research directions I hadn't previously considered, like model organisms. Talking with people at Constellation is how I decided that existential risk from AI is non-trivial, after having many back and forth conversations with people in the office. These updates have had large ramifications for how I’ve done my research, and significantly increased the impact of my research.”
"Speaking with AI safety researchers in Constellation was an essential part of how I formed my views on AI threat models and AI safety research prioritization. It also gave me access to a researcher network that I've found very valuable for my career.”
“Participating in MLAB was likely the most important thing I did for upskilling to get my current position and has generally been quite valuable for my research via gaining intuitions on how language models work, gaining more Python fluency, and better understanding ML engineering. I’m excited for other people to have similar opportunities.”
If your question isn't answered here, please reach out to email@example.com
Constellation is a research center dedicated to safely navigating the development of transformative AI. We host a number of organizations, teams, and individuals working on topics including alignment, dangerous capability evaluations, and AI governance, in addition to running field-building programs such as this one.
We welcome a wide range of applicants. We expect professionals working in related industries, graduate students, and exceptionally promising undergraduates to be good fits, but we are also excited about applicants with other backgrounds. If you are unsure about your fit, please err on the side of applying. We especially encourage women and underrepresented minorities to apply.
If you and your advisor would like to continue your project after the program ends, we may be able to provide support. While individual cases will vary, we are generally excited to help fellows complete their research.
Additionally, over 15 participants in past programs, such as Constellation and Redwood Research’s Machine Learning for Alignment Bootcamps, are (as of the time of writing) working at Anthropic, ARC Evals, ARC Theory, Google DeepMind, OpenAI, Open Philanthropy, and Redwood Research.
The application process involves two rounds.
Applications will be processed on a rolling basis up until the deadline, November 17th. We encourage you to submit the application as soon as convenient so that, if you progress to the next round, you will have more time to complete the advisor-specific questions. We plan to get back to all candidates by November 20th with information on their next steps.
While we prefer that participants be present for the full duration of the program from January through March, we can accommodate variable start and end dates as long as you are able to join for the majority of the program. If you aren’t available for any of the dates, we still recommend filling out the application, since we may run future iterations of this program.
The Visiting Researcher Program provides an opportunity for established researchers to spend time at Constellation while continuing their full-time research. The Astra Fellowship allows people interested in starting new research to work with an experienced advisor, who will provide guidance and project direction. If you would like to continue your own research, we recommend applying to the Visiting Researcher Program. If you would like to be paired with an advisor to start a new project, we recommend applying to this program.
We will cover travel to and from Berkeley, CA, in addition to housing for the duration of your stay. Constellation provides lunch and dinner on weekdays. Feel free to email firstname.lastname@example.org with any questions.
Yes — please email email@example.com with their name, their email (if you’d like us to reach out to them), and (optionally) a short sentence on why you think they’d be a good fit.
Constellation is not compensating you for your participation in this program. However, AI Safety Support (an Australian charity) has decided to provide grants of $15k for 10 weeks of independent research to accepted Astra applicants in support of their AI safety research.