Research Scientist, Interpretability
Company: Anthropic Limited
Location: San Francisco
Posted on: April 26, 2024
Job Description:
When you see what modern language models are capable of, do you
wonder, "How do these things work? How can we trust them?"
The Interpretability team at Anthropic is working to
reverse-engineer how trained models work because we believe that a
mechanistic understanding is the most robust way to make advanced
systems safe. We're looking for researchers and engineers to join
our efforts.
People mean many different things by "interpretability". We're focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. If you're unfamiliar with this type of research, our published work in this area is a good place to start. (For a broader overview of work in this space, one of our team's alumni maintains a guide to the field.)
Some useful analogies might be to think of us as trying to do
"biology" or "neuroscience" of neural networks, or as treating
neural networks as binary computer programs we're trying to
"reverse engineer".
Our team has published a number of notable papers in this area. This work builds on ideas from members' research prior to joining Anthropic.
We aim to create a solid foundation for mechanistically understanding neural networks and making them safe. In the short term, this means we focus much of our attention on the issue of "superposition". But this is just a stepping stone towards our goal of mechanistically understanding neural networks.
Responsibilities:
You may be a good fit for this role if you are familiar with Python, which is required for this position.