Most papers here follow the Hardy-Littlewood rule, under which authors are listed in alphabetical order, with a few exceptions. (*) indicates equal contribution.
Ongoing Work and Preprints
Scalable Gradient-Based Attribution of LLM Behaviors
Linda Cai*, Xinyan Hu*, Alex Pan*, Socrates Osorio, Pratyay Pandey, Michael I. Jordan, Jacob Steinhardt. Ongoing work. Preprint available upon request! [code]