Sankaran Vaidyanathan

shun-ka-run    •    /ʃʌŋkəˈɹʌn/    •    சங்கரன்


I am a PhD candidate at the College of Information and Computer Sciences, UMass Amherst, where I am advised by David Jensen. My research focuses on developing principled tools grounded in causal reasoning for explaining and evaluating complex AI systems, including large language models (LLMs) and reinforcement learning agents.

In particular, I focus on problems where subjective human judgments play a central role, including mechanistic interpretability of neural networks, evaluation of LLM outputs, blame and responsibility attribution, and alignment with human social norms. These domains are often difficult to model with conventional statistical approaches from causality and machine learning: human judgments are shaped by implicit expectations, context-sensitive reasoning, and a tendency to highlight some causes over others according to shared social norms.

By developing methods grounded in scientific rigor and the human values that guide real-world decision-making, I aim to enable reliable evaluation and responsible governance of AI systems.

news

Feb 27, 2025 I am joining the 10th edition of AI Safety Camp.
May 06, 2024 Passed the qualifying exam and am officially a Ph.D. candidate!

selected publications

  1. arXiv
    Automated Discovery of Functional Actual Causes in Complex Environments
    Caleb Chuck*, Sankaran Vaidyanathan*, Stephen Giguere, and 3 more authors
    arXiv preprint arXiv:2404.10883, 2024
  2. ACL GEM
    Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
    Aman Singh Thakur*, Kartik Choudhary*, Venkat Srinik Ramayapally*, and 2 more authors
    In Proceedings of the Fourth Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), 2025
  3. arXiv
    Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability
    Jatin Nainani*, Sankaran Vaidyanathan*, AJ Yeung, and 2 more authors
    arXiv preprint arXiv:2411.16105, 2024