Publications
2024
- [Neural Networks] Data-driven learning of chaotic dynamical systems using Discrete-Temporal Sobolev Networks. Connor Kennedy, Trace Crowdis, Haoran Hu, and 2 more authors. Neural Networks, 2024.
We introduce the Discrete-Temporal Sobolev Network (DTSN), a neural network loss function that assists dynamical system forecasting by minimizing variational differences between the network output and the training data via a temporal Sobolev norm. This approach is entirely data-driven, architecture agnostic, and does not require derivative information from the estimated system. The DTSN is particularly well suited to chaotic dynamical systems as it minimizes noise in the network output which is crucial for such sensitive systems. For our test cases we consider discrete approximations of the Lorenz-63 system and the Chua circuit. For the network architectures we use the Long Short-Term Memory (LSTM) and the Transformer. The performance of the DTSN is compared with the standard MSE loss for both architectures, as well as with the Physics Informed Neural Network (PINN) loss for the LSTM. The DTSN loss is shown to substantially improve accuracy for both architectures, while requiring less information than the PINN and without noticeably increasing computational time, thereby demonstrating its potential to improve neural network forecasting of dynamical systems.
@article{kennedy2024data,
  title = {Data-driven learning of chaotic dynamical systems using Discrete-Temporal Sobolev Networks},
  author = {Kennedy, Connor and Crowdis, Trace and Hu, Haoran and Vaidyanathan, Sankaran and Zhang, Hong-Kun},
  journal = {Neural Networks},
  pages = {106152},
  year = {2024},
  publisher = {Pergamon},
}
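As a rough illustration of the loss described above, the sketch below combines a standard MSE term with an MSE term on first-order finite differences along the time axis, which plays the role of the temporal Sobolev term. PyTorch is assumed; the function name and weighting are illustrative and this is not the paper's code.

```python
# Minimal sketch of a discrete-temporal Sobolev-style loss (PyTorch assumed).
# The finite-difference term stands in for the temporal Sobolev norm described
# in the abstract; names and the weighting scheme are illustrative only.
import torch
import torch.nn.functional as F

def sobolev_style_loss(pred: torch.Tensor, target: torch.Tensor, weight: float = 1.0) -> torch.Tensor:
    """pred, target: trajectories of shape (batch, time, state_dim)."""
    # Pointwise error on the trajectories themselves.
    mse = F.mse_loss(pred, target)
    # First-order finite differences approximate the temporal derivative.
    d_pred = pred[:, 1:] - pred[:, :-1]
    d_target = target[:, 1:] - target[:, :-1]
    # Penalize mismatch in the discrete derivatives as well.
    return mse + weight * F.mse_loss(d_pred, d_target)
```

In training, a term like this would replace the plain MSE loss, e.g. loss = sobolev_style_loss(model(x), y), leaving the LSTM or Transformer architecture itself untouched.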
- [arXiv] Automated Discovery of Functional Actual Causes in Complex Environments. Caleb Chuck*, Sankaran Vaidyanathan*, Stephen Giguere, and 3 more authors. arXiv preprint arXiv:2404.10883, 2024.
Reinforcement learning (RL) algorithms often struggle to learn policies that generalize to novel situations due to issues such as causal confusion, overfitting to irrelevant factors, and failure to isolate control of state factors. These issues stem from a common source: a failure to accurately identify and exploit state-specific causal relationships in the environment. While some prior works in RL aim to identify these relationships explicitly, they rely on informal domain-specific heuristics such as spatial and temporal proximity. Actual causality offers a principled and general framework for determining the causes of particular events. However, existing definitions of actual cause often attribute causality to a large number of events, even if many of them rarely influence the outcome. Prior work on actual causality proposes normality as a solution to this problem, but its existing implementations are challenging to scale to complex and continuous-valued RL environments. This paper introduces functional actual cause (FAC), a framework that uses context-specific independencies in the environment to restrict the set of actual causes. We additionally introduce Joint Optimization for Actual Cause Inference (JACI), an algorithm that learns from observational data to infer functional actual causes. We demonstrate empirically that FAC agrees with known results on a suite of examples from the actual causality literature, and JACI identifies actual causes with significantly higher accuracy than existing heuristic methods in a set of complex, continuous-valued environments.
@article{chuck2024automated,
  title = {Automated Discovery of Functional Actual Causes in Complex Environments},
  author = {Chuck, Caleb and Vaidyanathan, Sankaran and Giguere, Stephen and Zhang, Amy and Jensen, David and Niekum, Scott},
  journal = {arXiv preprint arXiv:2404.10883},
  year = {2024},
}
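The FAC definition and the JACI algorithm are not reducible to a few lines, but the state-specific flavor of the question can be illustrated with a toy but-for check: in one particular state of a small deterministic environment, a variable is flagged only if changing it in isolation changes the outcome. This is a simpler baseline notion than functional actual cause, and the environment and names below are hypothetical.

```python
# Toy but-for dependence check in a specific state of a deterministic system.
# This is a simplified baseline notion, not the paper's FAC definition or JACI.

def outcome(state):
    # Hypothetical environment: the switch gates whether the lever matters.
    return state["lever"] if state["switch"] == 1 else 0

def but_for_causes(state, domains):
    """Variables whose value, changed in isolation, alters the outcome in `state`."""
    base = outcome(state)
    causes = []
    for var, values in domains.items():
        if any(outcome({**state, var: v}) != base for v in values if v != state[var]):
            causes.append(var)
    return causes

state = {"switch": 0, "lever": 1}
domains = {"switch": [0, 1], "lever": [0, 1]}
print(but_for_causes(state, domains))  # ['switch']: the lever is inert while the switch is off
```

In the state above the lever is independent of the outcome given switch = 0, which is the kind of context-specific independence that, per the abstract, FAC exploits to restrict the set of actual causes.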
- [arXiv] Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges. Aman Singh Thakur*, Kartik Choudhary*, Venkat Srinik Ramayapally*, and 2 more authors. arXiv preprint arXiv:2406.12624, 2024.
Offering a promising solution to the scalability challenges associated with human evaluation, the LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large language models (LLMs). However, there are still many open questions about the strengths and weaknesses of this paradigm, and what potential biases it may hold. In this paper, we present a comprehensive study of the performance of various LLMs acting as judges, focusing on a clean scenario in which inter-human agreement is high. Investigating thirteen judge models of different model sizes and families, judging answers of nine different 'exam-taker models' (both base and instruction-tuned), we find that only the best (and largest) models achieve reasonable alignment with humans. However, they still fall well short of inter-human agreement, and their assigned scores may differ from human-assigned scores by up to 5 points. When it comes to ranking the nine exam-taker models, however, smaller models and even the simple lexical metric 'contains' may provide a reasonable signal. Through error analysis and other studies, we identify vulnerabilities in judge models, such as their sensitivity to prompt complexity and length, and a tendency toward leniency. The fact that even the best judges differ from humans in this comparatively simple setup suggests that caution is warranted when using judges in more complex setups. Lastly, our research underscores the importance of using alignment metrics beyond simple percent agreement, showing that judges with high percent agreement can still assign vastly different scores.
@article{thakur2024judging,
  title = {Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges},
  author = {Thakur, Aman Singh and Choudhary, Kartik and Ramayapally, Venkat Srinik and Vaidyanathan, Sankaran and Hupkes, Dieuwke},
  journal = {arXiv preprint arXiv:2406.12624},
  year = {2024},
}
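To illustrate the closing point that percent agreement alone can be misleading, the toy snippet below compares raw percent agreement with Cohen's kappa for a judge that labels every answer correct on a task where most answers happen to be correct. NumPy and scikit-learn are assumed, and the labels are synthetic, chosen purely for illustration; this is not data from the paper.

```python
# Illustration: high percent agreement can coexist with zero chance-corrected
# agreement. The labels are synthetic and chosen purely for the example.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = np.array([1] * 18 + [0, 0])   # human verdicts: 90% of answers correct
judge = np.ones_like(human)           # a judge that always answers "correct"

percent_agreement = (human == judge).mean()
kappa = cohen_kappa_score(human, judge)

print(f"percent agreement: {percent_agreement:.2f}")  # 0.90, looks strong
print(f"Cohen's kappa:     {kappa:.2f}")              # 0.00, no skill beyond chance
```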
- [arXiv] Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability. Jatin Nainani*, Sankaran Vaidyanathan*, AJ Yeung, and 2 more authors. arXiv preprint arXiv:2411.16105, 2024.
Mechanistic interpretability aims to understand the inner workings of large neural networks by identifying circuits, or minimal subgraphs within the model that implement algorithms responsible for performing specific tasks. These circuits are typically discovered and analyzed using a narrowly defined prompt format. However, given the ability of large language models (LLMs) to generalize across various prompt formats for the same task, it remains unclear how well these circuits generalize. For instance, it is unclear whether a model's generalization results from reusing the same circuit components, from the components behaving differently, or from the use of entirely different components. In this paper, we investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small, which is well studied and believed to implement a simple, interpretable algorithm. We evaluate its performance on prompt variants that challenge the assumptions of this algorithm. Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges. Notably, the circuit generalizes even to prompt variants where the original algorithm should fail; we discover a mechanism that explains this, which we term S2 Hacking. Our findings indicate that circuits within LLMs may be more flexible and general than previously recognized, underscoring the importance of studying circuit generalization to better understand the broader capabilities of these models.
@article{nainani2024adaptive,
  title = {Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability},
  author = {Nainani, Jatin and Vaidyanathan, Sankaran and Yeung, AJ and Gupta, Kartik and Jensen, David},
  journal = {arXiv preprint arXiv:2411.16105},
  year = {2024},
}
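For readers unfamiliar with the IOI task, it is commonly scored with a logit-difference metric: how strongly the model prefers the indirect object name over the subject name at the final position. A minimal sketch using the Hugging Face transformers package and GPT-2 small is below; the prompt is a generic IOI-style example, and this is not the paper's evaluation code.

```python
# Minimal sketch of the IOI logit-difference metric on GPT-2 small, assuming
# the Hugging Face `transformers` package. The prompt is an IOI-style example.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "When John and Mary went to the store, John gave a drink to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits at the final position

io_id = tokenizer.encode(" Mary")[0]  # indirect object (the expected completion)
s_id = tokenizer.encode(" John")[0]   # subject (the competing name)
print(f"logit difference (IO - S): {(logits[io_id] - logits[s_id]).item():.2f}")
```

A positive difference means the model favors the indirect object; prompt variants like those studied in the paper change the template while keeping this style of metric fixed.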
2020
- [Complex Networks] A new measure of modularity in hypergraphs: Theoretical insights and implications for effective clustering. Tarun Kumar*, Sankaran Vaidyanathan*, Harini Ananthapadmanabhan, and 2 more authors. In Complex Networks and Their Applications VIII (Proceedings of the Eighth International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2019), 2020.
Many real-world systems consist of entities that exhibit complex group interactions rather than simple pairwise relationships; such multi-way relations are more suitably modeled using hypergraphs. In this work, we generalize the framework of modularity maximization, commonly used for community detection on graphs, for the hypergraph clustering problem. We introduce a hypergraph null model that can be shown to correspond exactly to the configuration model for undirected graphs. We then derive an adjacency matrix reduction that preserves the hypergraph node degree sequence, for use with this null model. The resultant modularity function can be maximized using the Louvain method, a popular fast algorithm known to work well in practice for graphs. We additionally propose an iterative refinement over this clustering that exploits higher-order information within the hypergraph, seeking to encourage balanced hyperedge cuts. We demonstrate the efficacy of our methods on several real-world datasets.
@inproceedings{kumar2020new,
  title = {A new measure of modularity in hypergraphs: Theoretical insights and implications for effective clustering},
  author = {Kumar, Tarun and Vaidyanathan, Sankaran and Ananthapadmanabhan, Harini and Parthasarathy, Srinivasan and Ravindran, Balaraman},
  booktitle = {Complex Networks and Their Applications VIII: Volume 1 Proceedings of the Eighth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2019 8},
  pages = {286--297},
  year = {2020},
  organization = {Springer International Publishing},
}
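A rough sketch of the pipeline described above: expand the hypergraph into a weighted graph and run Louvain on the result. The clique expansion below, with each hyperedge contributing weight 1/(|e| - 1) per pair, keeps a node's weighted degree equal to its hyperedge degree, but it is not necessarily the exact reduction derived in the paper. NetworkX 3.x is assumed for louvain_communities.

```python
# Sketch: reduce a hypergraph to a weighted graph and cluster it with Louvain.
# This clique expansion keeps weighted node degrees equal to hyperedge degrees;
# the paper's exact reduction, null model, and refinement are not reproduced.
from itertools import combinations
import networkx as nx

def clique_reduce(hyperedges, weights=None):
    """Weighted clique expansion; `weights[i]` optionally scales hyperedge i."""
    G = nx.Graph()
    for i, e in enumerate(hyperedges):
        if len(e) < 2:
            continue  # singleton hyperedges carry no pairwise structure
        w = (1.0 if weights is None else weights[i]) / (len(e) - 1)
        for u, v in combinations(e, 2):
            if G.has_edge(u, v):
                G[u][v]["weight"] += w  # accumulate across shared hyperedges
            else:
                G.add_edge(u, v, weight=w)
    return G

hyperedges = [{"a", "b", "c"}, {"c", "d"}, {"d", "e", "f"}, {"a", "b"}]
G = clique_reduce(hyperedges)
clusters = nx.community.louvain_communities(G, weight="weight", seed=0)
print(clusters)  # inspect the detected communities
```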
- [Appl. NetSci] Hypergraph clustering by iteratively reweighted modularity maximization. Tarun Kumar, Sankaran Vaidyanathan, Harini Ananthapadmanabhan, and 2 more authors. Applied Network Science, 2020.
Learning on graphs is a subject of great interest due to the abundance of relational data from real-world systems. Many of these systems involve higher-order (super-dyadic) interactions rather than mere pairwise (dyadic) relationships; examples include co-authorship, co-citation, and metabolic reaction networks. Such super-dyadic relations are more adequately modeled using hypergraphs rather than graphs. Learning on hypergraphs has thus been garnering increased attention, with potential applications in network analysis, VLSI design, and computer vision, among others. In particular, hypergraph clustering is gaining attention because of its many applications, such as component placement in VLSI, group discovery in bibliographic systems, and image segmentation in computer vision. For the problem of clustering on graphs, modularity maximization is known to work well in the pairwise setting. Our primary contribution in this article is a generalization of the modularity maximization framework to clustering on hypergraphs. In doing so, we introduce a null model for graphs generated by hypergraph reduction and prove its equivalence to the configuration model for undirected graphs. The proposed graph reduction technique preserves the node degree sequence of the original hypergraph. The modularity function can then be defined on the reduced graph and maximized using any standard modularity maximization method, such as the Louvain method. We additionally propose an iterative technique that refines the obtained clusters. We demonstrate both the efficacy and efficiency of our methods on several real-world datasets.
@article{kumar2020hypergraph,
  title = {Hypergraph clustering by iteratively reweighted modularity maximization},
  author = {Kumar, Tarun and Vaidyanathan, Sankaran and Ananthapadmanabhan, Harini and Parthasarathy, Srinivasan and Ravindran, Balaraman},
  journal = {Applied Network Science},
  volume = {5},
  number = {1},
  pages = {52},
  year = {2020},
  publisher = {Springer International Publishing Cham},
}
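The iterative refinement added in this journal version can be summarized as a loop: reduce, cluster, reweight hyperedges based on how the current clustering cuts them, and repeat. The skeleton below shows only that loop structure, reusing a reduction such as the clique_reduce sketch above; the reweighting rule is a deliberately simple placeholder, not the update derived in the paper.

```python
# Skeleton of an iteratively reweighted hypergraph clustering loop. The
# reweighting rule is a placeholder (emphasize hyperedges split across many
# clusters), not the update derived in the paper. NetworkX 3.x assumed.
import networkx as nx

def iterative_clustering(hyperedges, reduce_fn, n_iters=5):
    """reduce_fn(hyperedges, weights) -> weighted nx.Graph, e.g. clique_reduce above."""
    weights = [1.0] * len(hyperedges)
    clusters = None
    for _ in range(n_iters):
        G = reduce_fn(hyperedges, weights)
        clusters = nx.community.louvain_communities(G, weight="weight", seed=0)
        label = {u: k for k, c in enumerate(clusters) for u in c}
        for i, e in enumerate(hyperedges):
            parts = {label[u] for u in e if u in label}
            # Placeholder reweighting: hyperedges currently cut across clusters
            # get more weight so later iterations revisit how they are split.
            weights[i] = 1.0 + 0.5 * max(len(parts) - 1, 0)
    return clusters
```

With the earlier sketch, this would be invoked as iterative_clustering(hyperedges, clique_reduce).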