Repositories list
21 repositories
ThinkEdit
Public
An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
- [ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks
posthoc-generative-cbm
Public
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
- [NAACL 25] Two novel, lightweight, and training-free skill unlearning methods for LLMs
RAT_MisD
Public
Boosting misclassification detection ability by radius-aware training (RAT)
Describe-and-Dissect
Public
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models
CB-LLMs
Public
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
Concept-Bottleneck-LLM
Public
VLG-CBM
Public
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept bottleneck models (CBMs)
- [ECCV 24] A new and low-cost test-time defense for DNNs based on neuron-level interpretability methods
Audio_Network_Dissection
Public
[ICML 24] AND: the first framework to provide automatic natural language explanations for deep acoustic networks
DSC-210-NLA-FA22
Public
- [ICML 24] S-DQN and S-PPO: Robust smoothed deep RL agents without sacrificing performance
NN-LPK
Public
- [ICLR 24] This work proposes RSCP+ to provide a robustness guarantee in evaluation, and two novel methods, PTT and RCT, to robustify conformal predictions with improved efficiency through post-hoc transformation and training.
Label-free-CBM
Public
[ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data
- [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs
CLIP-dissect
Public
[ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
- [ICCV 23] Evaluating robustness of neuron explanation methods