Subash Khanal

I am a PhD candidate in Computer Science at Washington University in St. Louis, working in the Multimodal Vision Research Laboratory led by Dr. Nathan Jacobs.

I have an MS in Electrical Engineering from the University of Kentucky in Lexington. During my master's, I worked with Dr. Michael T. Johnson, focusing on speech recognition and signal processing.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub



I'm interested in using deep learning to solve Computer Vision (CV) problems while learning from multimodal data. Currently, I am focusing on developing CV models with a geospatial understanding of sounds around the world.


Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping
Khanal Subash, Sastry Srikumar, Dhakal Aayush and Jacobs Nathan
BMVC, 2023
arxiv / supplementary / bibtex / code

We learn a tri-modal embedding space between audio, text and overhead imagery. This enables us to create soundscape maps over any geographic region, using either audio or textual queries.

Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Dhakal Aayush, Ahmad Adeel, Khanal Subash, Sastry Srikumar, Kerner Hannah and Jacobs Nathan
CVPRW (EarthVision), 2024
arxiv / bibtex / code

We train a contrastive learning framework, Sat2Cap, on a novel large-scale dataset. This enables us to create maps using free-form textual descriptions.

GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
Sastry Srikumar, Khanal Subash, Dhakal Aayush, and Jacobs Nathan
CVPRW (EarthVision), 2024
arxiv / bibtex / code

This work presents GeoSynth, a diffusion-based model for synthesizing satellite images with global style and image-driven layout control.

LD-SDM: Language-Driven Hierarchical Species Distribution Modeling
Sastry Srikumar, Xin Xing, Dhakal Aayush, Khanal Subash, Ahmad Adeel, and Jacobs Nathan
preprint, 2024
arxiv / bibtex

We introduce a novel approach for species distribution modeling that uses a large language model to generate a representation of species. This provides flexibility to generate range maps at different levels of the taxonomic hierarchy and for unseen species.

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
Sastry Srikumar, Khanal Subash, Di Huang, Dhakal Aayush and Jacobs Nathan
WACV, 2024
arxiv / bibtex / code

This work presents a flexible framework, with vector embedding and metric learning variants, that supports both species distribution mapping and fine-grained visual classification.

GeoBind: Binding text, image, and audio through satellite images
Dhakal Aayush, Khanal Subash, Sastry Srikumar, Ahmad Adeel, Jacobs Nathan
IGARSS, 2024
arxiv / bibtex

This work presents a general framework that can be used to create an embedding space with any number of modalities by using satellite images as the binding element.

Causality for inherently explainable transformers: CAT-XPLAIN
Khanal Subash, Brodie Benjamin, Xing Xin, Lin Ai-Ling and Jacobs Nathan
CVPR Workshop, 2022
arxiv / bibtex / code

We add an extra special token (an explainable token) to the Vision Transformer (ViT) and train it to select the most important patches in the input image.

ADViT: Vision Transformer on Multi-Modality PET Images for Alzheimer Disease Diagnosis
Xing Xin, Liang Gongbo, Zhang Yu, Khanal Subash, Lin Ai-Ling and Jacobs Nathan
ISBI, 2022
paper / bibtex

Training ViT on 3D-to-2D converted multi-modal PET images achieves better Alzheimer's disease prediction.

Alzheimer's Disease Classification Using Genetic Data
Khanal Subash, Chen Jin, Jacobs Nathan and Lin Ai-Ling
BIBM Workshop, 2021
paper / bibtex / code

Machine learning on different types of genetic data helps to identify candidate genes for Alzheimer's disease progression.

Hierarchical Probabilistic Embeddings for Multi-View Image Classification
Brodie Benjamin, Khanal Subash, Rafique Muhammad Usman, Greenwell Connor and Jacobs Nathan
IGARSS, 2021
paper / bibtex

Learning a hierarchical, probabilistic embedding space allows one to estimate the uncertainty of feature distributions coming from sources with variable bands of information.

Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis
Khanal Subash, Johnson Michael T. and Bozorg Narjes
SLT, 2021
paper / bibtex

This paper compares the articulatory patterns of native (L1) English speakers and non-native (L2) Mandarin speakers of English to provide an understanding of the mispronunciation behaviors of L2 learners.

Mispronunciation Detection and Diagnosis for Mandarin Accented English Speech
Khanal Subash, Johnson Michael T., Soleymanpour Mohammad and Bozorg Narjes
SpeD, 2021
paper / bibtex

Articulatory features improve the performance of Automatic Speech Recognition (ASR)-based Mispronunciation Detection and Diagnosis (MDD) systems.

Mispronunciation Detection and Diagnosis in Mandarin Accented English Speech
Khanal Subash
Theses and Dissertations--Electrical and Computer Engineering, 2020
Thesis / bibtex

This work focused on analyzing articulatory patterns of mispronunciation and designing an ASR-based MDD system.

This website is modified from the source code of Jon Barron's website.