Subash Khanal

I am a PhD candidate in Computer Science at Washington University in St. Louis, working in the Multimodal Vision Research Laboratory led by Dr. Nathan Jacobs.

I have an MS in Electrical Engineering from the University of Kentucky in Lexington. During my master's, I worked with Dr. Michael T. Johnson, focusing on speech recognition and signal processing.

Email / CV / Google Scholar / LinkedIn / GitHub

Research

I’m interested in applying deep learning to address various Computer Vision (CV) challenges, with a focus on multimodal data. Currently, I’m developing methods using multimodal machine learning to integrate satellite imagery, audio, and text for creating global-scale soundscape maps.

Publications

PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
Khanal Subash, Xing Eric, Sastry Srikumar, Dhakal Aayush, Xiong Zhexiao, Ahmad Adeel and Jacobs Nathan
ACM Multimedia, 2024
arxiv / bibtex / code

We develop a probabilistic, multi-scale, and metadata-aware embedding space that connects audio, text, and overhead imagery. This enables the creation of dynamic, multi-scale soundscape maps for any geographic region, along with uncertainty estimates for the mapping.

Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping
Khanal Subash, Sastry Srikumar, Dhakal Aayush and Jacobs Nathan
BMVC, 2023
arxiv / supplementary / bibtex / code

We learn a tri-modal embedding space between audio, text and overhead imagery. This enables us to create soundscape maps over any geographic region, using either audio or textual queries.

Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Dhakal Aayush, Ahmad Adeel, Khanal Subash, Sastry Srikumar, Kerner Hannah and Jacobs Nathan
CVPRW (EarthVision), Best Paper Award, 2024
arxiv / bibtex / code

We train a contrastive learning framework, Sat2Cap, on a novel large-scale dataset. This enables us to create maps using free-form textual descriptions.

GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
Sastry Srikumar, Khanal Subash, Dhakal Aayush, and Jacobs Nathan
CVPRW (EarthVision), 2024
arxiv / bibtex / code

This work presents GeoSynth, a diffusion-based model for synthesizing satellite images with global style and image-driven layout control.

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
Sastry Srikumar, Khanal Subash, Huang Di, Dhakal Aayush and Jacobs Nathan
WACV, 2024
arxiv / bibtex / code

This work presents a flexible framework, with vector embedding and metric learning variants, that supports both species distribution mapping and fine-grained visual classification.

GeoBind: Binding Text, Image, and Audio through Satellite Images
Dhakal Aayush, Khanal Subash, Sastry Srikumar, Ahmad Adeel and Jacobs Nathan
IGARSS, 2024
arxiv / bibtex

This work presents a general framework that can be used to create an embedding space with any number of modalities by using satellite images as the binding element.

Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Xiong Zhexiao, Xing Xin, Workman Scott, Khanal Subash and Jacobs Nathan
preprint, 2024
arxiv / bibtex

This work introduces the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area.

LD-SDM: Language-Driven Hierarchical Species Distribution Modeling
Sastry Srikumar, Xing Xin, Dhakal Aayush, Khanal Subash, Ahmad Adeel, and Jacobs Nathan
preprint, 2024
arxiv / bibtex

We introduce a novel approach for species distribution modeling that uses a large language model to generate a representation of species. This provides the flexibility to generate range maps at different levels of the taxonomic hierarchy and for unseen species.

Causality for inherently explainable transformers: CAT-XPLAIN
Khanal Subash, Brodie Benjamin, Xing Xin, Lin Ai-Ling and Jacobs Nathan
CVPR Workshop, 2022
arxiv / bibtex / code

We add an extra special token (an explainable token) to the Vision Transformer (ViT) and train it to select the most important patches in the input image.

Advit: Vision Transformer on Multi-modality PET Images for Alzheimer Disease Diagnosis
Xing Xin, Liang Gongbo, Zhang Yu, Khanal Subash, Lin Ai-Ling and Jacobs Nathan
ISBI, 2022
paper / bibtex

Training ViT on 3D-to-2D converted multi-modal PET images achieves better Alzheimer's disease prediction.

Alzheimer's Disease Classification Using Genetic Data
Khanal Subash, Chen Jin, Jacobs Nathan and Lin Ai-Ling
BIBM Workshop, 2021
paper / bibtex / code

Machine learning on different types of genetic data helps to identify candidate genes for Alzheimer's disease progression.

Hierarchical Probabilistic Embeddings for Multi-View Image Classification
Brodie Benjamin, Khanal Subash, Rafique Muhammad Usman, Greenwell Connor and Jacobs Nathan
IGARSS, 2021
paper / bibtex

Learning a hierarchical, probabilistic embedding space makes it possible to obtain uncertainty estimates for feature distributions coming from sources with variable bands of information.

Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis
Khanal Subash, Johnson Michael T. and Bozorg Narjes
SLT, 2021
paper / bibtex

This paper compares the articulatory patterns of native (L1) English speakers and non-native (L2) Mandarin speakers of English to better understand the mispronunciation behaviors of L2 learners.

Mispronunciation Detection and Diagnosis for Mandarin Accented English Speech
Khanal Subash, Johnson Michael T., Soleymanpour Mohammad and Bozorg Narjes
SpeD, 2021
paper / bibtex

Articulatory features improve the performance of Automatic Speech Recognition (ASR)-based Mispronunciation Detection and Diagnosis (MDD) systems.

Mispronunciation Detection and Diagnosis in Mandarin Accented English Speech
Khanal Subash
Theses and Dissertations--Electrical and Computer Engineering, 2020
Thesis / bibtex

This work focuses on analyzing articulatory patterns of mispronunciation and designing an ASR-based MDD system.


This website is adapted from the source code of Jon Barron's website.