Research
My research focuses on developing methods for Geospatial AI using Deep Learning and Computer Vision. Currently, I am working on multimodal self-supervised learning techniques to integrate satellite imagery, audio, and text for global-scale soundscape mapping.
|
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
Khanal Subash, Xing Eric, Sastry Srikumar, Dhakal Aayush, Xiong Zhexiao, Ahmad Adeel and Jacobs Nathan
ACM Multimedia, 2024
arxiv /
bibtex /
code /
project page
We develop a probabilistic, multi-scale, and metadata-aware embedding space that connects audio, text, and overhead imagery. This enables the creation of dynamic, multi-scale soundscape maps for any geographic region, along with uncertainty estimates for the mapping.
|
Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping
Khanal Subash, Sastry Srikumar, Dhakal Aayush and Jacobs Nathan
BMVC, 2023
arxiv /
supplementary /
bibtex /
code
We learn a tri-modal embedding space between audio, text, and overhead imagery. This enables us to create soundscape maps over any geographic region using either audio or textual queries.
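As a rough illustration only (not the paper's actual implementation), a tri-modal embedding space of this kind can be trained by tying the audio and text embeddings to the shared overhead-imagery embedding with a symmetric contrastive (InfoNCE-style) loss. All function names, array shapes, and the temperature value below are hypothetical:

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of embeddings.

    Matching rows of `a` and `b` are treated as positive pairs;
    all other rows in the batch act as negatives.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature            # (batch, batch) cosine similarities
    labels = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # cross-entropy in both directions (a -> b and b -> a)
    return 0.5 * (xent(logits) + xent(logits.T))

def trimodal_loss(audio_emb, text_emb, image_emb):
    # anchor both audio and text to the overhead-imagery embedding space
    return info_nce(audio_emb, image_emb) + info_nce(text_emb, image_emb)
```

With perfectly aligned embeddings the diagonal (positive-pair) similarities dominate, so the loss is much smaller than for random, unaligned batches.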
|
TaxaBind: A Unified Embedding Space for Ecological Applications
Sastry Srikumar, Khanal Subash, Dhakal Aayush, Ahmad Adeel and Jacobs Nathan
WACV, 2025
arxiv /
bibtex /
code /
project page
TaxaBind is a suite of multimodal models useful for downstream ecological tasks covering six modalities: ground-level image, geographic location, satellite image, text, audio, and environmental features.
|
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Dhakal Aayush, Ahmad Adeel, Khanal Subash, Sastry Srikumar, Kerner Hannah and Jacobs Nathan
CVPRW (EarthVision), 2024, Best Paper Award
arxiv /
bibtex /
code
We train a contrastive learning framework, Sat2Cap, on a novel large-scale dataset. This enables us to create maps from free-form textual descriptions.
|
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
Sastry Srikumar, Khanal Subash, Dhakal Aayush and Jacobs Nathan
CVPRW (EarthVision), 2024
arxiv /
bibtex /
code /
project page
This work presents GeoSynth, a diffusion-based model for synthesizing satellite images with global style and image-driven layout control.
|
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
Sastry Srikumar, Khanal Subash, Huang Di, Dhakal Aayush and Jacobs Nathan
WACV, 2024
arxiv /
bibtex /
code
This work presents a flexible framework, with vector embedding and metric learning variants, that supports both species distribution mapping and fine-grained visual classification.
|
GeoBind: Binding Text, Image, and Audio through Satellite Images
Dhakal Aayush, Khanal Subash, Sastry Srikumar, Ahmad Adeel and Jacobs Nathan
IGARSS, 2024
arxiv /
bibtex
This work presents a general framework that can be used to create an embedding space with any number of modalities by using satellite images as the binding element.
|
LD-SDM: Language-Driven Hierarchical Species Distribution Modeling
Sastry Srikumar, Xing Xin, Dhakal Aayush, Khanal Subash, Ahmad Adeel and Jacobs Nathan
preprint, 2024
arxiv /
bibtex
We introduce a novel approach for species distribution modeling that uses a large language model to generate a representation of each species. This provides the flexibility to generate range maps at different levels of the taxonomic hierarchy and for unseen species.
|
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Xiong Zhexiao, Xing Xin, Workman Scott, Khanal Subash and Jacobs Nathan
preprint, 2024
arxiv /
bibtex
This work introduces the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area.
|
Causality for inherently explainable transformers: CAT-XPLAIN
Khanal Subash, Brodie Benjamin, Xing Xin, Lin Ai-Ling and Jacobs Nathan
CVPR Workshop, 2022
arxiv /
bibtex /
code
We add an extra special token (an explanation token) to the Vision Transformer (ViT) and train it to select the most important patches in the input image.
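A minimal sketch of the core idea, assuming a single attention head and hypothetical names throughout (this is a simplification, not the paper's architecture): prepend a learnable extra token to the patch sequence and read off its attention distribution over the patches as importance scores.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def top_patches_via_explain_token(patch_emb, explain_token, w_q, w_k, k=3):
    """Score patches by how much attention the extra token pays to each one.

    patch_emb:     (num_patches, dim) patch embeddings
    explain_token: (dim,) learnable extra token
    w_q, w_k:      (dim, dim) query/key projection matrices
    Returns the indices of the k highest-attention patches.
    """
    tokens = np.vstack([explain_token, patch_emb])   # prepend the extra token
    q = tokens @ w_q
    keys = tokens @ w_k
    attn = softmax(q @ keys.T / np.sqrt(keys.shape[1]))
    patch_scores = attn[0, 1:]   # extra token's attention over the patches
    return np.argsort(patch_scores)[::-1][:k]
```

In the trained model these scores would be shaped by a selection objective; here the projections are random and the example only shows how the extra token exposes a per-patch ranking.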
|
ADViT: Vision Transformer on Multi-modality PET Images for Alzheimer's Disease Diagnosis
Xing Xin, Liang Gongbo, Zhang Yu, Khanal Subash, Lin Ai-Ling and Jacobs Nathan
ISBI, 2022
paper /
bibtex
Training a ViT on 3D-to-2D converted multi-modal PET images improves Alzheimer's disease prediction.
|
Alzheimer's Disease Classification Using Genetic Data
Khanal Subash, Chen Jin, Jacobs Nathan and Lin Ai-Ling
BIBM Workshop, 2021
paper /
bibtex /
code
Machine learning on different types of genetic data helps identify candidate genes for Alzheimer's disease progression.
|
Hierarchical Probabilistic Embeddings for Multi-View Image Classification
Brodie Benjamin, Khanal Subash, Rafique Muhammad Usman, Greenwell Connor and Jacobs Nathan
IGARSS, 2021
paper /
bibtex
Learning a hierarchical, probabilistic embedding space provides uncertainty estimates for feature distributions coming from sources with varying bands of information.
|
Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis
Khanal Subash, Johnson Michael T. and Bozorg Narjes
SLT, 2021
paper /
bibtex
This paper compares the articulatory patterns of native (L1) and non-native (L2) Mandarin speakers of English, providing an understanding of the mispronunciation behaviors of L2 learners.
|
Mispronunciation Detection and Diagnosis for Mandarin Accented English Speech
Khanal Subash, Johnson Michael T., Soleymanpour Mohammad and Bozorg Narjes
SpeD, 2021
paper /
bibtex
Articulatory features improve the performance of Automatic Speech Recognition (ASR) based Mispronunciation Detection and Diagnosis (MDD) systems.
|
Mispronunciation Detection and Diagnosis in Mandarin Accented English Speech
Khanal Subash
Theses and Dissertations--Electrical and Computer Engineering, 2020
thesis /
bibtex
This work analyses the articulatory patterns of mispronunciation and presents the design of an ASR-based MDD system.
|