Research Interests
I'm interested in computer vision, machine learning, optimization, graphics, and robotics.
News
May '25
Maya accepted to VLMs-4-All @ CVPR 2025.
May '25
Robust and Fine-Grained Detection of AI-Generated Texts featured in TuringPost.
May '25
Clash of Civilizations won Bronze and CodeClarity won Most Innovative @ Expedition Aya 2025 (hosted by Cohere for AI). Evaluated political bias and code generation in non-English languages.
Feb '25
✨ INCLUDE accepted as a Spotlight at ICLR 2025.
Jan '25
Served as a reviewer for ACL ARR (Dec 2024 cycle).
Jan '25
Three papers accepted at COLING 2025.
Aug '23
Served as a reviewer for EMNLP 2023.
M-REWARDBENCH: Evaluating Reward Models in Multilingual Settings
Accepted to ACL 2025 (Main Conference) | Rated 4/5
Srishti Gureja, Lester J. Miranda, Shayekh Bin Islam, Rishabh Maheshwary, Drishti Sharma, Gusti Winata
Advisors: Nathan Lambert, Sebastian Ruder, Sara Hooker, Marzieh Fadaee
Paper /
Code /
Dataset /
Leaderboard
This work presents M-REWARDBENCH, the first large-scale benchmark for evaluating reward models (RMs) in multilingual settings, covering 23 languages across 8 scripts and 5 language families. The benchmark assesses four core capabilities: (1) chat, (2) safety, (3) reasoning, and (4) translation, using 2.87k human-aligned preference instances. A total of 25 reward models, spanning Classifier, Generative, and Implicit (DPO-trained) types, are evaluated. Key findings include: (1) Generative RMs (e.g., GPT-4 Turbo) achieve the highest multilingual performance and cross-lingual consistency, with only a 3% average performance drop compared to English, (2) Classifier and Implicit RMs exhibit larger drops (~8–13%) and greater volatility, especially in subjective categories like chat and safety, and (3) translation quality, language resource availability, and script type (e.g., Latin, Cyrillic) significantly impact RM performance. For translation, the benchmark incorporates easy and hard subsets from the MAPLE dataset across four directions (en↔zh, en↔de), showing that harder tasks consistently reduce accuracy. Performance improves by 1–3% when using higher-quality translations (Google Translate over NLLB-3.3B).
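For readers unfamiliar with RM benchmarks, the core metric is simple: given a (prompt, chosen, rejected) triple, the RM should score the chosen response higher. A minimal sketch for a classifier-style RM, assuming a Hugging Face sequence-classification head with a scalar output; the model name and data below are placeholders, not the paper's setup:

```python
# Sketch: preference accuracy of a classifier-style reward model.
# Model name and data are placeholders, not the paper's exact setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "some-org/reward-model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def reward(prompt: str, response: str) -> float:
    """Scalar score the RM assigns to a (prompt, response) pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# Placeholder multilingual preference triples: (prompt, chosen, rejected).
pairs = [("¿Cuál es la capital de Francia?", "París.", "Berlín.")]
accuracy = sum(reward(p, c) > reward(p, r) for p, c, r in pairs) / len(pairs)
print(f"preference accuracy: {accuracy:.3f}")
```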
Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory QA
4th place @ COLING 2025 (RegNLP Track)
Jebish Purbey, Drishti Sharma, Siddhant Gupta, Khawaja Murad, Siddartha Pullakhandam, Ram Mohan Rao Kadiyala
Paper
This work introduces LeSeR (Lexical Reranking of Semantic Retrieval), a hybrid method that first performs dense semantic retrieval using fine-tuned embedding models on query-passage pairs, then reranks the results with BM25, a classical lexical retrieval method. This two-stage decoupled pipeline boosts both recall and precision, outperforming standalone dense or lexical models. We experimented with multiple embedding models (Stella, BGE, CDE, and MPNet), fine-tuning them with Multiple Negatives Symmetric Ranking (MNSR) loss. Among these, BGE_MNSR integrated with LeSeR (BGE_LeSeR) achieved the best retrieval performance, with Recall@10 of 0.8201 and mAP@10 of 0.6655, outperforming both dense-only and lexical-only baselines. For answer generation, the top-performing combination was BGE_LeSeR with Qwen2.5 7B, which achieved the highest RePASs score (0.4340), demonstrating strong entailment, obligation coverage, and low contradiction.
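The two-stage idea is easy to prototype. A minimal sketch using `sentence-transformers` for the dense stage and `rank_bm25` for the lexical rerank; the off-the-shelf BGE checkpoint and toy corpus are stand-ins, not the paper's MNSR-fine-tuned models:

```python
# Sketch of LeSeR's two stages: dense semantic retrieval to get a candidate
# shortlist, then lexical (BM25) reranking of that shortlist.
# Model name and corpus are placeholders, not the paper's fine-tuned setup.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = ["Passage about capital adequacy rules...",
          "Passage about disclosure obligations..."]  # placeholder passages
encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")  # stand-in for BGE_MNSR

# Stage 1: dense retrieval over the whole corpus.
doc_emb = encoder.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)

def retrieve(query: str, k: int = 10) -> list[str]:
    q_emb = encoder.encode(query, convert_to_tensor=True, normalize_embeddings=True)
    hits = util.semantic_search(q_emb, doc_emb, top_k=k)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]

    # Stage 2: rerank the dense shortlist with BM25 over token overlap.
    bm25 = BM25Okapi([c.lower().split() for c in candidates])
    scores = bm25.get_scores(query.lower().split())
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [candidates[i] for i in order]

print(retrieve("What are the disclosure obligations?", k=2))
```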
Behind Maya: Building a Multilingual Vision Language Model
Accepted @ VLMs-4-All Workshop, CVPR 2025
Drishti Sharma et al.
Paper
This work introduces Maya, an open-source multilingual VLM designed to address the underperformance of existing VLMs in low-resource languages and diverse cultural contexts. It is built on two major contributions: (1) a multilingual image-text pretraining dataset with 4.4M samples across 8 languages (English, Chinese, French, Spanish, Russian, Hindi, Japanese, Arabic), created via a hybrid translation framework using tools like Aya 35B, rigorous prompt engineering (BLEU/N-gram scoring), and balanced sampling from the LLaVA dataset; (2) a multilingual multimodal model architecture that uses SigLIP as the vision encoder (replacing CLIP for its multilingual adaptability and variable patch size support) and Aya-23 8B as the LLM (supporting 23 languages with an 8K context window). Visual features are projected via a 2-layer MLP with GELU to align with the language space, inspired by LLaVA 1.5. Maya outperforms PALO-7B on LLaVA-Bench-In-The-Wild, offering a strong multilingual alternative to LLaVA.
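The projection step is compact enough to sketch. A minimal PyTorch version of a LLaVA-1.5-style 2-layer MLP projector with GELU; the dimensions below are illustrative placeholders, not Maya's actual configuration:

```python
# Sketch of a LLaVA-1.5-style projector: a 2-layer MLP with GELU that maps
# vision-encoder patch features into the LLM's embedding space.
# Dimensions are illustrative, not Maya's actual configuration.
import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    def __init__(self, vision_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim)
        return self.net(patch_features)

proj = MultimodalProjector()
tokens = proj(torch.randn(1, 729, 1152))  # e.g. a SigLIP-style patch grid
print(tokens.shape)  # torch.Size([1, 729, 4096])
```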
Detection of Language, Hate Speech, and Targets using LLMs in Devanagari Script
Accepted @ COLING 2025 (CHiPSAL Track)
Drishti Sharma et al.
Paper
This work presents a modular multilingual NLP system for five Devanagari-script languages (Hindi, Nepali, Marathi, Sanskrit, and Bhojpuri), addressing three tasks: (1) language identification, (2) hate speech detection, and (3) hate speech target classification. Each task is tackled using a distinct set of models and strategies: (1) for language identification, an ensemble of fine-tuned IndicBERT V2, MuRIL, and Gemma-2 9B models achieved high accuracy via majority voting with fallback logic; (2) for hate speech detection, a separate ensemble combined IndicBERT V2, Gemma-2 9B, and Gemma-2 27B fine-tuned using Odds Ratio Preference Optimization (ORPO), with BERT-based models trained using focal loss (α = 0.35, γ = 4.0) to address severe class imbalance; and (3) for target classification, the best performance was achieved by a single Gemma-2 27B model fine-tuned with ORPO, without ensembling. All decoder-only models were fine-tuned using 4-bit LoRA to ensure efficient training. The Sub-task A ensemble achieved an F1 score of 0.9980, the Sub-task B ensemble reached 0.7652, and the standalone model in Sub-task C attained 0.6804, demonstrating the value of task-specific model design for low-resource, script-sensitive language understanding.
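As an illustration of the class-imbalance handling, a minimal PyTorch sketch of focal loss with the reported α = 0.35 and γ = 4.0, written for the binary case; the paper's exact multi-class training setup is not reproduced here:

```python
# Sketch of focal loss for imbalanced hate-speech detection, with the
# reported alpha = 0.35, gamma = 4.0. Binary form for simplicity; the
# paper's exact multi-class setup is not reproduced here.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.35, gamma: float = 4.0) -> torch.Tensor:
    """Down-weights easy examples: loss = alpha_t * (1 - p_t)^gamma * CE."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # model's probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([2.0, -1.5, 0.3])
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```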
Sequential Learning for Claim Verification and Explanation in Financial Domains
3rd place @ COLING 2025 (FINLP Track)
Drishti Sharma et al.
Paper
This work presents SeQwen, our sequential learning-based system developed for the COLING 2025 Financial Misinformation Detection (FMD) challenge, which focuses on two tasks: (1) classifying financial claims as True, False, or Not Enough Information (NEI), and (2) generating coherent explanations for each classification. We used the FIN-FACT dataset and evaluated multiple open-source LLMs (Qwen2.5, Mistral, Llama3, Gemma-2, and Phi-3). We adopted a two-stage sequential fine-tuning approach, where models were first fine-tuned for classification and then for joint classification and explanation generation. Key findings include: (1) Qwen2.5 7B emerged as the best-performing model, (2) sequential fine-tuning (SeQwen) consistently outperformed single-stage joint training, yielding a 7.1% improvement in overall score and substantial gains in explanation quality, and (3) simply extending joint training epochs led to smaller improvements than our staged approach.
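The staging is the key idea and is easy to sketch. Below, `finetune` is a hypothetical stand-in for any standard supervised fine-tuning routine (e.g. a Trainer loop), and the prompt templates are illustrative, not the paper's exact ones:

```python
# Sketch of SeQwen-style sequential fine-tuning: stage 1 trains on
# classification only; stage 2 continues from stage-1 weights on joint
# classification + explanation. `finetune` is a hypothetical stand-in
# for any supervised fine-tuning loop; templates are illustrative.

def stage1_example(claim: str, label: str) -> dict:
    # label is one of: "True", "False", "NEI"
    return {"prompt": f"Classify this financial claim: {claim}",
            "completion": label}

def stage2_example(claim: str, label: str, explanation: str) -> dict:
    return {"prompt": f"Classify and explain this financial claim: {claim}",
            "completion": f"{label}. Explanation: {explanation}"}

def finetune(model, examples):  # hypothetical SFT routine
    raise NotImplementedError

# Usage (hypothetical):
# model = finetune(base_model, [stage1_example(c, y) for c, y, _ in data])
# model = finetune(model, [stage2_example(c, y, e) for c, y, e in data])
```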
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
✨ Spotlight @ ICLR 2025
Drishti Sharma et al.
Paper
This work presents INCLUDE, a large-scale multilingual benchmark designed to evaluate LLMs on regional knowledge and cultural grounding across 44 languages and 15 scripts. INCLUDE comprises 197,243 multiple-choice questions sourced from 1,926 real-world exams administered in 52 countries, covering diverse subjects such as law, history, culture, and general knowledge. It includes two subsets, INCLUDE-BASE and INCLUDE-LITE, for scalable evaluation across a wide range of LLMs. Models evaluated include GPT-4o, GPT-3.5, Claude, Gemini, Mistral, Yi, and others. GPT-4o achieves the highest overall accuracy, but all models exhibit substantial performance degradation in underrepresented languages and unfamiliar scripts. Prompt language (English vs. native) and instruction tuning have limited impact on overall accuracy.
Robust and Fine-Grained Detection of AI-Generated Texts
Submitted to ACL 2025
Drishti Sharma et al.
Paper
Trained and benchmarked models for fine-grained AI-generated content detection across 2.4M multilingual samples...
Improving Multilingual Capabilities with Cultural and Local Knowledge in LLMs
Submitted to ACL 2025
Drishti Sharma et al.
Paper
Instruction-tuned 7 LLMs on 485K culturally informed English-Hindi pairs, yielding consistent improvements...
Intel Projects
Besides my work on the RealSense depth sensors and the publications above, here is a sampling of my publicly disclosed work.
Intel RealSense 400
Intel
2016-08-15
My responsibilities included system performance, components of the stereo algorithm on the imaging ASIC, and contributions to the design of the laser projector pattern.
Compact VCSEL Projector
Intel
2016-06-27
patent /
patent #2 /
patent #3 /
A low-cost, dense, configurable projector system for RGB-D depth sensors.
Depth Image Enhancement
Intel
2015-08-06
patent /
Algorithms to filter, enhance, and clean up RGB-D data streams.
Real-time Box Measurement
Intel
2015-04-08
video /
video #2 /
Using a single depth sensor, real-time detection of cuboids, accurate estimation of their dimensions, and even some bin-packing.
DashPoint: A low-cost, low-power human interface device
Intel
2013-06-07
patent /
patent #2 /
Finger tracking on a microcontroller, with optics tricks and some HCI ideas.
Stereoscopic depth reconstruction with probabilistic pixel correspondence search
Intel
2012-07-24
patent /
A fast method for computing stereo depth maps.
Other Projects
These include coursework, side projects, and unpublished research work.
Dice Stacking: A Dynamic Manipulation Task
CMU 16-741 Mechanics of Manipulation
2018-12-05
paper /
video /
code /
Hunter Goforth and I designed a dynamic manipulation task and solved it with imitation learning.
Introspective Neural Networks
CMU 16-824: Visual Learning and Recognition
2018-05-15
paper /
Using pre-trained neural networks to improve fine-grained recognition via style transfer.
Stochastic Sampling of Parametric Policies
CMU 16-745: Dynamic Optimization
2018-05-05
paper /
Using a very simple algorithm to solve some very simple environments.
Optimizing for Physical Simulation
CMU 16-745: Dynamic Optimization
2018-03-22
code /
With Chris Atkeson and Alex Spitzer, using optimizers to match an observed trajectory.
A Maze Bot
Stanford CS225A: Experimental Robotics
2017-06-12
paper /
video /
video #2 /
Making a 6-DoF PUMA arm solve a maze with real-time vision and tracking.
Learning Implicit Communication Strategies
Stanford CS234: Deep Reinforcement Learning
2017-06-10
Work with Aaron Goodman using reinforcement learning to discover implicit collusion strategies in an iterated prisoner's dilemma.
Computational models for text summarization
Stanford CS224N: Natural Language Processing
2017-03-18
paper /
video /
code /
poster /
Work with Ludwig Schubert on simplified encoder stages for text summarization.
Superresolution Microscopy
Stanford CS371: Computational Biology in Four Dimensions
2017-03-16
code /
slides /
An implementation of Faster STORM using compressed sensing.
Automatically building Restaurant Ontologies
Stanford CS270: Modeling Biomedical Systems
2017-03-15
paper /
poster /
Using the Yelp dataset of reviews to model the semantics and relationships between cuisines, businesses, and other properties useful for restaurant recommendations.
Beyond Correlation Networks for the Financial Market
Stanford CS224W: Social and Information Network Analysis
2016-12-07
paper /
Using graph models, we track the development of financial networks over the 20th century.
Gradient-learned Models for Stereo Matching
Stanford CS231A: Computer Vision, From 3D Reconstruction to Recognition
2016-06-07
paper /
code /
Some re-implementations of standard stereo correspondence algorithms, along with experiments using classification for stereo matching.
Multimodal Natural Language Inference
Stanford CS224U: Natural Language Understanding
2016-06-06
paper /
video /
We explored how natural language inference tasks can be augmented with visual data.
CNNs for 3D Model Classification
Stanford CS231n: Convolutional Neural Networks for Visual Recognition
2016-03-08
paper /
poster /
3D shape classification by learning an embedding function into a 2D image and using a pre-trained ImageNet network. At the time, this achieved state-of-the-art results for single-view classification on ShapeNet40.
Wide-angle Stereo Lenses
Stanford CS448I: Computational Imaging and Display
2016-03-07
paper /
poster /
We introduce various projection functions in the analysis of stereoscopic depth sensors.
Doctor Bayes
Stanford CS221: Artificial Intelligence
2015-12-12
website /
paper /
code /
poster /
Detecting disease from a short description of symptoms. In small-scale testing, this obtained nearly 90% top-5 accuracy and about 60% top-1 accuracy.
Level-set based tracking and segmentation
Stanford CS279: Structure and Organization of Biomolecules and Cells
2015-12-04
paper /
code /
We implemented a detection and deformable tracking pipeline for red blood cells.
Dequantization of Depth Data
Other
2015-04-22
code /
An O(1)-time algorithm for producing smooth normals from quantized depth data, such as the Kinect's.
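A minimal sketch of the underlying idea, assuming the standard finite-difference construction (constant work per pixel); this is the textbook version, not necessarily the repository's exact smoothing scheme:

```python
# Sketch: per-pixel surface normals from a depth map via finite differences
# and a cross product; constant work per pixel. This is the textbook
# construction, not necessarily the repository's exact smoothing scheme.
import numpy as np

def depth_to_normals(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) float array -> (H, W, 3) unit normals."""
    dz_dx = np.gradient(depth, axis=1)  # tangent along x: (1, 0, dz/dx)
    dz_dy = np.gradient(depth, axis=0)  # tangent along y: (0, 1, dz/dy)
    # Cross product of the two tangents gives (-dz/dx, -dz/dy, 1).
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals

depth = np.fromfunction(lambda y, x: 0.01 * x + 0.02 * y, (4, 4))
print(depth_to_normals(depth)[0, 0])  # constant-slope plane -> constant normal
```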
Golf swing monitoring
Other
2011-07-21
Work with Ankur Mehta; we built a demonstration platform that used lightweight, low-cost wireless sensor platforms to monitor a golf swing.
Project Tetra: Collaborative robot state estimation
UC Berkeley EE149: Embedded Systems
2011-07-21
With Humphrey Hu, Ryan Julian, and Eric Yuan, a project demonstrating the efficacy of multi-robot collaborative state estimation using Wiimote cameras, mobile robot platforms, and real-time wireless communication.
GINA: Low power design
UC Berkeley
2010-08-22
Testing and validation of the GINA (Guidance and Inertial NAvigation) mote, a 1.6-gram sensor platform.
GINA: Wireless sensor platform
UC Berkeley
2010-06-22
I helped Anita Flynn and Thomas Watteyne build these small sensors and wrote firmware.