Shreyas Dixit
A Machine Learning Engineer and Deep Learning Researcher specializing in Multimodal AI, Generative AI, and building accessible technology.
An AI advocate who bridges the gap between advanced research and real-world impact. I am fortunate to work with, and be mentored by, Dr. Amitava Das, Vasu Sharma, Aman Chadha, and Vinija Jain.
Experience
Led the development of end-to-end AI systems for Embodied AI, integrating computer vision, kinematics, and real-time inference backends.
Designed and implemented an AI pipeline for a dexterous robotic arm achieving 97% task accuracy and 2mm precision for object manipulation tasks under challenging conditions like fog and occlusions.
Developed the AI-Hand Bartender system, enabling autonomous serving through sequence learning and object detection, and created a novel video-based learning mechanism to extract 6D pose data from human demonstrations.
Explored reinforcement and imitation learning in NVIDIA Isaac Sim with approaches such as ALOHA and Diffusion Policy, and implemented real-time camera calibration requiring only six examples.
Engaged in research on computer vision, AI detectability, and LLM hallucination, focusing on watermarking techniques and provenance layers.
Developing PECCAVI, a watermarking technique that withstands visual paraphrase attacks, to ensure the traceability and authenticity of synthetic media.
Served as Associate Organizer of the Defactify 4.0 workshop at AAAI 2025 under the guidance of Dr. Amitava Das, an initiative aimed at enhancing AI robustness.
Developed a stutter detection system using Wav2Vec2 and Language-Agnostic BERT to locate stuttering events by timestamp and compute the percentage of stuttered speech.
Trained models on Indian speech datasets, achieving a word error rate (WER) of 0.31, and built an accessible front-end application for on-device deployment.
Led the Microsoft Learn Student Club at VIIT Pune, fostering a strong technical learning community.
Mentored students and organized machine learning initiatives (Nov 2022 – Aug 2024).
Built an end-to-end system using open-source deep learning models to generate contextually relevant titles for millions of Instagram short-form videos.
Researched, optimized, and deployed the system to production with strong cost and time efficiency.
Education
CGPA: 9.2 / 10.0
B.Tech in Electronics and Telecommunication
Focused on Computer Science fundamentals.
Selected Projects
Research Publications
Assistance Platform for Visually Impaired Person using Image Captioning
2023 · Inventor: Shreyas Dixit
Invented a real-time multimodal video narration platform that assists visually impaired individuals by converting visual data into descriptive audio, enhancing accessibility through computer vision technology (Patent No. 202321004399).
PECCAVI: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking
2025 · S. Dixit, A. Aziz, S. Bajpai, V. Sharma, A. Chadha, V. Jain, A. Das
Proposed a novel watermarking technique for AI-generated images that remains robust against "visual paraphrase" attacks, in which an image is regenerated into a semantically similar variant that strips conventional watermarks, while introducing no visible distortion into the source media.
Tech Stack
I specialize in turning advanced research into production-grade infrastructure. While I am adaptable across the stack, I focus on architecting deterministic agentic systems and high-precision perception pipelines.
Recommendations
Shreyas possesses exceptional research and development skills. I have personally witnessed his passionate dedication and work ethic, seeing him dive deep into complex projects to deliver high-quality results. He approaches development with a level of commitment that truly stands out.
I’ve had the pleasure of working with Shreyas for over a year and can confidently say he is an exceptional professional. He has a strong grasp of computer vision concepts and a genuine passion for research and exploring new ideas. From rapid prototyping to production-ready solutions and clear documentation, Shreyas consistently delivers high-quality work. He is reliable, skilled, and a valuable asset to any team.
Writings & Blogs
I write on Medium, sharing insights on AI systems, product strategy, and technical architecture.

Gradients and Parameters: Monitoring Models at Scale
Aug 4, 2025
Breaking down the two key components of neural networks: parameters (the model's memory) and gradients (the learning signal).

PaliGemma: The Future of Vision-Language AI
Dec 22, 2024
An intuitive dive into PaliGemma's architecture and capabilities, exploring how modern VLMs bridge the gap between vision and language.

Understanding Multi-Headed YOLOv9
May 20, 2024
A comprehensive guide to object detection and segmentation with YOLOv9, analyzing its backbone, neck, and head architecture.

Swin Transformers for Semantic Segmentation
Apr 9, 2024
Understanding and coding Swin Transformers from scratch. A deep dive into shifted windows and hierarchical feature maps.
Life Outside Work
When I’m not working, I’m usually watching Formula 1 or catching a Manchester United match. I love the intensity, strategy, and drama that come with competitive sports. Something about it is exciting every single time.

I also enjoy winding down with a good TV show. Suits is definitely one of my favorites (Harvey Specter is hard not to like). More than anything, though, I value spending time with friends. Whether we’re talking about football, sharing ideas, or just relaxing together, that’s what really helps me recharge.