Research Projects

Generative AI and Large Language Models

At AWS GenAI, I lead research on the next generation of multimodal LLMs and agentic systems. My focus spans multi-agent reasoning, post-training and finetuning of multimodal LLMs, synthetic data generation, and agentic document intelligence.

Artificial Intelligence and Machine Learning

I work at the intersection of theoretical artificial intelligence and practical machine learning. I develop learning models that solve problems across a variety of domains, collaborating with domain experts while also designing and modifying the internal structure of various learning models.

  • Multi-modal and Multi-task Learning

    Principal Investigator: Md Mofijul Islam, University of Dhaka

    Representation learning has been widely applied in areas such as computer vision, natural language processing (NLP), and social network analysis. Most representation learning approaches solve inference problems using unimodal data. In recent years, with growing compute, multimodal learning has become central to many inference systems: by bringing in complementary information across modalities, it allows a system to learn stronger representations for each modality. For example, we can jointly learn visual and textual representations for Visual Question Answering systems.

    Outcome: Received the NVIDIA Academic GPU Grant to support this work.

  • Design Optimization and Evolutionary Approaches

    This project designs optimization and evolutionary approaches for computationally hard (NP-hard) optimization problems across domains, including resource allocation in cloud computing, mitigating overfitting, and improving the reasoning process in learning models.

    • Resource Allocation in Mobile Cloud Computing

      Supervisor: Dr. Md Abdur Razzaque, University of Dhaka

      We used two meta-heuristic approaches, a Genetic Algorithm and a swarm-intelligence method (Ant Colony Optimization), to design resource allocation schemes for heterogeneous Mobile Cloud Computing (MCC) environments. These approaches minimize task execution time and improve resource utilization, which is critical for big-data-driven, resource-constrained cloud applications such as mobile and e-health systems.

      Outcome: IEEE Access 2017, NSysS 2016
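
      To make the idea concrete, the following is a minimal sketch of how a genetic algorithm can search for a task-to-resource assignment. It is not the published scheme: the makespan objective, the names `task_sizes` and `speeds`, and all parameter values are assumptions chosen for illustration.

```python
import random

def makespan(assignment, task_sizes, speeds):
    """Finish time of the busiest resource for a task-to-resource assignment."""
    load = [0.0] * len(speeds)
    for task, resource in enumerate(assignment):
        load[resource] += task_sizes[task] / speeds[resource]
    return max(load)

def genetic_allocate(task_sizes, speeds, pop_size=30, generations=100,
                     mutation_rate=0.1, seed=0):
    """Illustrative GA: a chromosome maps each task index to a resource index.

    Assumes at least two tasks and pop_size >= 4 (hypothetical defaults).
    """
    rng = random.Random(seed)
    n_tasks, n_res = len(task_sizes), len(speeds)

    def fitness(assignment):
        return makespan(assignment, task_sizes, speeds)

    # Start from random assignments.
    population = [[rng.randrange(n_res) for _ in range(n_tasks)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]    # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_tasks)       # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_tasks):              # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_res)
            children.append(child)
        population = survivors + children
    best = min(population, key=fitness)
    return best, makespan(best, task_sizes, speeds)
```

      Calling `genetic_allocate([4, 8, 2, 6, 3, 7], [1.0, 2.0])` returns an assignment list and its makespan. A realistic MCC allocator would also need to model factors such as network latency and energy, which this sketch leaves out.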

    • Optimized Distributed Clustering Model in DSN

      Supervisor: Dr. Md Abdur Razzaque, University of Dhaka

      We developed a dynamic distributed clustering model that minimizes energy consumption and data collection time in directional sensor networks (DSNs) by reducing the number of active directional sensor nodes. The approach effectively increased network lifetime in DSNs.

      Outcome: EURASIP JWCN 2015, IEEE APWiMob 2014

  • Developer Question Answering and Repository Mining

    Supervisor: Md Mofijul Islam, United International University

    This project mines question-answering (QA) data — especially from Stack Overflow — along with data from software project repositories, with the goal of designing learning models that streamline the software development process. As part of this work, we developed an accepted-answer recommendation model, RAiTA, which ranks answers to Stack Overflow questions using textual and meta-features of the question, answer, and comments.

    Outcome: Springer IEMIS 2018

  • Interpretable Machine Learning

    Principal Investigator: Md Mofijul Islam, University of Dhaka

    This project builds tools that help people understand how learning models learn. In recent years, several complex models have achieved strong performance on difficult tasks, yet often fail to explain their internal reasoning — how their layers learn and which features each layer captures. We develop applications that help users understand these black-box learning processes, and we are also building tools to aid in debugging learning models.

    • d-DeVIS: A Gray-Box Interpretable Visual Debugging Approach for Deep Sequence Learning Models

      Deep learning algorithms are frequently treated as black boxes and are difficult to interpret. Their widespread use demands a deeper, more transparent understanding of their internal representations and decision-making. Models trained on sequential data, such as audio and video, have especially intricate internal reasoning due to complex feature distributions. A visual simulator can help trace internal decision-making in response to adversarial inputs, aiding both debugging and model design. We developed d-DeVIS, an interactive web application that visualizes the internal reasoning of a model trained on audio data, letting users interpret model behavior and debug it by interactively generating adversarial audio inputs.

      Outcome: arXiv, Video Demo, Web Application, Source Code

Natural Language Processing

We design transfer learning approaches to improve a range of computational linguistic tasks, and we also build computational linguistic models and comprehensive datasets for the Bangla language. Relatively few works address Bangla, largely due to the complexity of the language and the scarcity of publicly available datasets.

  • Transfer Learning Approach to Fact Extraction and Statement Validation

    Principal Investigator: Md Mofijul Islam, University of Dhaka

    With the proliferation of social media, statement validation has become a crucial problem for the NLP research community. However, progress has been limited by the lack of comprehensive datasets. In this project, we use transfer learning to build fact extraction and checking models from the limited data available.

    Outcome: Accepted at IJCCI 2018.

  • Bangla Article Classification

    Supervisor: Md Mofijul Islam, University of Dhaka

    We curated a comprehensive dataset of approximately 400,000 Bangla news articles collected from various Bangla news portals, and developed a Bangla article classification model using semantic textual features that outperforms state-of-the-art methods. We plan to extend this dataset to support additional Bangla NLP research problems.

    Outcome: ICBSLP 2018 [Dataset & Code] [Web App]

  • Bangla Speech and Speaker Identification

    Principal Investigator: Md Mofijul Islam, University of Dhaka

    In this project, we are developing a Bangla speech dataset to enable research on computational learning models for tasks such as Bangla voice recognition and speaker identification. To date, no public Bangla speech dataset has been available for research purposes.

    Outcome: Data Collection App

Computational Biology

Supervisor: Dr. Swakkhar Shatabda, United International University

In this project, we collaborate with domain experts to develop learning models for a range of bioinformatics problems.

  • iProtGly-SS: Identifying Protein Lysine Glycation Sites Using Sequence Features

    Glycation is a chemical reaction in which a sugar molecule bonds with a protein without the aid of enzymes, and it is implicated in many diseases — so accurate identification of glycation sites is important. In this work, we designed a supervised learning model, iProtGly-SS, to identify protein lysine glycation sites from features extracted from sequence and secondary-structure information.

    Outcome: Proteins Journal 2018 [Code, Data & Web App]
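
    As a rough illustration of what sequence-based features for lysine sites can look like, the sketch below cuts a fixed-size window around each lysine (K) and computes its amino-acid composition. It is not the iProtGly-SS feature set: the window size, the `X` padding character, and the function names are assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def lysine_windows(sequence, flank=3, pad="X"):
    """Yield (position, window) for every lysine, padding at the sequence ends."""
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # The lysine sits at the center of a window of 2 * flank + 1 residues.
            yield i, padded[i:i + 2 * flank + 1]

def composition(window):
    """Fraction of each standard amino acid in the window (padding is ignored)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]

if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, composition(window))
```

    Each composition vector could then serve as one input row for a supervised classifier that labels the site as glycated or not. Secondary-structure features, which the project also uses, are not covered by this sketch.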