My current research focuses on natural language processing and modern digital libraries. We apply tools from machine learning, especially deep learning, to address the challenges in these areas. Example problems include extraction of keyphrases from documents, extraction of entities and relations from documents, identification of topics and their distribution in text corpora, and automatic question answering. We are also interested in the theoretical understanding of deep neural networks. In the recent past, I have also worked on mobile computing and wireless networks.
Postdoc, IIT Kharagpur
Advisors: Prof. Partha Pratim Das (IIT Kgp), Prof. Plaban Kumar Bhowmick (IIT Kgp)
PhD (Engineering), Jadavpur University
Advisors: Prof. Samiran Chattopadhyay (JU), Prof. Matangini Chattopadhyay (JU)
BE (Information Technology), Jadavpur University
Project Advisor: Prof. Uttam Kumar Roy (JU)
Previous employment with: Infosys (Bangalore), Interra Systems (Kolkata), Xilinx (Hyderabad), KIIT Deemed University (Bhubaneswar), Jadavpur University (Kolkata) [Guest faculty]
COM-4203/PHD-226 (Machine Learning): 2022 (Spring), 2021 (Spring)
COM-5103/PHD-130 (Advanced Machine Learning): 2021 (Autumn), 2020 (Autumn)
COM-2211 (Data Structures & Algorithms Lab): 2022 (Spring), 2021 (Spring) [shared]
COM-2101B (Data Structures & Algorithms): 2020 (Autumn) [shared]
Candidates interested in pursuing research in machine learning / natural language processing are encouraged to apply. Note that to join the PhD program of IACS, a candidate must satisfy the eligibility criteria stated in the PhD admission advertisements published by the institute from time to time.
[My publications are available on Google Scholar.]
“Libraries store the energy that fuels the imagination,” said Sidney Sheldon. Web-scale digital libraries, with their large collections of books, manuscripts, photographs, maps, microfilms, newspapers, periodicals, and audio and video tapes, simplify and broaden access to information and knowledge. What will the digital libraries of the future look like? How can artificial intelligence (AI) contribute to building better libraries and thereby advance technology-enabled education? How can AI help us navigate our collective knowledge, stored in born-digital and digitized collections spanning centuries of our existence on this planet? These questions are at the root of our scientific investigations, in which we intertwine AI and information science to reimagine the libraries of the future. What follows is a selection of the problems we are currently working on.
1. Scientific literature mining: Scholarly digital libraries allow researchers to discover and read research papers. However, given how rapidly papers, especially in science, engineering and medicine, are published today, it is difficult to closely follow even a narrow subfield or to discover the right resources with existing search engines. This motivates our interest in new methods of analyzing and indexing scientific papers so that the information tucked away in them is easily discoverable by users. For example, we are investigating ways, inspired by deep learning techniques, to automatically extract keywords from a paper, infer the discourse structure of an article, generate useful summaries of papers, and find the semantic similarity between a pair of research papers. We have leveraged semantic similarity between papers to build Surrogator, a tool that retrieves open-access surrogates of access-restricted papers. Specifically, if a paper is behind a paywall but a very similar paper by the same authors is available as an open-access document on the web, the latter is presented to a user who cannot afford to access the former.
Sanyal, D. K., Bhowmick, P. K., Das, P. P., Chattopadhyay, S., & Santosh, T. Y. S. S. (2019). Enhancing access to scholarly publications with surrogate resources. Scientometrics, 121(2), 1129-1164.
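The surrogate idea above can be illustrated with a small sketch. The description here does not say how Surrogator measures similarity, so the sketch assumes plain TF-IDF vectors compared by cosine similarity, an author-overlap check, and a similarity threshold of 0.5. The function names (`find_surrogate`, `tfidf_vectors`, `cosine`) and the record fields (`authors`, `open_access`, `text`) are hypothetical and chosen only for illustration; this is not the tool's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so terms present in every document keep a nonzero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def find_surrogate(target, candidates, threshold=0.5):
    """Hypothetical helper: return (index, score) of the most similar open-access
    candidate sharing at least one author with `target`, or None if no candidate
    reaches `threshold`."""
    texts = [target["text"]] + [c["text"] for c in candidates]
    vecs = tfidf_vectors(texts)
    best = None
    for i, cand in enumerate(candidates):
        # Only open-access papers by (some of) the same authors qualify.
        if not cand["open_access"] or not set(cand["authors"]) & set(target["authors"]):
            continue
        score = cosine(vecs[0], vecs[i + 1])
        if score >= threshold and (best is None or score > best[1]):
            best = (i, score)
    return best
```

Given a paywalled record and a list of candidate records, `find_surrogate` skips candidates that are not open access or share no author with the target, then returns the best remaining match above the threshold. TF-IDF only captures word overlap; learned document embeddings could be swapped in for `tfidf_vectors` without changing the rest of the logic.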
2. Topic models for text corpora: Given a large collection of text documents, can we identify the salient themes running through the documents? These salient themes, or topics, can help readers navigate the collection quickly, albeit at a high level. A reader interested in a specific topic can retrieve the documents focused on that topic and analyze them in depth. Several techniques, ranging from non-negative matrix factorization to probabilistic graphical models to deep neural networks, have been exploited to build topic models for text corpora. However, many of these algorithms do not scale well when the collection is large or the number of topics is high. Therefore, their adoption in real libraries is still limited. We study these algorithms, including their performance bottlenecks. We are also interested in applying topic modeling techniques to build navigation systems for large document collections.
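Non-negative matrix factorization, the first technique named above, can be sketched in a few lines. The sketch below is a generic textbook version (the multiplicative updates of Lee and Seung for the squared Frobenius loss), not our group's own method; the function names and any toy matrix or vocabulary used with it are hypothetical. A document-term matrix V (documents × terms) is approximated as W·H, where each row of H is a topic's weighting over terms and each row of W is a document's mixture of topics.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix V as W (m x k) times H (k x n)
    using multiplicative updates; eps guards against division by zero."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def top_terms(h, vocab, n_top=3):
    """List the n_top highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n_top]] for row in h]
```

For example, with a toy count matrix `[[2, 2, 1, 0, 0, 0], [4, 4, 2, 0, 0, 0], [0, 0, 0, 1, 2, 2], [0, 0, 0, 3, 6, 6]]` and vocabulary `["neural", "network", "training", "library", "catalog", "book"]`, `top_terms(h, vocab)` lists the dominant terms of each of the k topics. Because every update multiplies dense matrices, each iteration of this naive version costs on the order of m·n·k operations, which hints at why straightforward implementations struggle as the collection or the number of topics grows.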