Analytics Research Institute: Data-driven insight for research and evaluation

TopicVista: Explore NIH Research Topics

Discover how more than 42,000 FY2024 NIH grant awards cluster into research topics and how those topics are distributed across Institutes and Centers (ICs).

42,000+: FY2024 NIH grant abstracts modeled
NIH ICs: Topic patterns compared across Institutes and Centers
5 views: Topic resolutions from k = 30 to k = 5
PubMedBERT + BERTopic: Biomedical embedding and topic modeling pipeline

TopicVista is designed for evaluators, researchers, policy analysts, and others who want to understand the thematic structure of NIH-funded research portfolios. Rather than relying only on predefined categories or keyword counts, TopicVista uses biomedical language embeddings to identify groups of research awards that are similar in meaning, even when they use different terminology.

Start with the 30-topic model for the most detailed view of the NIH research landscape. Then use the broader topic maps below to compare patterns at lower levels of thematic resolution.

Compare Broader Topic Resolutions

Use these alternative views to zoom out from the detailed 30-topic model. Lower values of k group awards into broader thematic areas, which can help reveal larger portfolio-level patterns.

k = 20: Finer thematic distinctions with slightly broader grouping than k = 30.
k = 15: Mid-level topic structure for comparing major research themes.
k = 10: Broad areas of NIH-funded research activity.
k = 5: The broadest view of the portfolio's thematic structure.

Explore 20 Research Topics

This view groups awards into 20 research topics. It preserves substantial topical specificity while making broader cross-IC patterns easier to scan.

How to Explore
  1. Start with a topic. Use the bar chart to identify areas with the most awards.
  2. Follow the topic across NIH. Trace the same row into the bubble chart to see which ICs are most active.
  3. Start with an Institute. Scan up from an IC to discover its strongest research themes.
  4. Spot shared priorities. Look for large bubbles across multiple ICs to identify cross-cutting areas.

Compare 15 Major Research Themes

This view groups awards into 15 research topics. It offers a mid-level view for comparing major patterns across ICs without the full granularity of the k = 20 or k = 30 models.

View 10 Broad Areas of Research

This view groups awards into 10 broad research topics. It is useful for seeing major areas of NIH-funded research activity and comparing how those areas are distributed across ICs.

See the Big Picture Across 5 Research Themes

This view groups awards into 5 broad research topics. It provides the broadest summary of the NIH portfolio and is useful for a quick overview of the dataset's major thematic structure.

Learn How TopicVista Works

  1. Prepare NIH text. Clean and combine project titles and abstracts.
  2. Create embeddings. Use PubMedBERT to represent biomedical meaning.
  3. Model topics. Use BERTopic to organize awards into themes.
  4. Cluster documents. Reduce dimensions with UMAP and cluster with KMeans.
  5. Review labels. Combine keyword extraction, OpenAI-assisted naming, and expert review.

TopicVista was developed using FY2024 NIH RePORTER ExPORTER project and abstract data, focusing on extramural NIH-funded research projects. After filtering, deduplication, and preprocessing, the final dataset included more than 42,000 unique grant abstracts. Project titles and abstracts were cleaned and combined to create the text used for topic modeling.
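
The cleaning, combining, and deduplication steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline TopicVista actually ran: the record keys (`application_id`, `title`, `abstract`) and the specific cleaning rules are assumptions made for the example.

```python
import re
from html import unescape


def clean_text(text):
    """Decode HTML entities and collapse runs of whitespace (assumed cleaning rules)."""
    text = unescape(text or "")
    return re.sub(r"\s+", " ", text).strip()


def build_documents(records):
    """Combine title and abstract per award, skipping empty and duplicate texts.

    `records` is an iterable of dicts with the hypothetical keys
    'application_id', 'title', and 'abstract'.
    """
    seen = set()
    documents = []
    for rec in records:
        title = clean_text(rec.get("title")).rstrip(".")
        abstract = clean_text(rec.get("abstract"))
        if not abstract:
            continue  # nothing to model without an abstract
        text = f"{title}. {abstract}" if title else abstract
        key = text.lower()
        if key in seen:
            continue  # same combined text already kept once
        seen.add(key)
        documents.append({"application_id": rec["application_id"], "text": text})
    return documents
```

Deduplicating on the combined, lowercased text is only one possible rule; a real pipeline might instead deduplicate on project number or another identifier.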

The topic modeling workflow combined PubMedBERT and BERTopic to identify scientifically meaningful groups of NIH awards.

First, project titles and abstracts were processed using PubMedBERT, a transformer-based language model trained on biomedical literature. PubMedBERT converted each award into a high-dimensional embedding that captures the semantic content of the text, allowing awards with similar scientific focus to be represented near one another even when they use different terminology.
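
To make "represented near one another" concrete, the sketch below pairs a cosine-similarity helper with a thin embedding wrapper. It assumes the `sentence-transformers` package and a PubMedBERT-derived checkpoint chosen by the caller; the source does not say which checkpoint or library TopicVista used, so `embed_texts` and its `model_name` parameter are illustrative only. The three-dimensional vectors are toy stand-ins for real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; values near 1 mean similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_texts(texts, model_name):
    """Encode texts with a sentence-transformers model (hypothetical wrapper).

    `model_name` is whatever PubMedBERT-derived checkpoint the caller selects;
    it is not specified by the source.
    """
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer(model_name)
    return model.encode(texts, batch_size=32, show_progress_bar=False)


# Toy stand-ins for embeddings: awards A and B share a focus, award C does not.
award_a = [0.9, 0.1, 0.0]
award_b = [0.8, 0.2, 0.1]
award_c = [0.0, 0.1, 0.9]

sim_ab = cosine_similarity(award_a, award_b)
sim_ac = cosine_similarity(award_a, award_c)
```

Here `sim_ab` is much larger than `sim_ac`, which is the property a clustering step relies on when grouping awards by meaning rather than by shared keywords.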

These embeddings were then analyzed using BERTopic, a topic modeling framework designed to identify themes within large collections of documents. Within the BERTopic pipeline, embeddings were reduced in dimensionality using UMAP and grouped into topic clusters using KMeans clustering. Separate topic models were generated for multiple levels of granularity (k = 5, 10, 15, 20, and 30), enabling exploration of both broad scientific domains and more specialized research areas.
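
A rough sketch of this step is shown below, assuming the `bertopic`, `umap-learn`, and `scikit-learn` packages. BERTopic accepts a scikit-learn clusterer through its `hdbscan_model` argument, which is how KMeans is plugged in here. The UMAP settings, the seed, and the function name `fit_topic_models` are assumptions for illustration; the source does not report the hyperparameters that were used.

```python
RESOLUTIONS = (5, 10, 15, 20, 30)  # the k values named in the text


def fit_topic_models(documents, embeddings, resolutions=RESOLUTIONS, seed=42):
    """Fit one BERTopic model per k and return {k: (model, topic_assignments)}.

    `documents` is a list of strings and `embeddings` is a NumPy array with one
    precomputed row per document. Hyperparameters below are assumed, not sourced.
    """
    # Third-party imports are kept inside the function so the module loads
    # even where these packages are not installed.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans
    from umap import UMAP

    results = {}
    for k in resolutions:
        umap_model = UMAP(
            n_neighbors=15,
            n_components=5,
            min_dist=0.0,
            metric="cosine",
            random_state=seed,
        )
        cluster_model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        topic_model = BERTopic(umap_model=umap_model, hdbscan_model=cluster_model)
        topics, _ = topic_model.fit_transform(documents, embeddings=embeddings)
        results[k] = (topic_model, topics)
    return results
```

One consequence of using KMeans instead of HDBSCAN, BERTopic's default clusterer, is that every document is assigned to a topic and the number of topics is fixed in advance, which fits a design built around explicit values of k.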

Topic labels were developed through a combination of keyword extraction, OpenAI-assisted topic naming, and manual expert review to improve interpretability and scientific relevance.
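
For the keyword-extraction part, BERTopic's default topic representation is class-based TF-IDF (c-TF-IDF), which scores a term by its frequency within a topic times log(1 + A / f), where A is the average number of words per topic and f is the term's frequency across all topics. The sketch below is a simplified pure-Python version of that idea, with no stop-word removal; whether TopicVista used BERTopic's defaults is an assumption, and `top_keywords` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def top_keywords(docs_by_topic, n=3):
    """Rank words per topic with a simplified class-based TF-IDF."""
    # Treat all documents in a topic as one long document and count its words.
    term_counts = {
        topic: Counter(re.findall(r"[a-z]+", " ".join(docs).lower()))
        for topic, docs in docs_by_topic.items()
    }
    total_counts = Counter()
    for counts in term_counts.values():
        total_counts.update(counts)
    avg_words = sum(total_counts.values()) / len(term_counts)

    keywords = {}
    for topic, counts in term_counts.items():
        topic_total = sum(counts.values())
        scores = {
            term: (freq / topic_total) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        keywords[topic] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return keywords


# Made-up example texts, not drawn from the NIH data.
example = {
    0: ["tumor cells and tumor growth", "tumor immune response in cells"],
    1: ["neurons and memory circuits", "memory loss in aging neurons"],
}
example_keywords = top_keywords(example, n=2)
```

Keyword lists like these are raw material for naming; as the text notes, the final labels also went through assisted naming and expert review.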

Model performance was evaluated using quantitative measures, including topic coherence and silhouette scores, as well as qualitative review. The resulting visualizations display the distribution of NIH grants across identified research topics and NIH Institutes and Centers (ICs).
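
As an illustration of one of these measures, the silhouette coefficient for a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The pure-Python sketch below computes the mean over all points with Euclidean distance. It is a teaching version only: the source does not state which distance metric or which representation (raw embeddings or UMAP-reduced) was scored, and in practice `sklearn.metrics.silhouette_score` would be the usual choice.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette requires at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = mean_dist(i, own)
        b = min(
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight, well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # labels match the geometry
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # labels cut across it
```

Scores range from -1 to 1. Here `good` is close to 1 and `bad` is negative, which is the contrast the measure is meant to expose when comparing clusterings.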

Further information and data underlying these visualizations are available in the GitHub Data Repository.

Keep These Considerations in Mind

TopicVista is intended as an exploratory tool. The topics are generated from model-based analysis of grant titles and abstracts, and they should be interpreted alongside substantive expertise and knowledge of NIH programs.

The visualizations show award counts by topic and IC. They do not, by themselves, represent funding amounts, award size, scientific impact, or program priority.
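
The quantity being plotted can be sketched as a simple tally. In the minimal example below, each award contributes exactly one count to its topic and one to its topic and IC pair, regardless of dollar amount. The topic names are invented for illustration, and `count_awards` is a hypothetical helper, not part of TopicVista.

```python
from collections import Counter


def count_awards(assignments):
    """Tally awards per topic and per (topic, IC) pair.

    `assignments` is an iterable of (topic_label, ic_code) tuples, one per award.
    """
    by_topic = Counter()
    by_topic_ic = Counter()
    for topic, ic in assignments:
        by_topic[topic] += 1
        by_topic_ic[(topic, ic)] += 1
    return by_topic, by_topic_ic


# Invented topic labels paired with real IC abbreviations.
sample = [
    ("Cancer immunology", "NCI"),
    ("Cancer immunology", "NCI"),
    ("Cancer immunology", "NIAID"),
    ("Neurodegeneration", "NIA"),
]
by_topic, by_topic_ic = count_awards(sample)
```

Under this reading, per-topic totals correspond to bar lengths and per-pair totals to bubble sizes, so a large bubble signals many awards rather than a large budget.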

Embedding-based topic models can reveal patterns that may be difficult to detect through predefined categories alone, but the results depend on modeling choices, preprocessing decisions, and interpretation of the generated topic labels.

Bring TopicVista to Your Portfolio

Want to map the scientific landscape of your own funding portfolio, research program, or organization? TopicVista can be adapted to visualize topic structure, portfolio overlap, and areas of strategic opportunity. Contact the Analytics Research Institute at info@the-ari.com to learn more.

Use or Reference TopicVista

Analytics Research Institute. TopicVista: NIH Topic Explorer. GitHub repository, 2026. https://github.com/analyticsresearchinstitute/nih-topic-explorer.