Published, attackable methodology

How OpenNIH ranks institutions

OpenNIH ranks NIH funding, never “research impact” or “quality.” The score is computed only from NIH’s own records. This page states the data scope, the exact formula, and the known limitations so any reader can reproduce or contest a number.

Data scope

This site is built from the full NIH ExPORTER corpus: roughly 2.98 million project-year rows across fiscal years 1985–2026, summing to about $840.8 billion in recorded NIH award dollars. Nearly half of all rows carry no recorded award amount (early years, contracts, and non-NIH agencies); those display as “Not reported,” never a fabricated $0, and the site counts project rows rather than funding wherever dollars are absent.
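
The missing-amount rule can be illustrated with a minimal sketch. The function names `format_award` and `summarize` are hypothetical and are not taken from the OpenNIH codebase; the sketch only assumes that an absent award amount is represented as `None`.

```python
from typing import Dict, Optional, Sequence


def format_award(amount: Optional[float]) -> str:
    """Render an award amount; a missing amount is never shown as $0."""
    if amount is None:
        return "Not reported"
    return f"${amount:,.0f}"


def summarize(amounts: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Count project rows always; total dollars only where dollars exist."""
    reported = [a for a in amounts if a is not None]
    return {
        "project_rows": len(amounts),
        "rows_with_dollars": len(reported),
        # None (not 0) when no row in the group carries a recorded amount.
        "total_dollars": sum(reported) if reported else None,
    }
```

Keeping `None` distinct from `0.0` is the design point: a genuinely recorded zero still renders as `$0`, while an unrecorded amount renders as “Not reported.”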

Source-of-truth principle. NIH official data is the only source. There are no external crosswalks and no editorial regrouping of entities; ambiguities are disclosed here, not “corrected” with outside input.

Data currency

NIH RePORTER updates weekly. This public-good instance may lag NIH depending on when its latest sync completed, so current records should be interpreted with the reported data currency in view.

The home page and official source-status endpoint report the source as-of date for this running instance when that metadata is available.

The ranking formula

Each institution is scored on four dimensions. Each dimension’s raw value is min-max normalized across all ranked institutions to a [0, 100] score, then combined with the published weights below:

| Dimension | Weight | Definition |
| --- | --- | --- |
| Funding Scale | 40% | Total recorded NIH award dollars for the entity. |
| Portfolio Breadth | 30% | Count of peer-reviewed RPG-research project records. No activity-code “diversity” bonus: within RPG research the distinct codes are near-equivalent single-PI grants, so a bonus would reward labeling, not breadth. |
| New Grant Activity | 20% | Share of new (Type 1) awards, scaled by distinct PI count. “Type 1” is the leading digit of the standard NIH project number; contract-style numbers (e.g. N01…) are not type-coded and count as not-new. |
| Average Award Size | 10% | Mean recorded award dollars per project row. |
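
As a reproducibility aid, the normalization and weighting can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenNIH's actual implementation: the dimension keys, the function names, and the choice to score a constant dimension as 0 are all hypothetical, and the raw dimension values are taken as given inputs.

```python
from typing import Dict, List

# Published weights; the dictionary keys are hypothetical identifiers.
WEIGHTS: Dict[str, float] = {
    "funding_scale": 0.40,
    "portfolio_breadth": 0.30,
    "new_grant_activity": 0.20,
    "average_award_size": 0.10,
}


def min_max(values: List[float]) -> List[float]:
    """Min-max normalize raw values to the [0, 100] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a dimension with no spread contributes 0 here.
        return [0.0] * len(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def composite_scores(raw: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """raw maps institution -> {dimension: raw value}; returns composites."""
    names = list(raw)
    normalized = {
        dim: min_max([raw[name][dim] for name in names]) for dim in WEIGHTS
    }
    return {
        name: sum(WEIGHTS[dim] * normalized[dim][i] for dim in WEIGHTS)
        for i, name in enumerate(names)
    }
```

Because normalization runs across all ranked institutions, one institution's score can move when another institution's raw values change, even if its own records do not.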

Dollar-less fiscal years (why some dimensions drop out)

When a fiscal year records no award dollars — as the earliest ExPORTER years do — the two dollar dimensions (Funding Scale and Average Award Size) are identical for every institution and carry no discriminating signal. Including them would inject a fixed, meaningless offset into every score.

So OpenNIH marks such dimensions inactive and computes the composite over only the dimensions that vary, with their weights renormalized to sum to 100%. In such a year the composite is therefore driven by Portfolio Breadth and New Grant Activity alone, and the displayed score is reproducible from the displayed dimensions. The rankings page states which dimensions were active for the year you are viewing.
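
The renormalization step can be sketched like this. It is an illustration only: the dimension keys and the function name `renormalized_weights` are hypothetical, and "inactive" is assumed to mean that every ranked institution has the same raw value on that dimension.

```python
from typing import Dict, List

# Published weights; the dictionary keys are hypothetical identifiers.
WEIGHTS: Dict[str, float] = {
    "funding_scale": 0.40,
    "portfolio_breadth": 0.30,
    "new_grant_activity": 0.20,
    "average_award_size": 0.10,
}


def renormalized_weights(
    weights: Dict[str, float], raw_by_dim: Dict[str, List[float]]
) -> Dict[str, float]:
    """Keep only dimensions that vary, rescaling their weights to sum to 1."""
    active = {
        dim: w
        for dim, w in weights.items()
        if max(raw_by_dim[dim]) > min(raw_by_dim[dim])
    }
    total = sum(active.values())
    # An empty result means no dimension carries any signal for that year.
    return {dim: w / total for dim, w in active.items()}
```

With both dollar dimensions inactive, the remaining 30% and 20% weights rescale to roughly 60% for Portfolio Breadth and 40% for New Grant Activity.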

Entity resolution

A ranking is only as good as its entity resolution. Institutions are aggregated on a stable canonical NIH entity id derived from the organization’s NIH DUNS combined with its NIH-recorded organization name. Combining the two means a shared-DUNS placeholder is split by name, so two distinct institutions that happen to share one DUNS (for example Yale and Harvard Medical School) stay separate ranked rows and are never silently merged. When NIH records no DUNS, the entity falls back to a name-and-state key (the unlinked tail disclosed below), which may fragment one institution across name variants rather than over-merge distinct ones.
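
A minimal sketch of this keying rule follows. The key format, the name normalization, and the function names are assumptions made for illustration; the actual OpenNIH canonicalization may differ, and the DUNS value used in examples should be treated as a placeholder rather than a real identifier.

```python
import re
from typing import Optional


def normalize_name(name: str) -> str:
    """Collapse whitespace and uppercase (an assumed, minimal normalization)."""
    return re.sub(r"\s+", " ", name).strip().upper()


def canonical_entity_id(
    duns: Optional[str], org_name: str, state: Optional[str]
) -> str:
    """Build a grouping key: DUNS + name when linked, name + state otherwise."""
    name = normalize_name(org_name)
    if duns:
        # A shared DUNS is split by name, so distinct institutions stay apart.
        return f"duns:{duns}|{name}"
    # Unlinked tail: spelling variants of one institution yield different keys.
    return f"name-state:{name}|{(state or '').upper()}"
```

Two rows that share the placeholder DUNS `000000000` but carry different organization names get different keys. Two no-DUNS rows that spell one institution's name differently also get different keys, which is the fragmentation risk described above.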

The competitive ranking compares only peer-reviewed, competed RPG-research grants (the R-series; R01 baseline). NIH intramural records (Z01 and ZIA), contracts (N01), cooperative agreements (U), centers (P), career/fellowship/training (K/F/T), small-business (R41–R44), and non-NIH awards are all excluded from the ranking: they are not mutually comparable to research project grants, so folding them onto one dollar/count axis would inflate an institution. The excluded mechanisms remain available in grant search and trends.
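
The type digit and activity code that drive these rules can both be read off the front of a project number. The sketch below assumes the standard NIH layout (optional one-digit application type, then a three-character activity code) and treats "R-series" as any activity code starting with `R` other than R41–R44. That filter is an assumption for illustration, not OpenNIH's published inclusion list, and the project numbers in the usage note are invented.

```python
import re
from typing import Optional, Tuple

# Optional leading type digit, then a three-character activity code.
_PREFIX = re.compile(r"^(\d)?([A-Z][A-Z0-9]{2})")
_SMALL_BUSINESS = {"R41", "R42", "R43", "R44"}


def parse_prefix(project_num: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (application type, activity code); None where not present."""
    match = _PREFIX.match(project_num.strip().upper())
    if match is None:
        return None, None
    return match.group(1), match.group(2)


def is_new_award(project_num: str) -> bool:
    """Type 1 means new; numbers without a type digit count as not-new."""
    app_type, _ = parse_prefix(project_num)
    return app_type == "1"


def in_competitive_ranking(project_num: str) -> bool:
    """Assumed filter: R-series activity codes, excluding small business."""
    _, activity = parse_prefix(project_num)
    return (
        activity is not None
        and activity.startswith("R")
        and activity not in _SMALL_BUSINESS
    )
```

For example, `1R01CA123456-01A1` parses to type `1` and activity `R01`, so it is both new and ranked, while `N01CO12400` has no type digit and an excluded activity code.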

Coverage & undercounting disclosure

Institutions with no NIH organization identifier fall back to a name-and-state slug and never merge into a DUNS-linked entity, so their totals may be fragmented and undercounted. We disclose the split rather than hide it.

AI-assisted summaries

The optional AI summaries are a draft aid, not a data source. The model is constrained to the supplied local rows, returns its output through a structured tool call (so scores and lists are validated fields, not text parsed from prose), and is told to treat supplied data as data, not instructions. Authoritative counts, dates, names, and amounts always come from the displayed OpenNIH data.

Limitations

  • No citation, publication, or research-outcome data is used; scores measure recorded NIH award data and portfolio shape only.
  • In fiscal years that record no award dollars (the earliest ExPORTER years), the two dollar dimensions carry no signal and are marked inactive for that year, as disclosed above.
  • Primary-institution attribution follows NIH’s records; sub-awards and consortium splits are not separately attributed.
  • The unlinked (no-DUNS) tail may fragment an institution’s totals across name variants, as disclosed above.
  • Entity resolution groups on NIH DUNS + NIH organization name. Its accuracy on name variants (the same institution recorded under different names) has not yet been independently evaluated against a gold standard — a planned generalization evaluation.
  • The ranking score is not research impact, research quality, citation influence, publication output, patents, or clinical outcomes.
  • PI pages are profiles of grants and publications linked through official NIH data, not public cross-PI rankings.
  • Institution identity follows the loaded NIH organization identifiers and OpenNIH canonicalization; unresolved or unlinked rows are disclosed rather than corrected with third-party crosswalks.
  • The runtime must show a verifier-clean /data page before official sidecar coverage can be claimed for that deployment.
  • A non-clean /data page must leave source artifact names and fiscal-year/project-number exceptions visible instead of hiding them behind aggregate counts.
  • Saved live sentinel/sample parity reports are bounded live NIH RePORTER checks for selected rows, not exhaustive proof that every NIH row still matches live upstream; source-status and /data disclose the exact RePORTER projects/search and publications/search criteria for those selected checks.
  • Local fallback paths are intentionally labeled; fallback means a figure can still render, not that every displayed value came from the official sidecars.
  • NIH RePORTER publications/search sidecars verify PMID, core-project, and application links; title, journal, year, citation count, and RCR fields are hidden unless a separate metadata source is loaded.
  • Data-currency metadata uses the manifest-recorded retrieval timestamp when present. Older manifests without that field remain labeled rather than converted from local file mtimes into false retrieval claims.
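
The data-currency rule in the last item above can be sketched as follows. The manifest key `retrieved_at`, the label strings, and the function name are hypothetical; the point of the sketch is only that a missing timestamp stays labeled as missing and is never replaced by a local file modification time.

```python
from datetime import datetime
from typing import Dict, Optional


def data_currency(manifest: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Report an as-of date only when the manifest itself recorded one."""
    stamp = manifest.get("retrieved_at")  # hypothetical field name
    if stamp is None:
        # Older manifest: label the gap instead of guessing from file mtimes.
        return {"as_of": None, "label": "retrieval time not recorded"}
    as_of = datetime.fromisoformat(stamp)
    return {
        "as_of": as_of.date().isoformat(),
        "label": "manifest-recorded retrieval timestamp",
    }
```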

Website Figure and Result Provenance

Every public surface should state which API and loaded data source backs the number or row shown to users.

| Surface | Runtime API | Provenance |
| --- | --- | --- |
| Home summary | GET /api/official/data-summary | Counts and awards are derived from the loaded projects parquet; award totals use the NIH RePORTER award overlay when the official-detail sidecar is present, with local fallback disclosed by the funding-basis label. |
| Data coverage | GET /api/official/source-status | Shows whether the projects parquet, official detail sidecar, official publication sidecar, and sidecar manifest are loaded and verifier-clean for this runtime. It also renders loaded source artifact basenames, checked manifest identity, fiscal-year/project-pair coverage, missing or extra project-number exceptions, official-detail field population, official publication-link counts, and saved live sentinel/sample parity report filenames with checked project counts, deterministic sample size/seed, eligible/covered/missed strata, mismatch counts, publication-link gaps, and exact NIH RePORTER projects/search plus publications/search source-query evidence. |
| Search results | POST /api/search/grants | Grant rows expose official award, direct cost, indirect cost, admin IC, UEI, and NIH RePORTER links when the detail sidecar is loaded; filters and sorts use official award amounts before local fallback. |
| PI profiles | GET /api/pi/{id}; GET /api/pi/{id}/grants; GET /api/official/projects; GET /api/pi/{id}/publications; GET /api/pi/{id}/collaborators | The profile and grant list start from the project parquet; official grant panels are fetched from the official-detail endpoint, PMID links come through the PI publications endpoint with official publication sidecar and metadata-source status, and collaboration-network rows use official-detail principal_investigators_json when present with local fallback disclosed. |
| Project details | GET /api/official/projects; GET /api/official/project-publications | The project page exposes every modeled NIH RePORTER official-detail field for each loaded detail row and lists project-level PubMed links from the official publication sidecar. |
| AI analysis | POST /api/ai/analyze-pi; POST /api/ai/grant-competitiveness; POST /api/ai/field-insights | PI analysis prompts include official-detail award amounts and abstract excerpts when the detail sidecar is loaded. Grant competitiveness computes topic-matched counts and funding stats from official-detail awards, RePORTER terms, and abstract text when loaded. Field insights computes topic matches, statistics, distributions, and trend data from the same official-detail overlay. Rendered results label whether they used the NIH RePORTER award overlay with local fallback or local project award amounts. |
| Rankings | GET /api/rankings/* | Institution ranking totals use the official award overlay when present; dimensions remain funding and portfolio measures only. |
| Trends | GET /api/trends/* | Funding trends, activity-code funding, and topic trends use official award amounts and official RePORTER terms when available, with local project fields as fallback. |
| Trust and corrections | GET /trust; POST /api/corrections | The public trust surface states independence, conflict-of-interest, corrections, data-currency, and methods-board status. Correction reports are validated into a tracked GitHub issue draft and remain separate from source data until reviewed against official NIH evidence. |

Source of truth

NIH official data is the source of truth. Project rows come from NIH ExPORTER project files. Official detail and publication sidecars are built from NIH RePORTER API v2 responses and are validated against the local project universe by the sidecar coverage gate. Publication sidecars are link tables for PMIDs, core projects, and application IDs.

Public result surfaces expose shared source links to the original NIH RePORTER website, the NIH RePORTER API, OpenNIH data coverage, and this methodology page so rendered figures can be traced back to both the official source and local verification evidence.

OpenNIH does not use third-party data providers or external institution crosswalks for public NIH funding claims in this workflow. Missing or unresolved official rows stay visible as coverage status instead of being silently replaced.
