How OpenNIH ranks institutions
OpenNIH ranks NIH funding, never “research impact” or “quality.” The score is computed only from NIH’s own records. This page states the data scope, the exact formula, and the known limitations so any reader can reproduce or contest a number.
Data scope
This site is built from the full NIH ExPORTER corpus: roughly 2.98 million project-year rows across fiscal years 1985–2026, summing to about $840.8 billion in recorded NIH award dollars. Nearly half of all rows carry no recorded award amount (early years, contracts, and non-NIH agencies); those display as “Not reported,” never a fabricated $0, and the site counts project rows rather than funding wherever dollars are absent.
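The display rule above can be sketched as follows. This is a minimal illustration, not OpenNIH's actual code: `format_award` and `summarize` are hypothetical names, and a missing amount is assumed to arrive as `None`.

```python
from typing import Optional


def format_award(amount: Optional[float]) -> str:
    """Render a recorded amount; a missing amount is never shown as $0."""
    if amount is None:
        return "Not reported"
    return f"${amount:,.0f}"


def summarize(amounts: list[Optional[float]]) -> dict:
    """Count every project row, but total only the rows that record dollars."""
    recorded = [a for a in amounts if a is not None]
    return {
        "project_rows": len(amounts),
        "rows_with_dollars": len(recorded),
        # Assumption: with no recorded dollars the total stays None (unreported),
        # rather than collapsing to 0.
        "total_dollars": sum(recorded) if recorded else None,
    }
```

Keeping `None` distinct from `0` is what lets a genuinely recorded $0 award, if one exists, still render as `$0`.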
Source-of-truth principle. NIH official data is the only source. There are no external crosswalks and no editorial regrouping of entities; ambiguities are disclosed here, not “corrected” with outside input.
Data currency
NIH RePORTER updates weekly. This public-good instance may lag NIH depending on when its latest sync completed, so read current records against the reported as-of date.
The home page and official source-status endpoint report the source as-of date for this running instance when that metadata is available.
The ranking formula
Each institution is scored on four dimensions. Each dimension’s raw value is min-max normalized across all ranked institutions to a [0, 100] score, then combined with the published weights below:
| Dimension | Weight | Definition |
|---|---|---|
| Funding Scale | 40% | Total recorded NIH award dollars for the entity. |
| Portfolio Breadth | 30% | Count of peer-reviewed RPG-research project records. No activity-code “diversity” bonus — within RPG research the distinct codes are near-equivalent single-PI grants, so a bonus would reward labeling, not breadth. |
| New Grant Activity | 20% | Share of new (Type 1) awards, scaled by distinct PI count. “Type 1” is the leading digit of the standard NIH project number; contract-style numbers (e.g. N01…) are not type-coded and count as not-new. |
| Average Award Size | 10% | Mean recorded award dollars per project row. |
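For a year in which all four dimensions are active, the normalize-then-weight step can be sketched as below. The weights come from the table; the dimension keys, the function names, and the handling of a dimension with no spread are assumptions made for illustration, not OpenNIH's actual implementation.

```python
# Published weights from the table above; the key names are hypothetical.
WEIGHTS = {
    "funding_scale": 0.40,
    "portfolio_breadth": 0.30,
    "new_grant_activity": 0.20,
    "average_award_size": 0.10,
}


def min_max_0_100(values):
    """Min-max normalize raw values across all ranked institutions to [0, 100]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a dimension with no spread contributes 0 for everyone.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def composite_scores(raw):
    """raw: {institution: {dimension: raw_value}} -> {institution: composite}."""
    names = list(raw)
    normalized = {name: {} for name in names}
    for dim in WEIGHTS:
        scaled = min_max_0_100([raw[name][dim] for name in names])
        for name, score in zip(names, scaled):
            normalized[name][dim] = score
    return {
        name: sum(WEIGHTS[dim] * normalized[name][dim] for dim in WEIGHTS)
        for name in names
    }
```

Because each dimension is normalized across the ranked set, an institution's composite depends on which other institutions are ranked alongside it, so reproducing a score requires the same ranked set.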
Dollar-less fiscal years (why some dimensions drop out)
When a fiscal year records no award dollars — as the earliest ExPORTER years do — the two dollar dimensions (Funding Scale and Average Award Size) are identical for every institution and carry no discriminating signal. Including them would inject a fixed, meaningless offset into every score.
So OpenNIH marks such dimensions inactive and computes the composite over only the dimensions that vary, with their weights renormalized to sum to 100%. In such a year the composite is therefore driven by Portfolio Breadth and New Grant Activity alone, and the displayed score is reproducible from the displayed dimensions. The rankings page states which dimensions were active for the year you are viewing.
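The renormalization can be sketched as follows, taking already-normalized [0, 100] dimension scores as input. The names `active_dimensions`, `renormalized_weights`, and `composite` are hypothetical, and detecting an inactive dimension by "every institution has the same raw value" is an assumption based on the description above.

```python
# Published weights; the key names are hypothetical.
WEIGHTS = {
    "funding_scale": 0.40,
    "portfolio_breadth": 0.30,
    "new_grant_activity": 0.20,
    "average_award_size": 0.10,
}


def active_dimensions(raw_by_dim):
    """raw_by_dim: {dimension: [raw value per institution]}.

    Assumption: a dimension is active only if its raw values vary.
    """
    return {dim for dim, vals in raw_by_dim.items() if len(set(vals)) > 1}


def renormalized_weights(active):
    """Rescale the published weights of the active dimensions to sum to 1."""
    total = sum(WEIGHTS[dim] for dim in active)
    if total == 0:
        raise ValueError("no active dimensions to score")
    return {dim: WEIGHTS[dim] / total for dim in active}


def composite(normalized, active):
    """normalized: {dimension: score in [0, 100]} for one institution."""
    weights = renormalized_weights(active)
    return sum(weights[dim] * normalized[dim] for dim in active)
```

In a dollar-less year this leaves Portfolio Breadth at 0.30 / 0.50 = 60% and New Grant Activity at 0.20 / 0.50 = 40%.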
Entity resolution
A ranking is only as good as its entity resolution. Institutions are aggregated on a stable canonical NIH entity id derived from the organization’s NIH DUNS combined with its NIH-recorded organization name. Combining the two means a shared-DUNS placeholder is split by name, so two distinct institutions that happen to share one DUNS (for example Yale and Harvard Medical School) stay separate ranked rows and are never silently merged. When NIH records no DUNS, the entity falls back to a name-and-state key (the unlinked tail disclosed under Coverage & undercounting disclosure below), which may fragment one institution across name variants rather than over-merge distinct ones.
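A minimal sketch of that keying rule is shown below. The key format, the `duns:` and `unlinked:` prefixes, the slug normalization, and the function names are all hypothetical; only the DUNS-plus-name grouping and the name-and-state fallback come from the description above.

```python
import re
from typing import Optional


def _slug(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs (hypothetical normalization)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def canonical_entity_id(duns: Optional[str], org_name: str, state: Optional[str]) -> str:
    """Group on DUNS + recorded name; fall back to name + state when DUNS is absent."""
    name = _slug(org_name)
    if duns and duns.strip():
        # Two names under one DUNS yield two distinct ids, so they never merge.
        return f"duns:{duns.strip()}|{name}"
    # Unlinked tail: name variants produce different keys and may fragment.
    return f"unlinked:{name}|{(state or '').strip().lower()}"
```

Under this scheme a name variant always produces a new key, which is the fragmentation risk the text describes: the design prefers splitting one institution over merging two.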
The competitive ranking compares only peer-reviewed, competed RPG-research grants (the R-series; R01 baseline). NIH intramural records (Z01 and ZIA), contracts (N01), cooperative agreements (U), centers (P), career/fellowship/training (K/F/T), small-business (R41–R44), and non-NIH awards are all excluded from the ranking: they are not directly comparable to research project grants, so folding them onto one dollar/count axis would inflate an institution’s score. The excluded mechanisms remain available in grant search and trends.
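The mechanism filter can be approximated by a check on the activity code, as sketched below. This is an assumption-laden illustration: `is_ranked_mechanism` is a hypothetical name, and treating every R-series code other than R41–R44 as ranked is a guess, since the exact allow-list is not given here. Non-NIH awards would also need to be excluded by agency, which this check does not cover.

```python
# Small-business codes named in the exclusion list above.
SMALL_BUSINESS = {"R41", "R42", "R43", "R44"}


def is_ranked_mechanism(activity_code: str) -> bool:
    """Return True for R-series research grants that are not small-business awards.

    Assumption: any other R-series code counts toward the ranking.
    """
    code = activity_code.strip().upper()
    return code.startswith("R") and code not in SMALL_BUSINESS
```

For example, `R01` and `R21` pass, while `R43`, `ZIA`, `N01`, `U54`, `P30`, `K99`, `F31`, and `T32` are filtered out.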
Coverage & undercounting disclosure
Institutions with no NIH organization identifier fall back to a name-and-state slug and never merge into a DUNS-linked entity, so their totals may be fragmented and undercounted. We disclose the split rather than hide it.
AI-assisted summaries
The optional AI summaries are a draft aid, not a data source. The model is constrained to the supplied local rows, returns its output through a structured tool call (so scores and lists are validated fields, not text parsed from prose), and is told to treat supplied data as data, not instructions. Authoritative counts, dates, names, and amounts always come from the displayed OpenNIH data.
Limitations
- No citation, publication, or research-outcome data is used; scores measure recorded NIH award data and portfolio shape only.
- In fiscal years that record no award dollars (the earliest ExPORTER years), the two dollar dimensions carry no signal and are marked inactive for that year, as disclosed above.
- Primary-institution attribution follows NIH’s records; sub-awards and consortium splits are not separately attributed.
- The unlinked (no-DUNS) tail may fragment an institution’s totals across name variants, as disclosed above.
- Entity resolution groups on NIH DUNS + NIH organization name. Its accuracy on name variants (the same institution recorded under different names) has not yet been independently evaluated against a gold standard; that generalization evaluation is planned.
- The ranking score is not research impact, research quality, citation influence, publication output, patents, or clinical outcomes.
- PI pages are profiles of grants and publications linked through official NIH data, not public cross-PI rankings.
- Institution identity follows the loaded NIH organization identifiers and OpenNIH canonicalization; unresolved or unlinked rows are disclosed rather than corrected with third-party crosswalks.
- The runtime must show a verifier-clean /data page before official sidecar coverage can be claimed for that deployment.
- A non-clean /data page must leave source artifact names and fiscal-year/project-number exceptions visible instead of hiding them behind aggregate counts.
- Saved live sentinel/sample parity reports are bounded live NIH RePORTER checks for selected rows, not exhaustive proof that every NIH row still matches live upstream; source-status and /data disclose the exact RePORTER projects/search and publications/search criteria for those selected checks.
- Local fallback paths are intentionally labeled; fallback means a figure can still render, not that every displayed value came from the official sidecars.
- NIH RePORTER publications/search sidecars verify PMID, core-project, and application links; title, journal, year, citation count, and RCR fields are hidden unless a separate metadata source is loaded.
- Data-currency metadata uses the manifest-recorded retrieval timestamp when present. Older manifests without that field remain labeled rather than converted from local file mtimes into false retrieval claims.
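The data-currency rule in the last limitation can be sketched as follows. The manifest field name `retrieved_at`, the label wording, and the function name are hypothetical; the point illustrated is only that a missing timestamp is labeled as missing and never inferred from a file's modification time.

```python
def data_currency_label(manifest: dict) -> str:
    """Report the manifest-recorded retrieval time, or say that none was recorded."""
    retrieved = manifest.get("retrieved_at")  # hypothetical field name
    if retrieved:
        return f"Source as of {retrieved}"
    # Deliberately no fallback to local file mtimes: that would turn a local
    # filesystem detail into a false claim about when NIH data was retrieved.
    return "Retrieval time not recorded in manifest"
```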
Website Figure and Result Provenance
Every public surface should state which API and loaded data source backs the number or row shown to users.
- GET /api/official/data-summary
- GET /api/official/source-status
- POST /api/search/grants
- GET /api/pi/{id}; GET /api/pi/{id}/grants; GET /api/official/projects; GET /api/pi/{id}/publications; GET /api/pi/{id}/collaborators
- GET /api/official/projects; GET /api/official/project-publications
- POST /api/ai/analyze-pi; POST /api/ai/grant-competitiveness; POST /api/ai/field-insights
- GET /api/rankings/*
- GET /api/trends/*
- GET /trust; POST /api/corrections
Source of truth
NIH official data is the source of truth. Project rows come from NIH ExPORTER project files. Official detail and publication sidecars are built from NIH RePORTER API v2 responses and are validated against the local project universe by the sidecar coverage gate. Publication sidecars are link tables for PMIDs, core projects, and application IDs.
Public result surfaces expose shared source links to the original NIH RePORTER website, the NIH RePORTER API, OpenNIH data coverage, and this methodology page so rendered figures can be traced back to both the official source and local verification evidence.
OpenNIH does not use third-party data providers or external institution crosswalks for public NIH funding claims in this workflow. Missing or unresolved official rows stay visible as coverage status instead of being silently replaced.