
4 posts tagged with "NRDS"


The NextGen Research DataStream (NRDS): A Reproducible Numerical Prediction System for Accelerating Research to Operations in Hydrology

· 10 min read
Jordan Laser
Software Engineer at Lynker
Arpita Patel
Assistant Director of DevOps and IT
Harsha Vemula
DevOps Engineer at Alabama Water Institute

Technological advances are transforming water prediction capabilities at a ludicrous pace. From revolutionary machine learning algorithms to dramatic advances in computational hardware, the potential for making accurate hydrologic predictions has never been higher. To meet this new potential, the hydrologic community continuously generates models and approaches based on cutting-edge research that could benefit operational systems. However, many of these innovations lack a path to operational deployment.

The NextGen Research DataStream (NRDS) provides a mechanism by which these ideas can be refined and make their way into operations.

Developed by Lynker and the Alabama Water Institute (a Cooperative Institute for Research to Operations in Hydrology partnership), the NRDS facilitates the actualization of research ideas from the community in a scalable and deployable numerical prediction system. To evaluate each of these modeling concepts, the NRDS deploys prototype models to generate a continuous “datastream”. These outputs can then be evaluated and the models made more accurate. This cycle of streamlined deployment and iterative design lets these prototypes mature into a product that can be picked up by an operational forecasting team.

To make this process rapid and smooth, the entire system is designed with reproducibility and iterative improvement as core principles. The NRDS is an automated numerical prediction system that generates regular streamflow forecasts, using the NextGen Water Resources Modeling Framework (NextGen) as the core modeling engine and NextGen In A Box (NGIAB) as the simulation environment. The system generates forecasts across the contiguous United States (CONUS) on CIROH's operational cyberinfrastructure backbone, the research-to-operations (R2O) Hybrid Cloud (R2OHC) platform, with deployment on the AWS cloud. What makes the NRDS exciting is that the entire system is open source, reproducible, publicly browsable, and potentially editable by anyone in the hydrologic community.

Why the NRDS Matters

For decades, operational hydrologic modeling has been a largely closed process. National-scale models like the National Water Model (NWM) are developed and maintained within federal agencies, where HPC access is limited. While these models frequently produce insights and forecasts of excellent quality, the simulations themselves have proven difficult to reproduce and improve upon from within the hydrologic research community, hindering the flow of knowledge from research to operations. By contrast, the NRDS is fully public and open. This provides both researchers and national agencies with immediate access, reducing the latency between the discovery of new and potentially valuable knowledge and its operational implementation.

Deployed on the AWS cloud from the ngen-datastream repository, the NRDS regularly ingests meteorological forcing data, processes it through the NextGen framework using community-configured parameters, and produces streamflow predictions for river reaches across the country. The forcing data, model configuration, infrastructure, and simulation outputs are all publicly available through the NRDS S3 data portal and the NRDS visualizer. Researchers can browse the results, compare them against observations, and reproduce or run their own simulations with the exact tooling that supports the NRDS. This transparency and usability enable researchers to readily propose improvements to any component of the NRDS stack.

This is a fundamentally new paradigm for how national-scale water prediction can work: an open, iterative, community-driven feedback loop where regional expertise flows directly into an operational-like system.

Figure 1. An overview diagram of the NRDS architecture. NWM data from NOAA is used as a forcing source, which is converted into catchment-averaged forcing files by the ForcingProcessor. Meanwhile, DataStreamCLI draws from the Community Hydrofabric and community parameters to generate BMI configurations. Together, these are used to execute a model run in NextGen In A Box, the outputs of which are stored in an AWS S3 bucket for browsing via datastream.ciroh.org.

How the Community Can Contribute

This open contribution model is perhaps the most exciting aspect of the NRDS. The configuration files and NextGen model formulation that drive the daily simulations are publicly hosted and directly referenced during every execution. This means that community-proposed improvements to parameterization can flow directly into the operational system just by modifying these files.

The process is intentionally straightforward. A researcher who has improved upon the parameterization of a particular VPU realization file can navigate to the ngen-datastream Issues page on GitHub, open a new issue using the dedicated "Propose NRDS/NextGen Research DataStream parameter update" template, and submit their updated configuration along with supporting evidence. The CIROH team then reviews these contributions; once validated, they are merged and deployed into the live NRDS system.

This contribution pathway was a central theme at the 2025 CIROH Developers Conference (DevCon 2025), where a dedicated workshop walked participants through the complete process of proposing a configuration change. Attendees received access to virtual machines on NSF JetStream2 with pre-installed tooling, enabling them to run local simulations, compare results, and prepare a contribution — all within a single workshop session.

The NRDS is not designed to be a product delivered by a central team. It is a platform designed to absorb and amplify the collective expertise of the hydrologic community.

Multiple Datastreams Live and Scaling

The long-term potential of the NRDS is further amplified by its scalability in deploying additional model formulations to generate regular forecasts. The system now runs multiple concurrent datastreams, each using a different hydrologic modeling approach within the NextGen framework to generate streamflow forecasts.

Currently, the NRDS operates a CFE-NOM datastream across all of CONUS, with execution parallelized across Vector Processing Units (VPUs). This datastream pairs the Conceptual Functional Equivalent (CFE) rainfall-runoff model with the Noah-OWP-Modular land surface model. Alongside this, an LSTM datastream has been deployed for all of CONUS, bringing cutting-edge machine learning-based streamflow prediction into an operational-like system. Both of these datastreams route the NextGen outputs using T-Route.
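To illustrate what parallelizing across VPUs can look like, here is a minimal Python sketch. It assumes each VPU can be simulated independently of the others; the function `run_vpu`, the status record it returns, and the short list of VPU identifiers are all hypothetical placeholders, not the actual NRDS deployment code (which runs on AWS infrastructure not shown here).

```python
from concurrent.futures import ThreadPoolExecutor

# Example VPU identifiers only; the full CONUS list is not reproduced here.
EXAMPLE_VPUS = ["01", "03W", "16"]


def run_vpu(vpu_id: str) -> dict:
    """Hypothetical stand-in for one VPU's forcing -> NextGen -> T-Route run."""
    # A real run would launch the model here; this sketch only returns a status record.
    return {"vpu": vpu_id, "status": "complete"}


def run_datastream(vpu_ids, max_workers: int = 4) -> list:
    """Run every VPU independently and collect the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of vpu_ids even when runs finish out of order.
        return list(pool.map(run_vpu, vpu_ids))


results = run_datastream(EXAMPLE_VPUS)
for record in results:
    print(record["vpu"], record["status"])
```

Because no VPU depends on another in this sketch, adding a region only means adding an identifier to the list; the same pattern would apply whether the workers are threads, processes, or separate cloud instances.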

The most recent release, v2.2.0 (February 2026), deployed the LSTM model into the production NRDS system — a milestone that brings differentiable and machine learning-based hydrology into an automated, cloud-based production environment. This builds on the broader CIROH effort to integrate models like δHBV 2.0 from Penn State's MHPI group.

There is also a Routing-Only DataStream for VPU 03W. This datastream skips the NextGen rainfall-runoff step and instead routes NWM outputs through the channel network using T-Route alone, a novel experiment by Quinn Lee.

The Broader Ecosystem

The NRDS is deeply interconnected with the broader suite of CIROH tools that are making NextGen modeling accessible. NextGen In A Box (NGIAB) provides the containerized runtime environment that researchers can use locally to test configurations before proposing them to the NRDS. The DataStreamCLI (now in its own repository) automates the complete workflow from data preprocessing to NextGen execution. The ForcingProcessor (also split out) handles the conversion of gridded meteorological data into the catchment-averaged forcings that NextGen requires. Once the model outputs are generated, TEEHR (Tools for Exploratory Evaluation in Hydrologic Research) provides standardized evaluation capabilities for comparing model outputs against observations (dashboard available here).
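The idea of a catchment-averaged forcing can be sketched in a few lines of Python. This is an illustration only, not the ForcingProcessor's actual algorithm: it assumes equally sized grid cells and a precomputed table of overlap weights, and the names `catchment_average`, `grid`, and `weights` are hypothetical.

```python
def catchment_average(grid_values, cell_weights):
    """Weighted mean of grid-cell values for one catchment.

    grid_values:  mapping of grid-cell index -> forcing value (e.g. precipitation rate)
    cell_weights: mapping of grid-cell index -> fraction of that cell inside the catchment
                  (assumes all cells have equal area)
    """
    total_weight = sum(cell_weights.values())
    if total_weight <= 0:
        raise ValueError("catchment has no overlapping grid cells")
    weighted_sum = sum(grid_values[cell] * w for cell, w in cell_weights.items())
    return weighted_sum / total_weight


# Toy gridded field: three cells indexed by (row, col).
grid = {(0, 0): 2.0, (0, 1): 4.0, (1, 0): 6.0}

# Hypothetical overlap weights: cell (0, 1) is split evenly between two catchments.
weights = {
    "cat-1": {(0, 0): 1.0, (0, 1): 0.5},
    "cat-2": {(0, 1): 0.5, (1, 0): 1.0},
}

forcings = {cat: catchment_average(grid, w) for cat, w in weights.items()}
print(forcings)
```

The weights depend only on the grid and the catchment boundaries, so in a setup like this they could be computed once and reused for every forecast cycle.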

Together, these tools form a coherent pipeline: prepare data with the ForcingProcessor and DataStreamCLI, run simulations locally with NGIAB, evaluate results with TEEHR, and — if improvements are found — contribute them back to the NRDS for operational deployment.

What's Next

The trajectory of the NRDS points toward an increasingly capable and diverse operational system. Active development efforts include implementing the hourly, deep learning-based differentiable model δHBV 2.0 MTS, which was recently embedded into the NGIAB-NRDS ecosystem through a collaboration between Penn State and the Alabama Water Institute. This model has demonstrated continental-scale streamflow forecasting capabilities that rival the current National Water Model.

There is also ongoing work to make the NRDS outputs more accessible and useful to downstream consumers. The NRDS Visualizer now provides a web-based interface for browsing and exploring daily simulation results. The TEEHR Evaluation Dashboard now provides rapid evaluation of these simulation results, enabling researchers to gauge model performance as events are occurring and weigh the performance of each datastream (NextGen model-formulation) relative to other datastreams.
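As a rough illustration of weighing datastreams against each other, the sketch below scores two simulated series against observations using the Nash-Sutcliffe efficiency (NSE), a widely used hydrologic skill score. The choice of NSE, the function name, and the toy numbers are assumptions made for this example; this is not TEEHR's API, and the post does not state which metrics the dashboard reports.

```python
def nash_sutcliffe(simulated, observed):
    """Nash-Sutcliffe efficiency: 1.0 is a perfect match, 0.0 is no better than the observed mean."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - residual / variance


# Toy streamflow values (arbitrary units) for one gauge.
observed = [1.0, 2.0, 3.0, 4.0]
datastreams = {
    "stream_a": [1.0, 2.0, 3.0, 4.0],  # matches the observations exactly
    "stream_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the observed mean
}

scores = {name: nash_sutcliffe(sim, observed) for name, sim in datastreams.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Scoring every datastream against the same observations with the same metric is what makes the comparison between model formulations meaningful.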

With the NRDS ecosystem quickly reaching maturity, now is the time for hydrologic researchers to put their research ideas to the test and deploy them in this operational-like environment.

Get Involved

The NextGen Research DataStream represents a new kind of infrastructure for water science: one that is as much a social and organizational innovation as it is a technical one. By making national-scale hydrologic simulations open, reproducible, and community-editable, the NRDS creates the conditions for a distributed network of research hydrologists to incrementally improve the accuracy and reliability of streamflow predictions across the country.

Whether you are a graduate student exploring NextGen for the first time, a river forecast center hydrologist with deep regional expertise, or a researcher developing novel modeling approaches, the NRDS offers a concrete pathway from your work to operational impact.

Here's how to get started:

  • Visit the NRDS page on the NextGen In A Box product portfolio website: ngiab.ciroh.org/#/nrds
  • Explore the data: Browse daily outputs at datastream.ciroh.org or the NRDS Visualizer
  • Examine the NRDS deployment status timeline to see which datastreams have been deployed, when, and at what scale: NRDS Status
  • Propose an improvement: Use the NRDS Issues page to submit your idea for a new datastream or an edit to an existing deployment.
  • Join the conversation: Participate in GitHub Discussions to connect with the community
  • Read the documentation: Visit the CIROH Hub NRDS page for comprehensive documentation on the NRDS system and broader ecosystem.
  • Try out NRDS tools: Clone the various NRDS repositories to experiment with the NRDS workflow.
    • DataStreamCLI (Figure 2): the on-server workflow tool used in NRDS simulations. This tool offers the ability to reproduce NRDS simulations, with a modular design for integrating research configurations and processing components.
    • ForcingProcessor (Figure 3): a scalable tool for processing NWM operational data files into NextGen inputs. This tool provides the processing to generate NRDS forcings.
Figure 2. The DataStreamCLI workflow. From left to right: 'LynkerSpatial Hydrofabric via hfsubset', 'National Water Model Forcings processing', 'NextGen and BMI config file generation', 'File and directory validation', 'NextGen in a Box', 'Data file hashing and metadata', 'Evaluation by TEEHR'.
Figure 3. An animated gif depicting forcing data before and after subsetting via ForcingProcessor. On the right, the processed data visibly matches the NWM original, but is neatly cut out to match the boundaries of the NGEN run.
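The 'Data file hashing and metadata' step in the DataStreamCLI workflow hints at how reproducibility can be checked in practice. The following Python sketch shows one way to do it: hash every file in a run directory and record the results in a manifest that can be compared across runs. The SHA-256 algorithm, the manifest layout, and the file names are assumptions made for this example and do not describe DataStreamCLI's actual implementation.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(run_dir: Path) -> dict:
    """Map each file's path (relative to run_dir) to its SHA-256 digest."""
    return {
        p.relative_to(run_dir).as_posix(): sha256_of_file(p)
        for p in sorted(run_dir.rglob("*"))
        if p.is_file()
    }


# Demonstrate on a throwaway directory with two hypothetical run files.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "forcings.csv").write_text("time,precip\n0,1.2\n")
    (run_dir / "realization.json").write_text(json.dumps({"model": "example"}))
    manifest = build_manifest(run_dir)
    manifest_again = build_manifest(run_dir)

print(json.dumps(manifest, indent=2))
```

Two runs that produce identical manifests used byte-identical inputs and outputs, which gives a quick first check before any deeper comparison of results.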

The future of water prediction is open. Come build it with us.


The NextGen Research DataStream is developed and maintained by CIROH at the University of Alabama, with funding under award NA22NWS4320003 from the NOAA Cooperative Institute Program. Learn more at ciroh.ua.edu.

Hourly Differentiable Modeling Arrives in the NGIAB-NRDS NextGen Ecosystem

· 9 min read
Leo Lonzarich
Graduate Researcher
Quinn Lee
Programmer Analyst
Josh Cunningham
Software Engineer
Benjamin Lee
Development Operations Engineer
Arpita Patel
Assistant Director of DevOps and IT

In October 2025, Penn State's Multi-scale Hydrology, Processes and Intelligence group (MHPI), led by Dr. Chaopeng Shen, and the Alabama Water Institute (AWI), led by Steve Burian and Arpita Patel, achieved a milestone R2O effort: the preliminary integration of δHBV 2.0 [4] -- a daily-scale, high-resolution, distributed differentiable model -- into a NextGen ecosystem. This resulted in the first adoption of a differentiable model into NextGen In A Box (NGIAB) [2] and provided an opportunity for CIROH researchers to fine-tune the δHBV 2.0 architecture for NextGen operation.

Having proven viability for daily timescale predictions on high-resolution river networks [4], MHPI researchers recently adapted δHBV 2.0 into a multi-timescale architecture designed to parameterize HBV and simulate streamflow at hourly intervals, at scale, across the NextGen HydroFabric. This new model, δHBV 2.0 MTS (Multi-TimeScale) [5], is a fusion of a daily and hourly δHBV 2.0 model designed to efficiently handle ML training with high geospatial and temporal complexity. (See MTS Architecture for more details about this construction.)

With δHBV 2.0 MTS maintaining forecasting skill similar to that of its daily-scale counterpart [5], Penn State and AWI once again joined forces to embed hourly-scale differentiable modeling within AWI's operational ecosystem, both as a demonstration of model viability and to facilitate open access to its runtime.

Differentiable Models

The δHBV 2.0 and δHBV 2.0 MTS differentiable model constructions are briefly outlined here to contextualize the development efforts. For further detail, see each model's respective citation. At their core, differentiable models embed traditional process-based equations (here, the HBV rainfall-runoff model) inside a machine learning training loop. Because these models are designed to be differentiable (e.g., in PyTorch), gradients flow end-to-end from the loss function back through the physical equations and into the neural networks that supply their parameters. This lets the model learn optimal parameterizations directly from observed data while still obeying the mass-balance and storage constraints encoded in HBV -- combining the interpretability and physical consistency of process-based hydrology with the flexibility of deep learning.
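The mechanism can be illustrated with a toy example in plain Python. This is a sketch under heavy simplifying assumptions, not δHBV: a single linear reservoir stands in for HBV, a one-weight sigmoid stands in for the neural network, and a small hand-written forward-mode autodiff class (`Dual`) stands in for PyTorch's reverse-mode autograd. All names and numbers are hypothetical.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one parameter."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)  # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x.value))
    return Dual(s, s * (1.0 - s) * x.deriv)


def simulate(k, precip):
    """Linear reservoir: each step releases a fraction k of the stored water."""
    storage, flows = Dual(0.0), []
    for p in precip:
        storage = storage + p
        q = k * storage
        storage = storage - q  # mass balance: exactly q leaves storage
        flows.append(q)
    return flows


def loss_and_grad(w, attribute, precip, observed):
    """Mean squared error and its derivative with respect to the weight w."""
    k = sigmoid(Dual(w, 1.0) * attribute)  # "network": attribute -> parameter
    loss = Dual(0.0)
    for q, obs in zip(simulate(k, precip), observed):
        err = q - obs
        loss = loss + err * err
    loss = loss * (1.0 / len(observed))
    return loss.value, loss.deriv


attribute = 0.5                          # hypothetical catchment attribute
precip = [5.0, 0.0, 3.0, 0.0, 0.0, 4.0]  # toy precipitation series
true_w = 1.0                             # weight used to make synthetic observations
observed = [q.value for q in simulate(sigmoid(Dual(true_w) * attribute), precip)]

w = -1.0                                 # deliberately poor starting guess
initial_loss, _ = loss_and_grad(w, attribute, precip, observed)
for _ in range(200):                     # plain gradient descent
    _, grad = loss_and_grad(w, attribute, precip, observed)
    w -= 1.0 * grad
final_loss, _ = loss_and_grad(w, attribute, precip, observed)
print(initial_loss, final_loss, w)
```

The derivative of the loss passes through every storage update before reaching the weight, so the weight is tuned by the physics rather than around it, and the sigmoid keeps the learned release fraction between 0 and 1 no matter what value the weight takes.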

Moving Hydrologic Prediction Forward — A software integration meeting at the Alabama Water Institute

· 10 min read
Martyn Clark
Professor of Hydrology at University of Calgary
James Halgren
Assistant Director of Science
Matthew Denno
Senior Engineering Applications Developer at RTI International
Arpita Patel
Assistant Director of DevOps and IT
Josh Cunningham
Software Engineer
Quinn Lee
Programmer Analyst
Sam Lamont
Environmental Applications Developer at RTI International
Darri Eythorsson
Postdoctoral Researcher at University of Calgary
Cyril Thebault
Postdoctoral Associate at University of Calgary
Sifan A. Koriche
Research [Hydrologic] Scientist
Group photo from the software integration meeting at the Alabama Water Institute

Last week, at the invitation and expert coordination of James Halgren, teams from RTI International (Sam Lamont and Matt Denno) and the University of Calgary (Darri Eythorsson, Cyril Thebault, and Martyn Clark) met at AWI for an intensive working session focused on weaving recent CIROH research into AWI’s fork of the NOAA Office of Water Prediction (OWP) Next Generation Water Resources Modeling Framework (nicknamed “NextGen”). James took the lead in developing the agenda, lining up the right scientific and technical expertise, and ensuring that the week targeted the most critical software integration challenges. Throughout the visit, the RTI and UCalgary teams collaborated closely with AWI software engineers Quinn Lee and Josh Cunningham, hydrologic scientist Sifan A. Koriche, and James himself. The days were filled with whiteboards, deep technical conversations, and strategic planning around the future of NextGen water prediction. This recap captures the key themes and the momentum that carried through the week.

Building Bridges: CIROH–Penn State Collaboration Formalizes Differentiable Modeling for NRDS

· 6 min read
Leo Lonzarich
Graduate Researcher
Quinn Lee
Programmer Analyst
Josh Cunningham
Software Engineer
Arpita Patel
Assistant Director of DevOps and IT
James Halgren
Assistant Director of Science

Almost from the start, 2025 has been a banner year in hydrologic modeling, with advancements in capabilities on both sides of the aisle of CIROH's research-to-operations (R2O) pipeline.

  • From the research skunkworks, Penn State's MHPI group, led by Dr. Chaopeng Shen, introduced a new generation of distributed, differentiable hydrologic models spearheaded by δHBV 2.0. Capable of high-resolution, continental-scale streamflow forecasting across the CONUS Hydrofabric, δHBV 2.0 fuses process-based modeling and machine learning to enable efficient parameter calibration and interpretable predictions at scale -- with demonstrated viability as a National Water Model 3.0 successor.