
8 posts tagged with "AWS"


The NextGen Research DataStream (NRDS): A Reproducible Numerical Prediction System for Accelerating Research to Operations in Hydrology

· 10 min read
Jordan Laser
Software Engineer at Lynker
Arpita Patel
Assistant Director of DevOps and IT
Harsha Vemula
DevOps Engineer at Alabama Water Institute

Technological advances are evolving water prediction capabilities at a ludicrous pace. From revolutionary machine learning algorithms to dramatic advances in computational hardware, the potential for making accurate hydrologic predictions has never been higher. To meet this new potential, the hydrologic community continuously generates models and approaches based on cutting-edge research that could benefit operational systems. However, many of these innovations lack a path to operational deployment.

The NextGen Research DataStream (NRDS) provides a mechanism by which these ideas can be refined and make their way into operations.

Developed by Lynker and the Alabama Water Institute (a Cooperative Institute for Research to Operations in Hydrology partnership), the NRDS helps turn a research idea from the community into a scalable and deployable numerical prediction system. To evaluate each of these modeling concepts, the NRDS deploys prototype models to generate a continuous “datastream”. These outputs can then be evaluated and the models behind them made more accurate. This cycle of streamlined deployment and iterative design lets these prototypes mature into a product that can be picked up by an operational forecasting team.

To make this process rapid and smooth, the entire system is designed with reproducibility and iterative improvement as core principles. The NRDS is an automated numerical prediction system that generates regular streamflow forecasts, using the NextGen Water Resources Modeling Framework (NextGen) as the core modeling engine and NextGen In A Box (NGIAB) as the simulation environment. This system generates forecasts across the contiguous United States (CONUS) on CIROH's operational cyberinfrastructure backbone: the research-to-operations (R2O) Hybrid Cloud (R2OHC) platform, with deployment on the AWS cloud. What makes the NRDS exciting is that the entire system is open-sourced, reproducible, publicly browsable, and potentially editable by anyone in the hydrologic community.

Why the NRDS Matters

For decades, operational hydrologic modeling has been a largely closed process. National-scale models like the National Water Model (NWM) are developed and maintained by federal agencies where HPC access is limited. While these models frequently produce insights and forecasts of excellent quality, the simulations themselves have proven difficult to reproduce and improve upon from within the hydrologic research community, hindering the knowledge flow from research to operations. By contrast, the NRDS is totally public and open. This provides both researchers and national agencies with immediate access, reducing the latency between the discovery of new and potentially valuable knowledge and its operational implementation.

Deployed on AWS cloud from the ngen-datastream repository, the NRDS regularly ingests meteorological forcing data, processes it through the NextGen framework using community-configured parameters, and produces streamflow predictions for river reaches across the country. The forcing data, model configuration, infrastructure, and simulation outputs are all publicly available through the NRDS S3 data portal and the NRDS visualizer. Researchers can browse the results, compare them against observations, and reproduce or run their own simulations with the exact tooling that supports the NRDS. This transparency and usability enables researchers to readily propose improvements to any component of the NRDS system stack.

This is a fundamentally new paradigm for how national-scale water prediction can work: an open, iterative, community-driven feedback loop where regional expertise flows directly into an operational-like system.

Figure 1. An overview diagram of the NRDS architecture. NWM data from NOAA is used as a forcing source, which is converted into catchment-averaged forcing files by the ForcingProcessor. Meanwhile, DataStreamCLI draws from the Community Hydrofabric and community parameters to generate BMI configurations. Together, these are used to execute a model run in NextGen In A Box, the outputs of which are stored in an AWS S3 bucket for browsing via datastream.ciroh.org.

How the Community Can Contribute

This open contribution model is perhaps the most exciting aspect of the NRDS. The configuration files and NextGen model formulation that drive the daily simulations are publicly hosted and directly referenced during every execution. This means that community-proposed improvements to parameterization can flow directly into the operational system just by modifying these files.

The process is intentionally straightforward. A researcher who has improved upon the parameterization of a particular VPU realization file can navigate to the ngen-datastream Issues page on GitHub, open a new issue using the dedicated "Propose NRDS/NextGen Research DataStream parameter update" template, and submit their updated configuration along with supporting evidence. The CIROH team then reviews these contributions; once validated, they are merged and deployed into the live NRDS system.

This contribution pathway was a central theme at the 2025 CIROH Developers Conference (DevCon 2025), where a dedicated workshop walked participants through the complete process of proposing a configuration change. Attendees received access to virtual machines on NSF JetStream2 with pre-installed tooling, enabling them to run local simulations, compare results, and prepare a contribution — all within a single workshop session.

The NRDS is not designed to be a product delivered by a central team. It is a platform designed to absorb and amplify the collective expertise of the hydrologic community.

Multiple Datastreams Live and Scaling

The long-term potential of the NRDS is further amplified by its scalability in deploying additional model formulations to generate regular forecasts. The system now runs multiple concurrent datastreams, each using a different hydrologic modeling approach within the NextGen framework to generate streamflow forecasts.

Currently, the NRDS operates a CFE-NOM datastream, which pairs the Conceptual Functional Equivalent (CFE) rainfall-runoff model with the Noah-OWP-Modular land surface model. It covers all of CONUS, with execution parallelized by Vector Processing Unit (VPU). Alongside this, an LSTM datastream has been deployed for all of CONUS, bringing cutting-edge machine learning-based streamflow prediction into an operational-like system. Both of these datastreams route the NextGen outputs using T-Route.

The most recent release, v2.2.0 (February 2026), deployed the LSTM model into the production NRDS system — a milestone that brings differentiable and machine learning-based hydrology into an automated, cloud-based production environment. This builds on the broader CIROH effort to integrate models like δHBV 2.0 from Penn State's MHPI group.

There is also a Routing-Only DataStream for VPU 03W. This datastream skips the NextGen runoff models and instead routes NWM outputs through the channel network using T-Route alone, a novel experiment by Quinn Lee.

The Broader Ecosystem

The NRDS is deeply interconnected with the broader suite of CIROH tools that are making NextGen modeling accessible. NextGen In A Box (NGIAB) provides the containerized runtime environment that researchers can use locally to test configurations before proposing them to the NRDS. The DataStreamCLI (now in its own repository) automates the complete workflow from data preprocessing to NextGen execution. The ForcingProcessor (also split out) handles the conversion of gridded meteorological data into the catchment-averaged forcings that NextGen requires. Once the model outputs are generated, TEEHR (Tools for Exploratory Evaluation in Hydrologic Research) provides standardized evaluation capabilities for comparing model outputs against observations (dashboard available here).

Together, these tools form a coherent pipeline: prepare data with the ForcingProcessor and DataStreamCLI, run simulations locally with NGIAB, evaluate results with TEEHR, and — if improvements are found — contribute them back to the NRDS for operational deployment.
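
As a rough illustration of what "catchment-averaged forcings" means, the sketch below computes an area-weighted mean of gridded values for each catchment. It is not the ForcingProcessor's code or API: the function name, the weight format, and the numbers are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def catchment_average(grid_values, weights):
    """Return the weighted mean of gridded values for each catchment.

    grid_values: dict mapping a (row, col) grid-cell index to a value,
                 for example a precipitation rate.
    weights:     iterable of (catchment_id, (row, col), weight) tuples, where
                 weight is the share of that cell attributed to the catchment.
                 This format is a hypothetical stand-in for real weight files.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for catchment_id, cell, weight in weights:
        weighted_sum[catchment_id] += grid_values[cell] * weight
        weight_total[catchment_id] += weight
    # Catchments whose weights sum to zero are skipped rather than divided by zero.
    return {
        catchment_id: weighted_sum[catchment_id] / weight_total[catchment_id]
        for catchment_id in weighted_sum
        if weight_total[catchment_id] > 0
    }


# Made-up 2x2 grid and weights, for illustration only.
grid = {(0, 0): 2.0, (0, 1): 4.0, (1, 0): 6.0, (1, 1): 8.0}
weights = [
    ("cat-1", (0, 0), 0.75),
    ("cat-1", (0, 1), 0.25),
    ("cat-2", (0, 1), 0.5),
    ("cat-2", (1, 0), 0.5),
    ("cat-2", (1, 1), 1.0),
]
averages = catchment_average(grid, weights)
print(averages)
```

In practice the weights would come from intersecting the forcing grid with the hydrofabric's catchment polygons, and the same weights can be reused for every timestep and variable.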

What's Next

The trajectory of the NRDS points toward an increasingly capable and diverse operational system. Active development efforts include implementing the hourly, deep learning-based differentiable model δHBV 2.0 MTS, which was recently embedded into the NGIAB-NRDS ecosystem through a collaboration between Penn State and the Alabama Water Institute. This model has demonstrated continental-scale streamflow forecasting capabilities that rival the current National Water Model.

There is also ongoing work to make the NRDS outputs more accessible and useful to downstream consumers. The NRDS Visualizer now provides a web-based interface for browsing and exploring daily simulation results. The TEEHR Evaluation Dashboard complements it with rapid evaluation of these simulation results, enabling researchers to gauge model performance as events are occurring and to weigh the performance of each datastream (that is, each NextGen model formulation) against the others.
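
For readers who want a feel for what comparing datastreams against observations involves, here is a minimal sketch of one widely used skill score, the Nash-Sutcliffe Efficiency (NSE). It is not TEEHR's API, and nothing here implies the dashboard reports exactly this metric; the function name, the datastream labels, and the flow values are made up for illustration.

```python
def nash_sutcliffe_efficiency(simulated, observed):
    """Return the NSE: 1 is a perfect match, 0 is no better than the observed mean."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("simulated and observed must be non-empty and equal in length")
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    observed_variance = sum((o - mean_observed) ** 2 for o in observed)
    if observed_variance == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - squared_error / observed_variance


# Made-up streamflow values (same units throughout), for illustration only.
observed = [10.0, 12.0, 15.0, 11.0, 9.0]
forecasts = {
    "datastream_a": [11.0, 12.0, 14.0, 11.0, 10.0],
    "datastream_b": [9.0, 14.0, 12.0, 13.0, 8.0],
}

scores = {
    name: nash_sutcliffe_efficiency(simulated, observed)
    for name, simulated in forecasts.items()
}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: NSE = {score:.3f}")
```

Scoring every datastream against the same observations with the same metric is what makes a side-by-side comparison meaningful; a real evaluation would also look at several metrics and many gauges rather than a single score.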

With the NRDS ecosystem quickly reaching maturity, now is the time for hydrologic researchers to put their research ideas to the test and deploy them in this operational-like environment.

Get Involved

The NextGen Research DataStream represents a new kind of infrastructure for water science: one that is as much a social and organizational innovation as it is a technical one. By making national-scale hydrologic simulations open, reproducible, and community-editable, the NRDS creates the conditions for a distributed network of research hydrologists to incrementally improve the accuracy and reliability of streamflow predictions across the country.

Whether you are a graduate student exploring NextGen for the first time, a river forecast center hydrologist with deep regional expertise, or a researcher developing novel modeling approaches, the NRDS offers a concrete pathway from your work to operational impact.

Here's how to get started:

  • NRDS on the NextGen in a Box product portfolio website: ngiab.ciroh.org/#/nrds
  • Explore the data: Browse daily outputs at datastream.ciroh.org or the NRDS Visualizer
  • Examine the NRDS deployment status timeline to see which datastreams have been deployed, when, and at what scale: NRDS Status
  • Propose an improvement: Use the NRDS Issues page to submit your idea for a new datastream or an edit to an existing deployment.
  • Join the conversation: Participate in GitHub Discussions to connect with the community
  • Read the documentation: Visit the CIROH Hub NRDS page for comprehensive documentation on the NRDS system and broader ecosystem.
  • Try out NRDS tools: Clone the various NRDS repositories to experiment with the NRDS workflow.
    • DataStreamCLI (Figure 2): the on-server workflow tool used in NRDS simulations. This tool offers the ability to reproduce NRDS simulations, with a modular design for integrating research configurations and processing components.
    • ForcingProcessor (Figure 3): a scalable tool for processing NWM operational data files into NextGen inputs. This tool provides the processing to generate NRDS forcings.
Figure 2. The DataStreamCLI workflow. From left to right: 'LynkerSpatial Hydrofabric via hfsubset', 'National Water Model Forcings processing', 'NextGen and BMI config file generation', 'File and directory validation', 'NextGen in a Box', 'Data file hashing and metadata', 'Evaluation by TEEHR'.
Figure 3. Forcing data before and after subsetting via ForcingProcessor (animated GIF). On the right, the processed data visibly matches the NWM original, but is neatly cut out to match the boundaries of the NextGen run.

The future of water prediction is open. Come build it with us.


The NextGen Research DataStream is developed and maintained by CIROH at the University of Alabama, with funding under award NA22NWS4320003 from the NOAA Cooperative Institute Program. Learn more at ciroh.ua.edu.

AWS re:Invent 2025: Key Insights for Research and Cyberinfrastructure

· 4 min read
Arpita Patel
Assistant Director of DevOps and IT
Scott Hendrickson
Sr Solutions Architect WWPS Education at AWS
A photo from AWS re:Invent 2025

AI, DevOps and the Future of Cloud Infrastructure

AWS re:Invent did not disappoint! I spent the first week of December at Amazon Web Services' flagship conference in Las Vegas. The event delivered cutting-edge technical insights, showcased the rapid evolution of cloud computing and AI, and provided countless opportunities to connect with industry leaders.

The energy across all five conference venues was more vibrant than I ever imagined it would be.

Moving Hydrologic Prediction Forward — A software integration meeting at the Alabama Water Institute

· 10 min read
Martyn Clark
Professor of Hydrology at University of Calgary
James Halgren
Assistant Director of Science
Matthew Denno
Senior Engineering Applications Developer at RTI International
Arpita Patel
Assistant Director of DevOps and IT
Josh Cunningham
Software Engineer
Quinn Lee
Programmer Analyst
Sam Lamont
Environmental Applications Developer at RTI International
Darri Eythorsson
Postdoctoral Researcher at University of Calgary
Cyril Thebault
Postdoctoral Associate at University of Calgary
Sifan A. Koriche
Research [Hydrologic] Scientist
Group photo from the software integration meeting at the Alabama Water Institute

Last week, at the invitation and expert coordination of James Halgren, teams from RTI International (Sam Lamont and Matt Denno) and the University of Calgary (Darri Eythorsson, Cyril Thebault, and Martyn Clark) met at AWI for an intensive working session focused on weaving recent CIROH research into AWI’s fork of the NOAA Office of Water Prediction (OWP) Next Generation Water Resources Modeling Framework (nicknamed “NextGen”). James took the lead in developing the agenda, lining up the right scientific and technical expertise, and ensuring that the week targeted the most critical software integration challenges. Throughout the visit, the RTI and UCalgary teams collaborated closely with AWI software engineers Quinn Lee and Josh Cunningham, hydrologic scientist Sifan A. Koriche, and James himself. The days were filled with whiteboards, deep technical conversations, and strategic planning around the future of NextGen water prediction. This recap captures the key themes and the momentum that carried through the week.

DevCon 2025: A DevOps and Cyberinfrastructure Success Story

· 3 min read
Arpita Patel
DevOps Manager and Enterprise Architect

The recent DevCon 2025 event not only showcased cutting-edge development practices but also demonstrated how modern DevOps principles and cloud infrastructure can seamlessly support large-scale technical workshops. Our team had the privilege of providing IT infrastructure and support for over 200 attendees, creating a robust learning environment through an exemplary public-private partnership.

Image of CIROH's Research Cyberinfrastructure and DevOps team. On the left, two graphs are shown depicting usage for the Google Cloud-2i2c and Jetstream2 environments.

CIROH's Research Cyberinfrastructure and DevOps team.
Left to right, top to bottom:
Manjila Singh, Arpita Patel, Nia Minor, Trupesh Patel, James Halgren, Benjamin Lee.

DevCon 2025: Hydroinformatics and Research CyberInfrastructure Keynote

· 5 min read
Arpita Patel
DevOps Manager and Enterprise Architect

Last week, I had the incredible opportunity to co-present a keynote at the CIROH Developers Conference (DevCon 2025), which attracted over 200 attendees. The keynote, which I delivered alongside Dan Ames, focused on "CIROH HydroInformatics and Research Cyberinfrastructure." It was a fantastic experience to share insights into the powerful tools and technologies that CIROH engineers, students, and researchers have been developing to advance hydrological research and operations.


Pennsylvania State University Researchers Leverage CIROH Cyberinfrastructure for Advanced Hydrological Modeling

· 3 min read
Arpita Patel
DevOps Manager and Enterprise Architect
Yalan Song
Research Assistant Professor
Tadd Bindas
Graduate Researcher

Pennsylvania State University (PSU) researchers have been leveraging CIROH Cyberinfrastructure to tackle complex hydrological modeling challenges. This post highlights their innovative approach using the Wukong computing platform in conjunction with Amazon S3 bucket storage to efficiently process and analyze large-scale environmental datasets. 🚀

CIROH Cloud User Success Story

· 3 min read
Arpita Patel
DevOps Manager and Enterprise Architect

This month, we are excited to showcase two case studies built on our cyberinfrastructure tools and services. They demonstrate how CIROH's cyberinfrastructure supports hydrological research and operational advancements.

1. ngen-datastream and NGIAB

ngen-datastream image

CIROH Research CyberInfrastructure Update

· 2 min read
Arpita Patel
DevOps Manager and Enterprise Architect

We're excited to share some recent developments and updates from CIROH's Research CyberInfrastructure team:

Cloud Infrastructure

  • CIROH's Google Cloud Account is now fully operational and managed by our team. You can find more information here.
  • We're in the process of migrating our 2i2c JupyterHub to CIROH's Google Cloud account.
  • We've successfully deployed the Google BigQuery API (developed by BYU and Google) for NWM data in our cloud. To access this API, please contact us at ciroh-it-admin@ua.edu. Please refer to NWM BigQuery API to learn more.