ESGF_Architecture
The ESGF architecture is that of a global system of distributed Nodes that interoperate with each other according to a peer-to-peer paradigm. This means that there is no rigid distinction of roles between Nodes: each Node can expose different services according to how it is configured, and can act as either the provider or the consumer of services depending on the situation. In a peer-to-peer system, Nodes can join or leave the federation dynamically without affecting the operations of the other Nodes. This is in stark contrast to a traditional client-server architecture, where the server represents a single point of failure for the operations of multiple clients. Two main characteristics make ESGF a peer-to-peer system:
- The modularity and configurability of the ESGF software stack, which allows each Node to expose a graduated set of services depending on the specific site requirements.
- The establishment of federation protocols that allow the exchange of information from Node to Node on an egalitarian basis, without the existence of special central locations where the information is aggregated.
These two characteristics are described in more detail below.
A common ESGF Software Stack is deployed at each Node in the federation to provide services for data, metadata and user management. The installation can be configured to install all or part of the available services, depending on the site's needs, and possibly to replicate some of the services across multiple servers at the same site. Specifically, the following _flavors_ of ESGF Node can be installed:
- Data Node: includes services for publishing and serving data, namely:
* The ESGF Node Manager. The ESGF Node Manager is a web application that mediates the peer-to-peer interaction among all the Nodes in the federation. Its main purpose is to create and expose the ESGF Registry, a document that contains critical interoperability information such as the name and type of each Node, its available services and URL endpoints, its CA certificate, etc.
* The ESGF Publisher, and associated Postgres relational database. The ESGF Publisher is a desktop application used to publish data into a Node. The publishing workflow starts with extracting metadata from files on disk, then stores it in the Node database, creates THREDDS XML catalogs, and finally publishes the catalogs to the Node publishing service. Postgres is a popular, freely available relational database that is used in ESGF to store all metadata harvested by the ESGF Publisher, as well as user account information.
* The THREDDS Data Server, configured with the ESGF security filters. The THREDDS Data Server (TDS), developed by Unidata, is the standard mechanism through which an ESGF Node delivers its data to clients. The TDS includes functionality for serving data in a variety of forms and protocols: full-file HTTP download, OPeNDAP sub-setting, GIS products via WMS and WCS, etc. The ESGF installation procedure configures the TDS with a set of special ESGF filters that intercept every data request and enforce the access control policies established for that dataset by interacting with the appropriate ESGF Security Services deployed throughout the federation.
* The ESGF Security Services. The ESGF Security framework includes functionality for distributed access control throughout the federation. It is composed of _client-side_ components (the access filters and OpenID Relying Party) that protect access to the data, and _server-side_ components (the Attribute and Authorization services) that can be queried to gather all the information necessary to make an access control decision. The framework supports access both by browsers (via OpenID authentication) and by desktop clients and libraries (via X.509 certificates).
* The GridFTP server. GridFTP, developed by the Globus Alliance, is a high-performance protocol for reliable data transfer. The implementation includes a server, deployed on an ESGF Node, and a client-side library that users must deploy on their desktops.
- IdP Node: includes services for authenticating users:
* The OpenID Identity Provider web application. The OpenID Identity Provider (IdP) allows users to register and authenticate with the system, including Single-Sign-On functionality for browser-based access throughout the federation.
* The Globus SimpleCA and MyProxy server. The MyProxy server, developed by NCSA, is used to issue short-term certificates that client libraries and toolkits can use to authenticate the user during a data product request. The certificates are signed by the locally installed Globus Simple Certificate Authority (CA).
- Index Node: includes the applications necessary to index and search metadata:
* The Apache Solr engine. Apache Solr is a high-performance, scalable web application for storing and searching metadata.
* The ESGF Search back-end services. The ESGF Search module includes facilities for harvesting external metadata repositories (such as the THREDDS XML catalogs produced by the ESGF Publisher), and for searching the distributed metadata indexes deployed within the federation.
* The ESGF Web Portal application. The ESGF Web Portal is a web application that provides the user interface to many of the other ESGF modules. It exposes web pages for registering users, searching for data, downloading data, etc.
- Compute Node: includes services for data analysis and visualization, namely:
* The Live Access Server. The Live Access Server (LAS), developed by NOAA/PMEL, is an analysis and visualization engine that allows users to request advanced data and imaging products from multiple ESGF Nodes at once. Internally, it relies on the TDS catalogs and OPeNDAP services for configuration and remote data access. It can be configured with a pluggable visualization engine such as Ferret (the default), NCL or CDAT.
- Future modules:
* ESGF Dashboard. The ESGF Dashboard is a web application intended for system administrators to monitor the status of all services deployed at each Node.
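The relationship between Node flavors and services can be summarized as a simple lookup. The following is only an illustrative sketch: the dictionary keys, the `NODE_FLAVORS` name and the `services_for` helper are hypothetical and are not part of the ESGF installer. It shows how a site that installs several flavors ends up exposing the union of their services.

```python
# Illustrative mapping of ESGF Node flavors to the services listed above.
# The keys and names here are assumptions made for this sketch only.
NODE_FLAVORS = {
    "data": [
        "ESGF Node Manager",
        "ESGF Publisher",
        "Postgres database",
        "THREDDS Data Server",
        "ESGF Security Services",
        "GridFTP server",
    ],
    "idp": ["OpenID Identity Provider", "Globus SimpleCA", "MyProxy server"],
    "index": ["Apache Solr", "ESGF Search", "ESGF Web Portal"],
    "compute": ["Live Access Server"],
}


def services_for(flavors):
    """Return the de-duplicated services for the given flavors, in listing order."""
    services = []
    for flavor in flavors:
        for service in NODE_FLAVORS[flavor]:
            if service not in services:
                services.append(service)
    return services


# A site that installs both the IdP and Compute flavors:
print(services_for(["idp", "compute"]))
```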
Interoperability among all Nodes in the ESGF federation is based on a peer-to-peer paradigm for exchanging information about services, trusts, and metadata holdings. Specifically, the following protocols and mechanisms make all the Nodes in the federation work together as a whole:
- The ESGF Registry. The ESGF Registry contains all relevant information about each Node in the federation: its type, the URL endpoints of the services it exposes, its public certificates, and so on. This information is not kept in a central location; rather, it is continually exchanged among all Nodes so that each Node always has an up-to-date local copy of the state of the whole federation.
- Single-Sign-On. Because all ESGF Nodes trust each other's certificate authorities, a user can register and authenticate at any of the Nodes and be granted credentials that are honored throughout the federation. The type of credentials granted depends on how the user is accessing the system:
* if using a web browser, the _OpenID_ protocol is used to exchange authentication information between the site where the user authenticates and any other site
* if using a desktop client, an _X.509_ short-term certificate is transmitted by the client to any server that requests the user to authenticate
- Distributed Access Control. The data served from each Node may need to be protected by policies that are administered at another Node. The ESGF security infrastructure supports this model by establishing mutual trust among all the constituent Nodes, and by transmitting security information (Attribute and Authorization statements) as signed documents encoded in SAML (the Security Assertion Markup Language).
- Metadata Exchange. All Nodes in the federation continually exchange search and discovery metadata about their data holdings. As a consequence, when users initiate a search at any one site, they are able to discover resources of interest throughout the whole federation.
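To make the decentralized Registry exchange more concrete, the sketch below shows one way a Node could fold a peer's registry copy into its own. This is not the actual ESGF Node Manager algorithm, which is not described here. It is a minimal "newest entry wins" merge under stated assumptions: the `RegistryEntry` fields, the per-Node `version` counter and the example hostnames are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """One Node's record in a registry copy (all field names are hypothetical)."""
    hostname: str
    node_type: str
    endpoints: tuple
    version: int  # assumed to increase each time the Node updates its record


def merge_registries(local, remote):
    """Merge two registry copies, keeping the newer entry for each hostname."""
    merged = dict(local)
    for hostname, entry in remote.items():
        current = merged.get(hostname)
        if current is None or entry.version > current.version:
            merged[hostname] = entry
    return merged


# Node A already holds a newer record of itself than the one its peer sends,
# and learns about Node B for the first time.
local_copy = {
    "node-a.example.org": RegistryEntry("node-a.example.org", "data", ("/thredds",), 2),
}
peer_copy = {
    "node-a.example.org": RegistryEntry("node-a.example.org", "data", ("/thredds",), 1),
    "node-b.example.org": RegistryEntry("node-b.example.org", "index", ("/search",), 1),
}
merged = merge_registries(local_copy, peer_copy)
print(sorted(merged))
```

Because the merge is symmetric and needs no coordinator, any two Nodes can run it against each other, which matches the absence of a central aggregation point described above.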
Traditionally, the data and metadata services deployed throughout the ESGF system have been made available to users through a standard web browser. Increasingly, though, the ESGF collaboration is working towards enabling direct access to these services via rich desktop clients and toolkits, which allow scripted and more powerful access. Specifically, the following clients are being developed:
- UV-CDAT. UV-CDAT is a high-performance visualization client that allows the user to query the ESGF data catalogs by any metadata category, and either download the selected files or create visualization plots.
- Data Mover Light. Data Mover Light (DML) is a high-performance desktop client that allows bulk download of data files via either HTTP or GridFTP.
- Climate Data Exchange. The Climate Data Exchange (CDX) is a combination of server-side components and a client-side toolkit library that work together to expose the ESGF distributed data holdings as if they were a local file system, and to issue data processing commands that are executed on the servers, returning only the results to the user's desktop.

