Matthew Harris edited this page Oct 9, 2013 · 6 revisions

ESG Gateway and ESGF p2p Node Interoperability

The ESGF p2p Node is an evolution of the ESG Data Node based on an improved architecture that is more modular, supports peer-to-peer communication, and contains expanded functionality to support observational data. Any existing ESG Data Node can be upgraded to become an ESGF p2p Node and be operated within a federation that contains other Nodes and Gateways. Specifically, the following functionality allows interoperability (based on standards such as OpenID, SAML and OpenDAP) among all components of the Earth System Grid Federation:

  • Users can register at either a p2p Node or a Gateway, and through mutual trust and white-listing, they can use their assigned OpenID to log in at any other site in the federation, including Data Nodes running an OpenID Relying Party to secure data.

ESGF_Interoperability_2.png

  • A data provider can publish data simultaneously to a Gateway and a Node, thus allowing a user to start a search at either interface. _Double publishing_ can be accomplished in two ways:

    • Because the p2p Node exposes the same publishing API employed by the ESGF Publisher and the Gateway, the data provider may decide to run the last step of the ESG publisher twice, pointing to a Gateway first and a p2p Node later; only the publishing service URL needs to change between the two runs.
    • Alternatively, the data provider can use the p2p Node software to harvest the full hierarchy of THREDDS catalogs in one step, once they have been generated by any method and are being served by the TDS. As a consequence, a single Data Node can serve data that has been published to a Gateway and/or a p2p Node, in any combination.

ESGF_Interoperability_1.png

  • Note: at this time, there is no direct exchange of metadata records between Gateways and p2p Nodes: metadata must be published to both systems independently. In the future, metadata exchange mechanisms can be built if necessary, for example based on a modular OAI component.
  • Both the Node and the Gateway use the same ESGF software module ("esgf-security") for the Attribute and Authorization Services, meaning that the services deployed at a Node can query (through SAML/SOAP) the services deployed at a Gateway, and vice-versa. This guarantees that the same data can be accessed from a Node whether a user has registered at a Node or a Gateway, with the proper access control and usage recording. In particular, the following use cases are possible:
    • The Authorization filter at a Data Node queries the Authorization Service at a Gateway, which retrieves the user attributes from the local database, or another Gateway Attribute Service (this is the traditional use case within the ESG system).
    • The Authorization Filter at a Data Node queries the Authorization Service at the same p2p Node, which retrieves user access control attributes from a p2p Attribute Service.
    • The Authorization filter at a Data Node queries the Authorization Service at the same p2p Node, which retrieves user access control attributes from a Gateway Attribute Service. For example, the following workflow may take place:
      • Data is published to an ESGF p2p Node and protected with the "CMIP5 Research" attribute.
      • A user registers at the PCMDI Gateway for the "CMIP5 Research" attribute.
      • User requests data from the ESGF Data Node
      • The ESGF Data Node Authentication Filter redirects the user to the Data Node ORP, where the user enters the PCMDI Gateway OpenID.
      • The ESGF Data Node Authorization filter requests authorization from the ESGF p2p Node Authorization Service, which consults its local policy file and retrieves the user attributes from the PCMDI Gateway.
    • The Authorization filter at a Data Node queries the Authorization Service at a Gateway, which retrieves the user attributes from a p2p Attribute Service. Additionally, the Authorization filter at a Data Node can be configured to query multiple Authorization Services sequentially, so that all the use cases discussed previously are not exclusive: a single Data Node can enforce access control to different collections via all of the previous workflows simultaneously.

ESGF_Interoperability_3.png

  • Both a Node and a Gateway publish their configuration information to a federation wide registry, and query the same registry to import information about their peers.

The following diagram summarizes the interoperability of services and operations between a p2p Node and a Gateway.

ESGF_Integration.png
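
The sequential querying of multiple Authorization Services described above can be sketched as follows. This is a minimal illustration, not the esgf-security implementation: the function names, the in-memory policies, the example.org OpenIDs, and the rule that the first service with a definite answer decides are all assumptions made for the example. A real filter would issue SAML/SOAP requests instead of calling local functions.

```python
from typing import Callable, Dict, Iterable, Optional, Set

# Hypothetical stand-in for one Authorization Service. It returns True (permit),
# False (deny), or None when the service has no policy for the resource.
AuthzService = Callable[[str, str], Optional[bool]]


def make_service(policy: Dict[str, Set[str]]) -> AuthzService:
    """Build a fake service from a mapping {resource: permitted OpenIDs}."""
    def service(openid: str, resource: str) -> Optional[bool]:
        if resource not in policy:
            return None  # unknown resource: let the next service answer
        return openid in policy[resource]
    return service


def authorize(openid: str, resource: str, services: Iterable[AuthzService]) -> bool:
    """Ask each service in order; the first definite answer wins (assumed rule)."""
    for service in services:
        decision = service(openid, resource)
        if decision is not None:
            return decision
    return False  # no service knew the resource: deny by default


# Two illustrative services protecting different collections.
gateway = make_service({"cmip5": {"https://gateway.example.org/openid/alice"}})
p2p = make_service({"obs": {"https://idp.example.org/esgf-idp/openid/bob"}})

print(authorize("https://gateway.example.org/openid/alice", "cmip5", [p2p, gateway]))  # True
```

Because each service only answers for the collections it knows, one Data Node can mix the use cases listed earlier simply by listing several services in order.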


Gateway -> P2P User Migration

With every installation of an IDP P2P Node there is a small tool that allows for the migration of users from the Gateway to the ESGF (IDP) P2P Node. The tool is a script named esgf-user-migrate located in /usr/local/bin.


Preparation

Before you can run this program, the P2P IDP host (_target_), where you will be running the migration code, must be able to communicate with the Gateway postgres database (_source_). To make sure your Gateway database accepts the connection, add the following entries to the postgres configuration files:

  1. Allow connections to the database from the IDP host...

Postgres configuration file pg_hba.conf (usually found in /usr/local/psql/data/). Create an entry for the IP address of the ESGF P2P IDP Node to allow incoming connections.

# host    all         all         128.117.0.0/16        md5
host    all         all         <ip address of idp node>/32    md5

#Examples:
#host    all         all         128.115.57.35/32      md5
#host    all         all         198.128.145.129/32    md5

Also be sure you have an account on the source (gateway) database that allows read access to the tables in the database "gateway-esg".

  2. Bind the postgres server to an IP address accessible from the IDP node

Postgres configuration file postgresql.conf (usually found in /usr/local/psql/data/). Set listen_addresses to include the gateway IP address that is reachable from the IDP node.

listen_addresses = 'localhost,<gateway's open ip address>'      
port = 5432                             # (change requires restart)

Example:
#listen_addresses = 'localhost,198.128.145.243' 
#port = 5432                            # (change requires restart)

NOTE: The corresponding files on the P2P node should NOT be edited! There are serious security implications that may render the node's host vulnerable!
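
For step 1 above, the pg_hba.conf entry is easy to mistype. The following is a small sketch that validates the IDP node's address and prints a matching line. The helper name pg_hba_entry is hypothetical, and the sketch assumes an IPv4 address and the md5 method shown in the examples.

```python
import ipaddress


def pg_hba_entry(idp_ip: str, database: str = "all", user: str = "all",
                 method: str = "md5") -> str:
    """Return a pg_hba.conf 'host' line admitting exactly one IPv4 address."""
    # IPv4Address raises ValueError if the string is not a valid address.
    addr = ipaddress.IPv4Address(idp_ip)
    return f"host    {database}    {user}    {addr}/32    {method}"


print(pg_hba_entry("198.128.145.129"))
```

Fields in pg_hba.conf are separated by whitespace, so the exact column spacing does not matter.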

Okay, now your gateway's database can accept connections from the ESGF P2P IDP node! :-)
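
Before running the migration, it can help to confirm from the IDP node that the gateway's postgres port is actually reachable. The sketch below only opens a TCP connection; it does not authenticate, so it checks listen_addresses and any firewall in between, not the pg_hba.conf rule. The function name port_reachable is an assumption made for this example.

```python
import socket


def port_reachable(host: str, port: int = 5432, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run on the IDP node, port_reachable("<fqdn of gateway host>") returning False suggests revisiting step 2 or the network path before trying the migration tool.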


Using The Migration Tool

To run the program, simply log in to your ESGF P2P IDP Node and issue the following (providing the credentials set up in the previous step):

%> esgf-user-migrate -U <username> -h <fqdn of gateway host> -p <port of gateway database> [return]
Then enter the password at the prompt

_A note regarding "transition". For the sake of a 'layered' transition strategy it may be desirable for the IDP node to support the gateway's OpenIDs for a short transition window. If this is desired, set the environment variable "verbatim_migration" to "true" before running the migration tool. After the transition window closes, accounts with the gateway OpenID scheme should be wholly and completely purged from the IDP database._

_Alternatively, your organization may just perform the migration and inform users of their new OpenID (http://<idp host>/esgf-idp/openid/), thus avoiding the cost of post-transition cleanup and reducing overall confusion (IMHO)._ -gavin
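
As a sketch of the transition option above, the tool can be launched with "verbatim_migration" set only for that one run, leaving the login shell's environment untouched. The wrapper functions here are hypothetical; the flags -U, -h and -p are the ones documented by the tool, and the sketch assumes the tool reads its password prompt from the inherited terminal.

```python
import os
import subprocess


def migration_command(username, host, port=5432, verbatim=False):
    """Build the argument list and environment for esgf-user-migrate."""
    argv = ["esgf-user-migrate", "-U", username, "-h", host, "-p", str(port)]
    env = dict(os.environ)  # copy, so the caller's environment is not modified
    if verbatim:
        # Keep the gateway's OpenIDs during the transition window.
        env["verbatim_migration"] = "true"
    return argv, env


def run_migration(username, host, port=5432, verbatim=False):
    """Run the tool; stdin and stdout are inherited so the prompt stays usable."""
    argv, env = migration_command(username, host, port, verbatim)
    return subprocess.run(argv, env=env).returncode
```

For example, run_migration("gavin", "pcmdi3.llnl.gov", verbatim=True) would start a verbatim migration against the host used in the tool's own usage example.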

There is a --help option to the command that describes the options available:

%> esgf-user-migrate --help                              

   Usage:

      -----------------------------------------------------
      > esgf-user-migrate [--help] [--force] [--devel] [--verbose] [--debug] [-p <port>] [-d <database>] -U <username> -h <host> 
      (enter password at prompt)
      -----------------------------------------------------

      (required args)
      -U - username of source database containing user credentials to migrate
      -h - fully qualified domain name of host running database

      (optional args)
      -p - port on which that database is listening (default 5432)
      -d - database name (default gateway-esg)

      --devel   - pull wrapped jar file from development distribution area
      --force   - force download of wrapped jar into tools dir regardless of checksum validation
      --verbose - provide more output
      --debug   - provide debug output
      --help    - this usage output
      --version - version information

      Ex:
      %> esgf-user-migrate --verbose -U gavin -h pcmdi3.llnl.gov

      (be sure that the source db exposes the port to this host: check pg_hba.conf file for details)

The program will connect to the gateway database and immediately begin migrating over users. In seconds the operation is complete. All users, groups, group registrants, and roles are transferred over to the ESGF IDP P2P Node where you are executing this program.

First roles are migrated:

Migrated role #1: none
Migrated role #2: default
Migrated role #3: publisher
Migrated role #4: admin
Migrated role #5: super
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [5] role records

Then groups are migrated:

Migrated group #1: Nobody
Migrated group #2: Guest
Migrated group #3: User
Migrated group #5: CCSM
Migrated group #6: PCM
Migrated group #7: NARCCAP
...
Migrated group #33: IAP-LASG
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [33] group records

Then users are migrated:

Migrated User #1: guest --> https://pcmdi9.llnl.gov/esgf-idp/openid/guest
Migrated User #2: testUser --> https://pcmdi9.llnl.gov/esgf-idp/openid/testUser
Migrated User #3: c.ward --> https://pcmdi9.llnl.gov/esgf-idp/openid/c.ward
Migrated User #4: fwang2 --> https://pcmdi9.llnl.gov/esgf-idp/openid/fwang2
Migrated User #5: tpmaxwel --> https://pcmdi9.llnl.gov/esgf-idp/openid/tpmaxwel
Migrated User #6: aez900 --> https://pcmdi9.llnl.gov/esgf-idp/openid/aez900
Migrated User #7: rgmiller --> https://pcmdi9.llnl.gov/esgf-idp/openid/rgmiller
...
Migrated User #1887: rentong228 --> https://pcmdi9.llnl.gov/esgf-idp/openid/rentong228
Migrated User #1888: mxchen --> https://pcmdi9.llnl.gov/esgf-idp/openid/mxchen
Migrated User #1889: rcerezo --> https://pcmdi9.llnl.gov/esgf-idp/openid/rcerezo
Migrated User #1890: seonhwa --> https://pcmdi9.llnl.gov/esgf-idp/openid/seonhwa
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [1890] user records

Finally, permissions are migrated:

Migrated Permission #58: [Angelet_1221] [CMIP5 Research] [default]
Migrated Permission #59: [k-ishihara] [CMIP5 Research] [default]
Migrated Permission #60: [chenliang] [CMIP5 Research] [default]
...
Migrated Permission #180: [seonhwa] [user] [default]
Migrated Permission #181: [seonhwa] [CMIP5 Research] [default]
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [181] permission records

All the OpenIDs now reference the new host you are currently on.
Passwords are preserved: users don't have to change their passwords, and group registrants won't have to re-register.
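
To double-check a run, the "[INFO] ... Migrated [N] ... records" summary lines can be collected from saved output. This is a minimal sketch that assumes the output was captured to a string and that the summary lines keep the format shown above; the function name migration_totals is hypothetical.

```python
import re

# Matches the per-stage summary, e.g. "Migrated [1890] user records".
SUMMARY = re.compile(r"Migrated \[(\d+)\] (\w+) records")


def migration_totals(log_text: str) -> dict:
    """Return {record kind: count} from the tool's summary lines."""
    return {kind: int(count) for count, kind in SUMMARY.findall(log_text)}


sample = """\
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [5] role records
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [33] group records
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [1890] user records
[INFO] esg.node.util.migrate.UserMigrationTool: Migrated [181] permission records
"""
print(migration_totals(sample))
```

The resulting totals can then be compared with the counts you expect from the gateway database.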
