
Metadata: Making Changes

Laura Bookman edited this page Jul 1, 2015 · 5 revisions

Workflow: Sites

The workflow depends on the quality and source of the metadata being modified.

Using the Sites

State Submissions on stategeothermaldata.org

  • Items are submitted at the State Geothermal Data Submission Site
  • Select Member files
  • Update and edit the files in the appropriate state and year folder here: $DIR\SubContractors (or the directory where you know the files belong)
  • Create or update a task
  • Create or update an entry at the Django Admin site
  • Return to the State Geothermal Data Submission Site and un-publish the member file submitted in the first step.

Processing Submitted Metadata

  • Use the USGIN Task Site
  • Log in as your user and view 'My Assigned Tasks'
  • Select an item and open it
  • Follow the instructions and review the comments left in the task by other users
  • If your task is to harvest and close out a metadata compilation, navigate to the item's repository page (linked in the task) and download the Metadata Compilation Excel/zipped file. You can also grab the latest version off the network.

Harvesting Metadata to the AASG Repository

Harvesting in bulk to the repository requires a CSV or Excel file that follows the Geothermal Metadata Compilation Content Model, with all required fields populated.

  • Review the obtained metadata compilation and remove any blank/empty rows, saving just the ‘metadata template’ as a CSV file for processing
  • The description field must contain at least 50 characters, or the resource will not publish on the repository site.
  • You can make further edits to the CSV file by stuffing the CSV file to the cushions application or by using OpenRefine.
  • When edits are done, place the CSV online in the TEMP-CSV folder of the WAF
  • Log on using WinSCP, PuTTY, or any preferred connector.
  • After the metadata CSV is online, navigate to the WAF in your browser and copy the link to the online CSV file.
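The two checks in the list above (dropping blank rows and enforcing the 50-character description minimum) can be run before upload. The following is a minimal sketch using only the Python standard library. The column name `description` and the sample data are assumptions for illustration; use the field name from your own metadata template.

```python
import csv
import io

MIN_DESCRIPTION_LENGTH = 50  # threshold stated in the steps above


def clean_rows(rows):
    """Drop rows in which every cell is empty or whitespace."""
    return [
        row for row in rows
        if any(str(value or "").strip() for value in row.values())
    ]


def short_descriptions(rows, field="description"):
    """Return 1-based indexes of rows whose description is too short.

    The field name "description" is an assumption, not taken from the
    content model itself.
    """
    return [
        index
        for index, row in enumerate(rows, start=1)
        if len((row.get(field) or "").strip()) < MIN_DESCRIPTION_LENGTH
    ]


# Small in-memory sample standing in for the exported metadata template.
sample = io.StringIO(
    "title,description\n"
    "Well Logs,Too short\n"
    ",\n"
    "Heat Flow," + "x" * 60 + "\n"
)
rows = clean_rows(list(csv.DictReader(sample)))
print(len(rows), short_descriptions(rows))
```

To check a real file, replace the `io.StringIO` sample with `open(path, newline="")`.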

NOTE: If the metadata you are entering already has a compilation collection, review the repository page describing the compilation and note which other collections the resource lives in; you will need to add those collections during the harvest process. If there is no compilation collection yet (it is usually named after the metadata compilation file originally submitted), create one to house these resources, under the appropriate Source Organization.

  • Log in to the AASG Repository and select the 'Harvest Remote Metadata' tab from the navigation bar below the header
  • Select the 'CSV' bullet
  • QUIRKY!: The 'Add harvested metadata to a collection' text box requires you to enter items using only your keyboard: type, use your arrow keys, then press the enter/return key to select. Mouse clicks will not add collections to all of the resources.
  • 'Enter the metadata URL' requires the URL to your CSV file (not the entire WAF); right click and paste the link you copied earlier.
  • Click the Harvest button

NOTE: The page will spin for several minutes, depending on the size of the CSV file. You can use the developer tools in Chrome to watch the network for any problems during this process. DO NOT attempt to go back during the harvest process OR after you have been redirected to the list of harvested resources.

Publish Metadata in AASG Repository

Check that the metadata has been harvested and publish all new resources.

  • Navigate to the Django admin site for the AASG Repository and State Tracking
  • Select a collection you harvested the CSV file to
  • From the action box select 'Mark selected resources as published'
  • Check the small box next to the bold 'Title' text to select all the resources and publish them in bulk
  • Select the Go button
  • Double check that all resources have been validated and published by reviewing the buttons under the published column.
  • These should all be green with a white check mark. If you see red with an X, select 'Edit Resource'
  • Re-validate your metadata, make the changes necessary for validation, select ‘publish’, and save the document.

NOTE: Do not delete items directly through the Django admin site. Delete items manually or by using node against the CouchDB (delete a whole collection, by id, or all unpublished).

CSV to XML

This awesome Python tool can be found in the usgin/csvtometadata GitHub repository.

Running the CSV to XML tool

  • Make sure you have all the libraries and tools necessary to run the module.
  • The CSV tool will only work with the USGIN Geothermal Metadata Compilation (http://schemas.usgin.org/models/#Metadata). All validation requirements apply.
  • Make sure you have entered unique metadata identifiers or are using identifiers already assigned to the resources.
  • Use this Excel script to generate unique GUIDs
  • Create a new ‘output’ folder for your generated XML files
  • Using an editor (preferably IDLE) or the command line, open runcsvtoxml.py
  • Enter the path to the CSV file you want to process, and then enter the path to your new ‘output’ folder
  • From the IDLE window, press F5 to run.
  • From the command line, navigate to the csvtometadata directory and type ‘python runcsvtoxml.py’ to run.
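As an alternative to the Excel script for generating identifiers, the unique-identifier requirement above can be sketched in Python with the standard `uuid` module. This is only an illustration and not part of the csvtometadata tool. The field name `metadata_uuid` is an assumption, so substitute the identifier column from your template.

```python
import uuid


def assign_identifiers(rows, field="metadata_uuid"):
    """Fill in a random UUID for rows lacking one; reject duplicates.

    `rows` is a list of dicts (for example from csv.DictReader).
    The field name "metadata_uuid" is hypothetical.
    """
    seen = set()
    for row in rows:
        identifier = (row.get(field) or "").strip()
        if not identifier:
            # uuid4() yields a random 36-character GUID string.
            identifier = str(uuid.uuid4())
            row[field] = identifier
        if identifier in seen:
            raise ValueError(f"duplicate identifier: {identifier}")
        seen.add(identifier)
    return rows
```

Identifiers that were already assigned are left untouched, so re-running this over a previously processed file should not change them.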

Valid XML to WAF

  • Collect the valid XML files and move them onto a web accessible folder (WAF)
  • Windows users can log on to the WAF directory using WinSCP, PuTTY, or any preferred connector.
  • Create a new directory and assign the ‘resource’ user as the owner.
  • Harvest the WAF into your preferred catalog.

Bulk Updating in CouchDb

Our State CouchDB instance holds all of the AASG Repository items. Because the two systems are linked, it is more convenient to make bulk edits in CouchDB than to modify individual resources in the repository. Bulk updates can apply to almost all of the information held in CouchDB, such as changing contact information for a particular organization, adding keywords, or modifying URLs. Edits made to resources in CouchDB are immediately reflected in the AASG Repository.

  • You will need to be familiar with basic JavaScript in order to make changes. You will also need access to the AZGS Network to view the State CouchDB instance.
  • Read up on creating and understanding views in CouchDB
  • Follow these guidelines if you are doing AZGS development
  • Replicate the records database in a local or live instance before making large changes.
  • Create a view in the ‘records’ database that applies to the metadata you want to bulk update.
  • Using node.js, write a script that calls a view in CouchDB and changes or adds values according to key-value pairs.
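The last step above (read documents from a view, change key-value pairs, write them back) has the following rough shape. The page calls for node.js; this sketch uses Python instead, to match the csvtometadata tool, and relies on CouchDB's standard `_view` and `_bulk_docs` HTTP endpoints. The base URL, design document, and view names are placeholders, and only top-level keys are changed here, whereas real metadata records are likely nested.

```python
import json
import urllib.request


def apply_changes(docs, changes):
    """Return copies of docs with the given top-level key/value pairs set.

    _id and _rev are carried over, which CouchDB needs to accept updates.
    """
    return [{**doc, **changes} for doc in docs]


def bulk_docs_payload(docs):
    """Build the JSON body expected by CouchDB's _bulk_docs endpoint."""
    return json.dumps({"docs": docs}).encode("utf-8")


def fetch_view_docs(base_url, db, design, view):
    """Read full documents from a view (all arguments are placeholders)."""
    url = f"{base_url}/{db}/_design/{design}/_view/{view}?include_docs=true"
    with urllib.request.urlopen(url) as response:
        return [row["doc"] for row in json.load(response)["rows"]]


def post_bulk(base_url, db, docs):
    """POST the updated documents back in a single request."""
    request = urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=bulk_docs_payload(docs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Running `apply_changes` against a replicated copy of the database first, as the list recommends, lets you inspect the result before touching the live instance.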

Geoportal

AASG Repository metadata is harvested into both the Geothermal Catalog and the State Catalog for staging purposes. All metadata is eventually harvested into the USGIN Catalog. The Geothermal Catalog is the only catalog that gets harvested into NGDS.

Creating a Harvest Source in Geoportal

  • First, understand where you are harvesting from: WAF, CSW, THREDDS Server, ArcGIS Rest services, etc.
  • If you are harvesting XML from a Web Accessible Folder (WAF), you will need the URL of the directory containing the XML files.
  • Harvesting from a CSW endpoint requires the full capabilities URL. You will also need to select the correct profile from the drop-down menu provided.
  • Harvesting ArcGIS (ESRI) services requires that you provide the REST and SOAP URLs.
  • Harvesting from a THREDDS Server catalog requires the catalog.xml URL.
  • In the Admin interface, select the ‘Add’ text under the Geoportal navigation bar.
  • Select the ‘Register resource on the network’ bulleted item and proceed
  • Select an option from the listed protocol types
  • Enter the required URLs and provide a useful title
  • Read through the remaining options, select or deselect them as needed, and save.
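For the CSW case in the list above, the "full capabilities URL" is normally the service endpoint plus the standard OGC key-value parameters `service=CSW` and `request=GetCapabilities`. The sketch below builds such a URL. The endpoint shown is a made-up example, and your catalog may expect further parameters, such as a version.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def capabilities_url(endpoint):
    """Append the standard GetCapabilities parameters to a CSW endpoint.

    Any query string already present on the endpoint is replaced.
    """
    parts = urlsplit(endpoint)
    query = urlencode({"service": "CSW", "request": "GetCapabilities"})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Hypothetical endpoint, for illustration only.
print(capabilities_url("http://example.org/geoportal/csw"))
```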

Synching Metadata into the Geoportal

  • You need to sync after you have created a harvest source in the Geoportal, and any time after updating items in other repositories and catalogs linked to the Geoportal.

  • You have the option, when you create a harvest source, to set up an automatic sync for every day, once a week, once a month, etc.

  • New harvest sources need to be approved before you can sync any new metadata into the Geoportal catalog.

  • Return to the Administration view

  • If you do not see the newly added item first on the list, select an item from the protocol drop-down menu and search.

  • Select the check box to the left of the new item's title, select approve from the menu, and execute the action.

  • Now that the item is approved, mouse over and select the ‘synchronize content’ icon with blue arrows.

  • You can watch the process by selecting ‘show document acquired from this repository.’ The more records there are to harvest, the longer the process will take.

  • When there is no longer a red X labeling the harvest source, check the history to obtain information on all harvest jobs. This can be a helpful tool when validating metadata.

Updating Metadata in the Geoportal

When updating metadata in the Geoportal, know where the metadata is being harvested from. You can select the harvest sources and obtain the URL if needed.

  • WAFs & XML: If the WAF where the resource is housed is available on the server, you can either reprocess the original CSV with changes and update the WAF, or open the XML file from PuTTY/WinSCP and make changes directly to the XML.

Note: If you are having trouble finding the resource within the WAF, open the Geoportal XML document for that resource and copy the fileIdentifier. Navigate to the WAF directory in your browser, press Ctrl+F, and paste in the identifier; the matching resource should be highlighted. You should now be able to find the resource on the server.

  • When all updates have been made to an outside source and the Catalog needs to pick up the changes, return to the Administration page, select a protocol type, select your harvest source, and then select the ‘synchronize content’ icon with blue arrows.
  • This process should be the same for all Geoportal instances.
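The fileIdentifier lookup described in the note above can also be scripted. This sketch assumes the Geoportal record is ISO 19139 XML (the `gmd`/`gco` namespaces) and that the WAF file names contain the identifier; neither is confirmed by this page, so treat both as assumptions.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs.
NAMESPACES = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def file_identifier(xml_text):
    """Return the record's fileIdentifier, or None if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find("gmd:fileIdentifier/gco:CharacterString", NAMESPACES)
    return node.text.strip() if node is not None and node.text else None


def matching_files(identifier, filenames):
    """Mimic the browser Ctrl+F step over a list of WAF file names."""
    return [name for name in filenames if identifier in name]
```

Pass `file_identifier` the XML text copied from the Geoportal, then give the result and the WAF directory listing to `matching_files`.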

NGDS CKAN Catalog

  • You will need to be an admin for your node in order to create a new harvest source.
  • Log in as admin and select harvest sources
  • Choose to harvest from a new source; this can be either a CSW or an NGDS CKAN-to-CKAN harvest source.
  • Add a description and an icon for the organization hosting the CSW
  • Let the harvester finish before trying to add any additional harvest sources.

NOTE: The harvester has no maximum or minimum time in which it needs to finish. If you are curious, you can check on the progress by selecting the job status.