Contains the Solr schema and synonym files we use.
Here are the fields that we are using, with some commentary where needed:
LayerId should be a unique id. We prefix the full layer name with an institution identifier so that the id is computable, which makes things a little easier if we need to replace the layer. You can use whatever scheme seems appropriate to you, as long as the id is unique to the index and is confined to alphanumerics, underscores, and periods.
<field name="ExternalLayerId" type="string" indexed="true" stored="true" multiValued="false"/>
This value is meant for associating the record with an existing library catalog/OPAC.
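The LayerId scheme described above can be sketched as follows. This is only an illustration, not the code OGP itself uses: the function name `make_layer_id`, the period used as separator, and the choice to replace disallowed characters with underscores are all our assumptions.

```python
import re

# Characters outside this set are not allowed in a LayerId
# (alphanumerics, underscores, and periods only).
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.]")


def make_layer_id(institution: str, layer_name: str) -> str:
    """Build an id of the form '<institution>.<layer name>' (hypothetical scheme)."""
    if not institution or not layer_name:
        raise ValueError("both institution and layer name are required")
    candidate = f"{institution}.{layer_name}"
    # Assumption: replace every disallowed character with an underscore.
    return _DISALLOWED.sub("_", candidate)


print(make_layer_id("Tufts", "CambridgeGrid100_04"))
```

Replacing characters can make two different layer names collapse to the same id, so check the result for uniqueness against your index before loading.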
<!-- the ESRI name of the layer, used by OpenLayers when it requests the layer from GeoServer
comes from the FGDC tag ftname -->
<field name="Name" type="text_en" indexed="true" stored="true" multiValued="false"/>
‘Name’ should match up with the layer name in GeoServer, not including the workspace name.
<!-- some layers are part of a large collection, e.g., City of Cambridge 2009
currently not set; Solr has no concept of collections -->
<field name="CollectionId" type="text_en" indexed="true" stored="true" multiValued="false"/>
Not currently used, but if your layer is part of a collection, you might want to set it for future use.
<!-- the name of the institution holding the actual data, such as Tufts
this string is set by the FGDC to Solr translation code -->
<field name="Institution" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- duplicate of the above that isn't tokenized. this field can be used as the sort field -->
<field name="InstitutionSort" type="string" indexed="true" stored="false" multiValued="false"/>
“Institution” should be a single word representation of your institution’s name. InstitutionSort is a copy field, so it gets set for you.
<!-- either Public or Restricted, controls whether users who are not logged in can see the data
currently this field is not set; we plan to compute it based on the FGDC tag useconst (the use constraints) -->
<field name="Access" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- the type of data the layer holds: Point, Vector, Polygon, Raster
it is computed from the FGDC tags direct, sdtstype and srccitea -->
<field name="DataType" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- duplicate of the above that isn't tokenized. this field can be used as the sort field -->
<field name="DataTypeSort" type="string" indexed="true" stored="false" multiValued="false"/>
<!-- is the layer on-line or off-line (e.g., on a DVD)
currently this isn't being set -->
<field name="Availability" type="text_en" indexed="true" stored="true" multiValued="false"/>
Values are “online” and “offline”.
<!-- the text displayed to the user. from the FGDC tag title-->
<field name="LayerDisplayName" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- duplicate of the above that isn't tokenized. this field can be used as the sort field -->
<field name="LayerDisplayNameSort" type="string" indexed="true" stored="false" multiValued="false"/>
<!-- matches using synonyms -->
<field name="LayerDisplayNameSynonyms" type="text_en_synonymsStateLcsh" indexed="true" stored="false" multiValued="false"/>
“LayerDisplayName” is the title for the layer. “Sort” and “Synonyms” are copy fields, so they are set for you at index time.
<!-- from the FGDC tag publish -->
<field name="Publisher" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- duplicate of the above that isn't tokenized. this field can be used as the sort field -->
<field name="PublisherSort" type="string" indexed="true" stored="false" multiValued="false"/>
Again, “Sort” is a copy field.
<!-- from the FGDC tag origin -->
<field name="Originator" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- duplicate of the above that isn't tokenized. this field can be used as the sort field -->
<field name="OriginatorSort" type="string" indexed="true" stored="true" multiValued="false"/>
“Originator” is the author of the data. Can be a person or organization. Default to the first or primary author if there are multiple.
<!-- this field contains a JSON hashtable that roughly equates to protocolName: serviceEndpoint -->
<field name="Location" type="string" indexed="true" stored="true" multiValued="false"/>
Location should be a JSON string.
"wms": ["http://wmsendpoint"] //accepts a list for client-side load-balancing
"wfs": "http://wfsendpoint" //by default, OGP uses wfs for downloading vector data
"wcs": "http://wcsendpoint" //by default, OGP uses wcs for downloading raster data
"tilecache": ["http://tilecacheendpoint"] //accepts a list; most tile caches are not full wms proxies, so we need both links
"serviceStart": "http://servicestartendpoint" //Harvard runs a service that starts wms and wfs services
"fileDownload": "http://filedownload.zip" //if you have a link to a zip file containing your data, you can put it here
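If you generate your Solr documents with a script, building the Location value with a JSON library avoids quoting mistakes. The sketch below is one possible approach; the helper name `build_location` and its parameters are hypothetical, and the endpoint URLs are the same placeholders used above.

```python
import json


def build_location(wms=None, wfs=None, wcs=None, tilecache=None,
                   service_start=None, file_download=None) -> str:
    """Serialize the known endpoints into a JSON string for the Location field."""
    entries = {
        "wms": list(wms) if wms else None,              # list of endpoints
        "wfs": wfs,                                     # single endpoint
        "wcs": wcs,                                     # single endpoint
        "tilecache": list(tilecache) if tilecache else None,  # list of endpoints
        "serviceStart": service_start,
        "fileDownload": file_download,
    }
    # Leave out any protocol that has no endpoint.
    location = {key: value for key, value in entries.items() if value}
    return json.dumps(location)


print(build_location(wms=["http://wmsendpoint"], wfs="http://wfsendpoint"))
```

The result is a plain string, which matches the "string" type of the Location field.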
<!-- from the FGDC tag abstract -->
<field name="Abstract" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- a string containing all the keywords from the FGDC tag themekey.
note that this is not a multi-valued Solr field -->
<field name="ThemeKeywords" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- matches without using synonyms -->
<field name="ThemeKeywordsExact" type="text_ws" indexed="true" stored="false" multiValued="false"/>
<!-- used to match with synonyms derived from ISO keywords -->
<field name="ThemeKeywordsSynonymsLcsh" type="text_en_synonymsLcsh" indexed="true" stored="false" multiValued="false"/>
<field name="ThemeKeywordsSynonymsIso" type="text_en_synonymsIso" indexed="true" stored="false" multiValued="false"/>
<!-- a string containing all the keywords from the FGDC tag placekey
note that this is not a multi-valued Solr field -->
<field name="PlaceKeywords" type="text_en" indexed="true" stored="true" multiValued="false"/>
<field name="PlaceKeywordsSynonyms" type="text_en_synonymsState" indexed="true" stored="true" multiValued="false"/>
The “ThemeKeywords” and “PlaceKeywords” fields should be space-delimited lists. The next version of the schema will likely allow multivalued fields. The rest of the keyword fields are copy fields.
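As a rough illustration of the space-delimited format, the following sketch flattens a list of keywords into a single string. The function name `join_keywords` is hypothetical. Note that multi-word keywords lose their boundaries in this format.

```python
def join_keywords(keywords):
    """Flatten keywords into one space-delimited string, dropping empty entries."""
    tokens = []
    for keyword in keywords:
        # split() with no arguments also trims and collapses stray whitespace.
        tokens.extend(keyword.split())
    return " ".join(tokens)


print(join_keywords(["land use", " zoning ", ""]))
```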
<!-- lat/lon bounding box stored in degrees. -->
<!-- from the FGDC tag southbc -->
<field name="MinY" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<!-- from the FGDC tag northbc -->
<field name="MaxY" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<!-- from the FGDC tag westbc -->
<field name="MinX" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<!-- from the FGDC tag eastbc -->
<field name="MaxX" type="tdouble" indexed="true" stored="true" multiValued="false"/>
Values are lat/lon in decimal degrees.
<!-- the following numeric fields are computed by the FGDC to Solr code.
They are computed from the above bounding box information and stored in degrees.
They are used by the geospatial filters.-->
<field name="CenterX" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<field name="CenterY" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<field name="HalfWidth" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<field name="HalfHeight" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<field name="Area" type="tdouble" indexed="true" stored="true" multiValued="false"/>
If you’re using your own script/code to generate solr documents, you’ll have to calculate these fields to allow spatial search.
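A minimal sketch of that calculation is shown below. It is not the FGDC to Solr code itself. We assume that Area is simply width times height in square degrees and that the box does not cross the antimeridian; the function name `derived_bbox_fields` is hypothetical.

```python
def derived_bbox_fields(min_x, min_y, max_x, max_y):
    """Compute the derived spatial fields from a lat/lon bounding box in degrees."""
    if min_x > max_x or min_y > max_y:
        # Assumption: boxes crossing the antimeridian are not handled here.
        raise ValueError("expected MinX <= MaxX and MinY <= MaxY")
    width = max_x - min_x
    height = max_y - min_y
    return {
        "CenterX": (min_x + max_x) / 2.0,
        "CenterY": (min_y + max_y) / 2.0,
        "HalfWidth": width / 2.0,
        "HalfHeight": height / 2.0,
        "Area": width * height,  # assumed to be in square degrees
    }


print(derived_bbox_fields(min_x=-72.0, min_y=41.0, max_x=-70.0, max_y=43.0))
```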
<!-- computed from the FGDC tag caldate (if available) or begdate. the default value is year 1-->
<field name="ContentDate" type="tdate" indexed="true" stored="true" default="NOW" multiValued="false"/>
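Solr date fields expect UTC timestamps in the form YYYY-MM-DDThh:mm:ssZ. The sketch below shows one way to produce such a value from an FGDC caldate or begdate string. It assumes the FGDC date starts with a four-digit year and keeps only that year; the function name `content_date` is hypothetical, and what to do when no year can be parsed is left to the caller.

```python
import re


def content_date(fgdc_date):
    """Return a Solr-style date string for the year in an FGDC date, or None."""
    match = re.match(r"\s*(\d{4})", fgdc_date or "")
    if not match:
        return None
    year = int(match.group(1))
    if year < 1:
        return None
    # Assumption: only the year is kept, pinned to January 1 at midnight UTC.
    return f"{year:04d}-01-01T00:00:00Z"


print(content_date("20090315"))
```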
<!-- the projection code is passed to GeoServer-->
<field name="SrsProjectionCode" type="text_en" indexed="true" stored="true" multiValued="false"/>
<!-- the name of the ESRI workspace the layer resides in. it may be needed
by client side code. Currently not used. -->
Not currently used, but enter it if you have it.
<field name="WorkspaceName" type="text_en" indexed="false" stored="true" multiValued="false"/>
This is the GeoServer workspace name. If you don’t have one, enter it as an empty string.
<!-- has this layer been accurately georeferenced and can it be laid over top of a basemap -->
<field name="GeoReferenced" type="boolean" indexed="false" stored="true" multiValued="false" default="true"/>
<!-- the complete text from the FGDC file. it is not indexed by Solr but retrieved from Solr
when the user requests to see the FGDC file -->
<field name="FgdcText" type="string" indexed="false" stored="true" multiValued="false"/>
“FgdcText” should really be called “MetadataText”. It should contain the full metadata document as a string value.