Merged
18 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -172,3 +172,4 @@ cython_debug/

# PyPI configuration file
.pypirc
.Rproj.user
106 changes: 62 additions & 44 deletions README.md
@@ -1,73 +1,91 @@
# gen3-metadata
User-friendly tools for downloading and manipulating gen3 metadata


## 1. Set up python venv
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

## 2. Create config file
## Python Installation
```bash
echo credentials_path=\"/path/to/credentials.json\" > .env
git clone https://github.com/AustralianBioCommons/gen3-metadata.git
bash build.sh
```

## 3. Load library
```bash
pip install -e .
```


## 4. Usage Example
## Usage Example
- Notebook can be found in the `example_notebook.ipynb` file
- Make sure to select .venv as the kernel in the notebook

```python
import os
from gen3_metadata.parser import Gen3MetadataParser

# Set up credentials path
key_file = os.getenv('credentials_path')
from gen3_metadata.gen3_metadata_parser import Gen3MetadataParser

# Initialize the Gen3MetadataParser
# Initialise
key_file = "path/to/credentials.json"
gen3metadata = Gen3MetadataParser(key_file)

# authenticate
# Authenticate
gen3metadata.authenticate()

# Fetch data for different categories
gen3metadata.fetch_data("program1", "AusDiab_Simulated", "subject")
gen3metadata.fetch_data("program1", "AusDiab_Simulated", "demographic")
gen3metadata.fetch_data("program1", "AusDiab_Simulated", "medical_history")
# Fetching data and returning as dataframe
program_name = "program1"
project_code = "project1"
node_label = "medical_history"
pd_data = gen3metadata.fetch_data_pd(program_name, project_code, node_label=node_label)
pd_data

# Fetching data and returning as json
json_data = gen3metadata.fetch_data_json(program_name, project_code, node_label=node_label)
json_data
```

# Convert fetched data to a pandas DataFrame
gen3metadata.data_to_pd()

# Print the keys of the data sets that have been fetched
print(gen3metadata.data_store.keys())
## Running Tests

# Return a json of one of the datasets
gen3metadata.data_store["program1/AusDiab_Simulated/subject"]
The tests are written using the `pytest` framework.

# Return the pandas dataframe of one of the datasets
gen3metadata.data_store_pd["program1/AusDiab_Simulated/subject"]
```bash
pytest -vv tests/
```

The fetched data is stored in a dictionary within the `Gen3MetadataParser` instance.
Each category of data fetched is stored as a key-value pair in this dictionary,
where the key is the category name and the value is the corresponding data.
This allows for easy access and manipulation of the data after it has been fetched.
---

# Installation of the R version of gen3-metadata

You can install the gen3metadata R tool from
[GitHub](https://github.com/) with:

``` r
if (!require("devtools")) install.packages("devtools")
devtools::install_github("AustralianBioCommons/gen3-metadata", subdir = "gen3metadata-R")
```

## 5. Running Tests
The package depends on several other packages, which should be installed automatically.
If this doesn't happen, run:
``` r
install.packages(c("httr", "jsonlite", "jose", "glue"))
```

The tests are written using the `pytest` framework.
Then all you need to do is load the package.

```bash
pytest tests/
``` r
library("gen3metadata")
```

## Usage Example

This is a basic example to authenticate and load some data.

``` r
# Load the library
library("gen3metadata")

# Set the path to the credentials file
key_file_path <- "path/to/credentials.json"

# Create the Gen3 Metadata Parser object
gen3 <- Gen3MetadataParser(key_file_path)

# Authenticate the object
gen3 <- authenticate(gen3)

# Load some data
dat <- fetch_data(gen3,
program_name = "program1",
project_code = "AusDiab",
node_label = "subject")
```
7 changes: 7 additions & 0 deletions build.sh
@@ -0,0 +1,7 @@

#!/bin/bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
86 changes: 86 additions & 0 deletions example_notebook.ipynb
@@ -0,0 +1,86 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example Workflow\n",
"- Make sure to run `bash build.sh` or follow the instructions in the README.md to build the package\n",
"- Make sure to select .venv as the kernel in the notebook"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from gen3_metadata.gen3_metadata_parser import Gen3MetadataParser"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Initialise\n",
"key_file = \"path/to/credentials.json\"\n",
"gen3metadata = Gen3MetadataParser(key_file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Authenticate\n",
"gen3metadata.authenticate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# fetching data and returning as dataframe\n",
"program_name= \"program1\"\n",
"project_code= \"AusDiab_Simulated\"\n",
"gen3metadata.fetch_data_pd(program_name, project_code, node_label= \"medical_history\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# fetching data and returning as json\n",
"gen3metadata.fetch_data_json(program_name, project_code, node_label= \"medical_history\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
24 changes: 24 additions & 0 deletions gen3metadata-R/DESCRIPTION
@@ -0,0 +1,24 @@
Package: gen3metadata
Type: Package
Title: Gen3 metadata tools
Version: 0.1.0
Author: Corey Giles [aut, cre]
Authors@R: c(person("Corey", "Giles", role = c("aut", "cre"),
email = "Corey.Giles@Baker.edu.au",
comment = c(ORCID = "0000-0002-6050-1259")))
Maintainer: Corey Giles <Corey.Giles@Baker.edu.au>
Description: User-friendly tools for downloading and manipulating gen3 metadata.
License: GPL-3
Encoding: UTF-8
LazyData: true
Language: en-AU
RoxygenNote: 7.3.2
Imports:
httr,
jsonlite,
jose,
glue
Suggests:
testthat (>= 3.0.0),
webmockr
Config/testthat/edition: 3
16 changes: 16 additions & 0 deletions gen3metadata-R/NAMESPACE
@@ -0,0 +1,16 @@
# Generated by roxygen2: do not edit by hand

S3method(authenticate,gen3_metadata)
S3method(fetch_data,gen3_metadata)
S3method(print,gen3_metadata)
export(Gen3MetadataParser)
export(authenticate)
export(fetch_data)
importFrom(glue,glue)
importFrom(httr,GET)
importFrom(httr,POST)
importFrom(httr,add_headers)
importFrom(httr,content)
importFrom(httr,http_error)
importFrom(jose,jwt_split)
importFrom(jsonlite,fromJSON)
40 changes: 40 additions & 0 deletions gen3metadata-R/R/gen3_metadata.R
@@ -0,0 +1,40 @@
#' Create a Gen3 metadata parser object
#'
#' This function creates a new Gen3 metadata parser object by loading
#' credentials from a key file. Use this object to interact with the Gen3 API.
#'
#' @param key_file Character string path to the JSON key file containing Gen3 credentials
#'
#' @return A gen3_metadata object with credentials and base URL configured
#'
#' @export
Gen3MetadataParser <- function(key_file) {

# Create the object to store data
obj <- list(
key_file = key_file,
base_url = "",
credentials = list(
api_key = "",
key_id = ""
),
header = NULL
)

# Load the key file
creds <- load_key_file(key_file)

# Set the credentials in the object
obj$credentials$api_key <- creds$api_key
obj$credentials$key_id <- creds$key_id

# Get the base URL from the API key
obj$base_url <- get_base_url(obj$credentials$api_key)

# Set the class of the object
class(obj) <- "gen3_metadata"

# Return the object
return(obj)

}
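
The constructor above relies on a `load_key_file` helper that is not shown in this diff. As a rough illustration only, the Python sketch below shows what such a helper might do, assuming the key file is a JSON document with `api_key` and `key_id` fields (the two fields the constructor reads). The function body and error messages are hypothetical, not the package's implementation.

```python
import json
from pathlib import Path


def load_key_file(key_file: str) -> dict:
    """Read a JSON credentials file and return its api_key and key_id.

    Hypothetical sketch: the field names are taken from the constructor
    above; everything else is an assumption.
    """
    path = Path(key_file)
    if not path.is_file():
        raise FileNotFoundError(f"Key file not found: {key_file}")

    with path.open(encoding="utf-8") as fh:
        creds = json.load(fh)

    # Fail early if either required field is absent.
    missing = [k for k in ("api_key", "key_id") if k not in creds]
    if missing:
        raise KeyError(f"Key file is missing fields: {', '.join(missing)}")

    # Return only the two fields the parser object stores.
    return {"api_key": creds["api_key"], "key_id": creds["key_id"]}
```

Validating the fields at load time means a malformed credentials file is reported when the object is created rather than later, during authentication.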
31 changes: 31 additions & 0 deletions gen3metadata-R/R/generics.R
@@ -0,0 +1,31 @@
#' Authenticate with Gen3 API
#'
#' Generic function to authenticate a gen3_metadata object with the Gen3 API
#' and obtain an access token for subsequent requests.
#'
#' @param gen3_metadata A gen3_metadata object
#'
#' @return The authenticated gen3_metadata object (invisibly)
#'
#' @export
authenticate <- function(gen3_metadata) {
UseMethod("authenticate")
}

#' Fetch data from Gen3 API
#'
#' Generic function to fetch data from a specific node in the Gen3 submission API
#' for a given program and project.
#'
#' @param gen3_metadata An authenticated gen3_metadata object
#' @param program_name Character string name of the program
#' @param project_code Character string code of the project
#' @param node_label Character string label of the node to fetch data from
#' @param api_version Character string API version (default: "v0")
#'
#' @return Data frame containing the fetched data
#'
#' @export
fetch_data <- function(gen3_metadata, program_name, project_code, node_label, api_version) {
UseMethod("fetch_data")
}
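
The generics above use S3 dispatch: `UseMethod()` picks the method that matches the class of the first argument. For readers more familiar with the Python side of this repository, a loosely comparable pattern is `functools.singledispatch`. The sketch below is an analogy only; the `Gen3Metadata` class and the placeholder header value are assumptions made for illustration and do not reflect the real authentication logic.

```python
from functools import singledispatch


class Gen3Metadata:
    """Hypothetical stand-in for the R 'gen3_metadata' S3 class."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.header = None


@singledispatch
def authenticate(obj):
    # Fallback, comparable to calling an S3 generic with no matching method.
    raise TypeError(f"no authenticate method for {type(obj).__name__}")


@authenticate.register
def _(obj: Gen3Metadata) -> Gen3Metadata:
    # Placeholder: a real method would request an access token here.
    obj.header = {"Authorization": "bearer <access-token>"}
    return obj
```

Returning the object mirrors the R usage `gen3 <- authenticate(gen3)`, where the result has to be reassigned because R copies objects on modification.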
31 changes: 31 additions & 0 deletions gen3metadata-R/R/get_base_url.R
@@ -0,0 +1,31 @@
#' Extract base URL from Gen3 API key
#'
#' This function extracts the base URL from a Gen3 API key JWT token
#' by parsing the 'iss' (issuer) field from the payload.
#'
#' @param api_key Character string containing the Gen3 API key (JWT token)
#'
#' @return Character string containing the base URL
#'
#' @importFrom jose jwt_split
get_base_url <- function(api_key) {

# Check if the API key is provided
if (is.null(api_key) || api_key == "") {
stop("API key must be provided.")
}

# Extract the payload from the JWT
payload <- jose::jwt_split(api_key)$payload

# Validate the payload
if (!"iss" %in% names(payload)) {
stop("The JWT payload must contain 'iss'.")
}

# Extract the base URL from the payload
base_url <- sub("/user$", "", payload$iss)

# Return the base url
return(base_url)
}
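
The same idea can be sketched in Python using only the standard library: split the token into its three dot-separated segments, base64url-decode the middle one, and read the `iss` claim. This is a minimal sketch under those assumptions, not the package's Python implementation. It does not verify the token signature, and the example issuer URL in the usage note is made up.

```python
import base64
import json


def get_base_url(api_key: str) -> str:
    """Return the base URL encoded in the 'iss' claim of a JWT-style API key."""
    if not api_key:
        raise ValueError("API key must be provided.")

    parts = api_key.split(".")
    if len(parts) != 3:
        raise ValueError("API key does not look like a JWT (expected 3 segments).")

    # JWT segments are base64url-encoded without padding; restore the padding.
    payload_b64 = parts[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))

    if "iss" not in payload:
        raise ValueError("The JWT payload must contain 'iss'.")

    # Strip a trailing "/user" only, mirroring sub("/user$", "", ...) in the R code.
    iss = payload["iss"]
    return iss[: -len("/user")] if iss.endswith("/user") else iss
```

For a token whose payload contains `{"iss": "https://data.example.org/user"}`, the function returns `https://data.example.org`.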