Tailor

tailor is a Tool for Adapter-domain Identification and Linking Of RBPs (receptor-binding proteins).

Installation

Install from GitHub (development version)

```r
install.packages("devtools")
devtools::install_github("sbthandras/tailor")
```

Usage

Identify a shared adapter domain between two RBPs

```r
# load package
library(tailor)

# load example data set
data(rbps)

# align first two RBPs
ps <- position_scores(
  pattern = "MN395291-1", 
  subject = "ON513429-1", 
  id_var = "Core_ORF",
  seq_var = "translation",
  submat = "BLOSUM80",
  data = rbps
)

ps$position_scores |> head(5)
#> # A tibble: 5 × 4
#>   pattern subject identity score
#>   <chr>   <chr>   <lgl>    <int>
#> 1 M       M       TRUE         6
#> 2 N       N       TRUE         6
#> 3 I       I       TRUE         5
#> 4 L       L       TRUE         4
#> 5 R       R       TRUE         6

# find breakpoints along the alignment
bps <- find_breakpoints(ps)

bps
#>   pattern_id subject_id start end mean_score pident
#> 1 MN395291-1 ON513429-1     1 152      5.138  0.947
#> 2 MN395291-1 ON513429-1   153 172      2.900  0.550
#> 3 MN395291-1 ON513429-1   173 630     -0.389  0.199
#> 4 MN395291-1 ON513429-1   631 645     -5.733  0.000
#> 5 MN395291-1 ON513429-1   646 913     -0.668  0.131

# identify the conserved N-terminal domain if it exists
adapter <- find_adapter(bps)

adapter
#>   pattern_id subject_id start end mean_score pident
#> 1 MN395291-1 ON513429-1     1 172      4.878  0.901
```
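The column names suggest that each row of the breakpoint table summarises one segment of the alignment: mean_score as the average substitution score and pident as the fraction of identical positions between start and end. The README does not state this explicitly, so treat the following as an assumption. The sketch uses base R only, a made-up per-position table with the same columns as ps$position_scores, and a hypothetical helper summarise_segments(). It is not tailor's implementation.

```r
# Toy per-position table shaped like ps$position_scores.
# Residues, identities and scores are made up for illustration.
pos <- data.frame(
  pattern  = c("M", "N", "I", "L", "R", "K"),
  subject  = c("M", "N", "V", "L", "E", "Q"),
  identity = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  score    = c(6L, 6L, 3L, 4L, -1L, 1L)
)

# Hypothetical helper: summarise each segment [start, end] as the mean
# score and the fraction of identical positions (mean of a logical).
summarise_segments <- function(pos, starts, ends) {
  data.frame(
    start      = starts,
    end        = ends,
    mean_score = mapply(function(s, e) mean(pos$score[s:e]), starts, ends),
    pident     = mapply(function(s, e) mean(pos$identity[s:e]), starts, ends)
  )
}

# Two hypothetical segments: positions 1-3 and 4-6.
seg <- summarise_segments(pos, starts = c(1, 4), ends = c(3, 6))
seg
```

Here the first segment gets mean_score 5 and pident 2/3, and the second gets 4/3 and 1/3.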

Note that positions 1-152 and 153-172 have quite different scores (mean score 5.138 vs 2.900), but both ranges were classified as conserved, so they were merged into a single adapter (the default behaviour). To investigate this, let's plot the position scores:

```r
plot(ps)
```
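The merged adapter row (1-172, mean_score 4.878, pident 0.901) matches a length-weighted average of the two conserved segments from the find_breakpoints() output: 152 positions at 5.138 and 20 positions at 2.900. The README does not say how the merge is computed. The arithmetic is consistent with length weighting, but that rule is inferred here, not confirmed. merge_segments() is a hypothetical base R helper, not part of tailor.

```r
# The two consecutive conserved segments reported by find_breakpoints().
segs <- data.frame(
  start      = c(1, 153),
  end        = c(152, 172),
  mean_score = c(5.138, 2.900),
  pident     = c(0.947, 0.550)
)

# Hypothetical helper: merge consecutive segments, weighting each
# segment's summary statistics by its length (number of positions).
merge_segments <- function(segs) {
  len <- segs$end - segs$start + 1
  data.frame(
    start      = min(segs$start),
    end        = max(segs$end),
    mean_score = round(sum(len * segs$mean_score) / sum(len), 3),
    pident     = round(sum(len * segs$pident) / sum(len), 3)
  )
}

merged <- merge_segments(segs)
merged  # start 1, end 172, mean_score 4.878, pident 0.901
```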

Depending on the method used for detecting breakpoints, the length of the identified adapter may differ. By default, find_breakpoints() uses method = "cemean", which estimates the number and locations of breakpoints with the Cross-Entropy Method from the breakpoint package. With method = "plateau" we get a different result:

```r
ps |> find_breakpoints(method = "plateau", type = "ewma") |> find_adapter()
#>   pattern_id subject_id start end mean_score pident
#> 1 MN395291-1 ON513429-1     1 157      4.656  0.936
```

Note that with this method we detect a shorter conserved N-terminal domain (positions 1-157 instead of 1-172).
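The plateau call above passes type = "ewma". We assume, but the README does not confirm, that this refers to smoothing the position scores with an exponentially weighted moving average before looking for plateaus. The base R sketch below only illustrates why smoothing can move a detected boundary. It is not tailor's plateau algorithm. The smoothing weight lambda, the toy scores and the "first position below 0" rule are all hypothetical.

```r
# Exponentially weighted moving average: each smoothed value mixes the
# current score with the previous smoothed value. `lambda` is a
# hypothetical smoothing weight chosen for illustration.
ewma <- function(x, lambda = 0.3) {
  if (length(x) == 0) return(numeric(0))
  s <- numeric(length(x))
  s[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    s[i] <- lambda * x[i] + (1 - lambda) * s[i - 1]
  }
  s
}

# Toy scores: a conserved stretch followed by a divergent one.
toy <- c(rep(5, 10), rep(-1, 10))
smoothed <- ewma(toy)

# Hypothetical boundary rule: first position whose score drops below 0.
which(toy < 0)[1]       # 11 on the raw scores
which(smoothed < 0)[1]  # 16 after smoothing (the EWMA lags the drop)
```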

Link RBPs by shared adapter domains

```r
# load adapters of the example data set
data(adapters)

# convert to pident matrix and order by the original data frame
amat <- adapter_matrix(adapters, ids = rbps$Core_ORF)

# assign RBPs to clusters based on the distance matrix
clusters <- cluster_adapters(amat)

# visualise pident matrix with cluster assignments
plot(amat, clusters = clusters)
```
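The pipeline above turns pairwise adapter identities into a square matrix and then groups RBPs into clusters. The README does not describe how cluster_adapters() works. The following base R sketch shows one common approach under stated assumptions: a made-up long table of pairwise pident values (the column layout of adapters is assumed), the distance taken as 1 - pident, average-linkage hclust(), and a cut height of 0.5. None of these choices are claimed to be tailor's.

```r
# Hypothetical pairwise pident values between three made-up RBP IDs.
pairs <- data.frame(
  pattern_id = c("A", "A", "B"),
  subject_id = c("B", "C", "C"),
  pident     = c(0.90, 0.10, 0.15)
)
ids <- c("A", "B", "C")

# Square, symmetric similarity matrix ordered by `ids`; self-identity = 1.
# Indexing with a two-column character matrix matches against dimnames.
sim <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))
sim[cbind(pairs$pattern_id, pairs$subject_id)] <- pairs$pident
sim[cbind(pairs$subject_id, pairs$pattern_id)] <- pairs$pident
diag(sim) <- 1

# Convert similarity to distance and cluster (assumed choices:
# distance = 1 - pident, average linkage, cut height 0.5).
d  <- as.dist(1 - sim)
hc <- hclust(d, method = "average")
cl <- cutree(hc, h = 0.5)
cl  # A and B share a cluster; C is on its own
```

With average linkage, A and B join at distance 0.1. C joins only at 0.875, so cutting at 0.5 separates it.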

Citation

If you use tailor in a publication, please cite our paper:

András Asbóth, Tamás Stirling, Orsolya Méhi, Gábor Apjok, Victor Klein-Sousa, Nicholas MI Taylor, Hiba Hadj Mehdi, Balázs Papp, Eszter Ari, Bálint Kintses (2026) A global map of receptor-binding protein compatibility for the programmable design of Klebsiella and Acinetobacter phages. https://doi.org/10.64898/2026.02.20.706991
