diff --git a/docs/htdocs/img/protein_function_encoding.png b/docs/htdocs/img/protein_function_encoding.png index 730abc3c15..2ebc9c954c 100644 Binary files a/docs/htdocs/img/protein_function_encoding.png and b/docs/htdocs/img/protein_function_encoding.png differ diff --git a/docs/htdocs/info/docs/tools/vep/online/results.html b/docs/htdocs/info/docs/tools/vep/online/results.html index 949fe6a888..c32d56f303 100644 --- a/docs/htdocs/info/docs/tools/vep/online/results.html +++ b/docs/htdocs/info/docs/tools/vep/online/results.html @@ -192,14 +192,6 @@
SIFT and PolyPhen - these columns can - contain both text (e.g. a SIFT prediction) and a number - (e.g. a frequency value). Ensembl VEP allows you to filter on - either part of this.
For example, you may enter - "is" and "deleterious" for SIFT to - return deleterious predictions, or "<" and - "0.1" to find results with a SIFT score less than - 0.1.
# find variants with AF > 0.1 in AFR or EUR but not EAS or SAS --filter "(AFR_AF > 0.1 or EUR_AF > 0.1) and (EAS_AF < 0.1 and SAS_AF < 0.1)"- -
For fields that contain string and number components, filter_vep will - try and match the relevant part based on the operator in use. For example, - using --sift b in Ensembl VEP gives strings that look like - "tolerated(0.46)". This will give a match to either of the following - filters:
- -# match string part ---filter "SIFT is tolerated" - -# match number part ---filter "SIFT < 0.5"
Note that for numeric fields, such as the *AF allele frequency fields, filter_vep does not consider the absence of a value for that field as equivalent to a 0 value. For example, if you wish to find rare variants by finding those where the allele frequency is less than 1% or absent, you should use the following:
--filter "AF < 0.01 or not AF"+ +
You can use a regular expression to match part of a field value. For example, using --sift b + in Ensembl VEP gives a field value that contains both a string and a number, such as "tolerated(0.46)". + To match on the string part, you can do the following:
+ +# match string part +--filter "SIFT match tolerated"
For the Consequence field it is possible to use the Sequence Ontology to match terms diff --git a/docs/htdocs/info/docs/tools/vep/script/vep_tutorial.html b/docs/htdocs/info/docs/tools/vep/script/vep_tutorial.html index 05b622f09e..4580376dc3 100644 --- a/docs/htdocs/info/docs/tools/vep/script/vep_tutorial.html +++ b/docs/htdocs/info/docs/tools/vep/script/vep_tutorial.html @@ -219,7 +219,7 @@
./filter_vep -i variant_effect_output.txt -filter "SIFT is deleterious" | grep -v "##" | head -n5 +./filter_vep -i variant_effect_output.txt -filter "SIFT match deleterious" | grep -v "##" | head -n5 #Uploaded_variation Location Allele Gene Feature ... Extra rs2231495 22:17188416 C ENSG00000093072 ENST00000262607 ... SIFT=deleterious(0.05) @@ -246,7 +246,7 @@Run Ensembl VEP
./vep -i examples/homo_sapiens_GRCh38.vcf --cache --force_overwrite --sift b --canonical --symbol --tab --fields Uploaded_variation,SYMBOL,CANONICAL,SIFT -o STDOUT | \ -./filter_vep --filter "CANONICAL is YES and SIFT is deleterious" +./filter_vep --filter "CANONICAL is YES and SIFT match deleterious" ... diff --git a/ensembl/htdocs/info/genome/variation/prediction/protein_function.html b/ensembl/htdocs/info/genome/variation/prediction/protein_function.html index 69b9a71726..f87ee0f61a 100644 --- a/ensembl/htdocs/info/genome/variation/prediction/protein_function.html +++ b/ensembl/htdocs/info/genome/variation/prediction/protein_function.html @@ -390,17 +390,19 @@Prediction data format
("VAX" order, or "v" format if using the perl pack subroutine) unsigned short value. The top three bits of this short are used to encode the qualitative prediction, and the bottom ten bits are used to encode the prediction score. To decode the qualitative -prediction you should mask off all bits except the top three, and shift the resulting short -right by 13 bits and treat this as an integer between zero and four. The corresponding prediction can then +prediction you should mask off all bits except the top two, and shift the resulting short +right by 14 bits and treat this as an integer between zero and three. The corresponding prediction can then be looked up in the table below. To decode the prediction score you should mask off the top six bits and the resulting value can be treated as a number between zero and 1000, which should be divided by 1000 to give a three decimal place score (casting to a floating point type -if necessary). Bits 11-13 are not used, except to encode the "same as reference" dummy prediction +if necessary). Bits 11-14 are not used, except to encode the "same as reference" dummy prediction 0xFFFF. Note that for CADD prediction scores you do not need to divide by 1000; use the value as is. For ESM1b scores you need to multiply by 100 and subtract 50 (value * 100 - 50) after dividing by 1000. + Also, for AlphaMissense and ESM1b, the qualitative prediction is encoded using the top three bits, and should be + decoded accordingly.
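The decoding steps described in this hunk can be sketched in Python as follows. This is a minimal illustration and not code from the Ensembl codebase: the function name decode_prediction and the prediction_bits parameter are hypothetical, and the lookup table that maps the integer to a prediction string is not reproduced here, so only the integer is returned.

```python
import struct

# Dummy value used to encode the "same as reference" prediction.
SAME_AS_REFERENCE = 0xFFFF


def decode_prediction(raw: bytes, prediction_bits: int = 2):
    """Decode one packed prediction (hypothetical helper, for illustration).

    raw             -- two bytes holding a little-endian unsigned short
                       ("v" format in Perl's pack, "<H" in Python's struct)
    prediction_bits -- number of top bits holding the qualitative prediction:
                       2 as described in the updated text, 3 for
                       AlphaMissense and ESM1b
    """
    (value,) = struct.unpack("<H", raw)
    if value == SAME_AS_REFERENCE:
        return None
    # Keep only the top bits by shifting everything else out.
    prediction_index = value >> (16 - prediction_bits)
    # Mask off the top six bits; the bottom ten hold a value from 0 to 1000.
    score = (value & 0x03FF) / 1000
    return prediction_index, score
```

With the default of two prediction bits, the bytes for (1 << 14) | 460 decode to prediction index 1 and score 0.46. The CADD and ESM1b score adjustments mentioned above would be applied by the caller to the ten-bit value.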
diff --git a/ensembl/htdocs/info/genome/variation/species/species_data_types.html b/ensembl/htdocs/info/genome/variation/species/species_data_types.html index 5fc83e06ca..03f35de3f3 100644 --- a/ensembl/htdocs/info/genome/variation/species/species_data_types.html +++ b/ensembl/htdocs/info/genome/variation/species/species_data_types.html @@ -997,7 +997,7 @@
Species supported this way:
- - -+ @@ -1173,6 +1173,7 @@Species supported this way:
81 K +-