Hi all!!
I've been trying to use the function gdcRNADownload() to download RNAseq data from TCGA but no matter what RNAseq type I try, I always get the same error:
Successfully downloaded: 0
Warning message:
In read.table(paste(url, "&return_type=manifest", sep = ""), header = TRUE, :
incomplete final line found by readTableHeader on 'https://api.gdc.cancer.gov/files?filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-CHOL%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:%22Transcriptome%20Profiling%22%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:%22Gene%20Expression%20Quantification%22%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:%22HTSeq%20-%20Counts%22%7D%7D]%7D&pretty=true&format=JSON&size=10000&expand=analysis,analysis.input_files,associated_entities,cases,cases.diagnoses,cases.diagnoses.treatments,cases.demographic,cases.project,cases.samples,cases.samples.portions,cases.samples.portions.analytes,cases.samples.portions.analytes.aliquots,cases.samples.portions.slides&return_type=manifest'
This only happens with RNAseq data; I can download miRNA data without problems. Initially I was working on a MacBook Air with an M1 chip:
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_1.4.0 readxl_1.4.0 tibble_3.1.6 oligo_1.56.0
[5] Biostrings_2.60.2 GenomeInfoDb_1.28.4 XVector_0.32.0 IRanges_2.26.0
[9] S4Vectors_0.30.2 oligoClasses_1.54.0 GEOquery_2.60.0 Biobase_2.52.0
[13] BiocGenerics_0.38.0 edgeR_3.34.1 limma_3.48.3 GDCRNATools_1.13.1
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.2 RSQLite_2.2.12
[4] AnnotationDbi_1.54.1 htmlwidgets_1.5.4 grid_4.1.1
[7] BiocParallel_1.26.2 scatterpie_0.1.7 munsell_0.5.0
[10] codetools_0.2-18 preprocessCore_1.54.0 DT_0.22
[13] colorspace_2.0-3 GOSemSim_2.18.1 filelock_1.0.2
[16] knitr_1.38 rstudioapi_0.13 ggsignif_0.6.3
[19] DOSE_3.18.3 pathview_1.32.0 MatrixGenerics_1.4.3
[22] KEGGgraph_1.52.0 GenomeInfoDbData_1.2.6 KMsurv_0.1-5
[25] polyclip_1.10-0 bit64_4.0.5 farver_2.1.0
[28] downloader_0.4 vctrs_0.4.0 treeio_1.16.2
[31] generics_0.1.2 xfun_0.30 BiocFileCache_2.0.0
[34] affxparser_1.64.1 R6_2.5.1 graphlayouts_0.8.0
[37] locfit_1.5-9.5 bitops_1.0-7 cachem_1.0.6
[40] fgsea_1.18.0 gridGraphics_0.5-1 DelayedArray_0.18.0
[43] assertthat_0.2.1 promises_1.2.0.1 scales_1.1.1
[46] ggraph_2.0.5 enrichplot_1.12.3 gtable_0.3.0
[49] tidygraph_1.2.1 rlang_1.0.2 genefilter_1.74.1
[52] splines_4.1.1 rstatix_0.7.0 lazyeval_0.2.2
[55] broom_0.7.12 BiocManager_1.30.16 reshape2_1.4.4
[58] abind_1.4-5 backports_1.4.1 httpuv_1.6.5
[61] qvalue_2.24.0 clusterProfiler_4.0.5 tools_4.1.1
[64] ggplotify_0.1.0 ggplot2_3.3.5 affyio_1.62.0
[67] ellipsis_0.3.2 gplots_3.1.1 ff_4.0.5
[70] RColorBrewer_1.1-3 Rcpp_1.0.8.3 plyr_1.8.7
[73] progress_1.2.2 zlibbioc_1.38.0 purrr_0.3.4
[76] RCurl_1.98-1.6 prettyunits_1.1.1 ggpubr_0.4.0
[79] viridis_0.6.2 cowplot_1.1.1 zoo_1.8-9
[82] SummarizedExperiment_1.22.0 ggrepel_0.9.1 magrittr_2.0.3
[85] data.table_1.14.2 DO.db_2.9 survminer_0.4.9
[88] matrixStats_0.61.0 hms_1.1.1 patchwork_1.1.1
[91] mime_0.12 xtable_1.8-4 XML_3.99-0.9
[94] gridExtra_2.3 compiler_4.1.1 biomaRt_2.48.3
[97] KernSmooth_2.23-20 crayon_1.5.1 shadowtext_0.1.1
[100] htmltools_0.5.2 ggfun_0.0.6 later_1.3.0
[103] tzdb_0.3.0 tidyr_1.2.0 geneplotter_1.70.0
[106] aplot_0.1.3 DBI_1.1.2 tweenr_1.0.2
[109] dbplyr_2.1.1 MASS_7.3-56 rappdirs_0.3.3
[112] Matrix_1.4-1 car_3.0-12 readr_2.1.2
[115] cli_3.2.0 igraph_1.3.0 km.ci_0.5-2
[118] GenomicRanges_1.44.0 pkgconfig_2.0.3 xml2_1.3.3
[121] foreach_1.5.2 ggtree_3.0.4 annotate_1.70.0
[124] yulab.utils_0.0.4 digest_0.6.29 graph_1.70.0
[127] cellranger_1.1.0 fastmatch_1.1-3 survMisc_0.5.5
[130] tidytree_0.3.9 curl_4.3.2 shiny_1.7.1
[133] gtools_3.9.2 rjson_0.2.21 lifecycle_1.0.1
[136] nlme_3.1-157 GenomicDataCommons_1.16.0 jsonlite_1.8.0
[139] carData_3.0-5 viridisLite_0.4.0 fansi_1.0.3
[142] pillar_1.7.0 lattice_0.20-45 KEGGREST_1.32.0
[145] fastmap_1.1.0 httr_1.4.2 survival_3.3-1
[148] GO.db_3.13.0 glue_1.6.2 png_0.1-7
[151] iterators_1.0.14 bit_4.0.4 Rgraphviz_2.36.0
[154] ggforce_0.3.3 stringi_1.7.6 blob_1.2.2
[157] DESeq2_1.32.0 org.Hs.eg.db_3.13.0 caTools_1.18.2
[160] memoise_2.0.1 dplyr_1.0.8 ape_5.6-2
I also get the same issue when I run the same function on the cluster:
sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Springdale Linux 7.9 (Verona)
Matrix products: default
BLAS/LAPACK: /ifs/data/fg2532_lab/jc5737/Conda_env/lib/libopenblasp-r0.3.18.so
locale:
[1] C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] stringr_1.4.0 readxl_1.4.0 tibble_3.1.6
[4] oligo_1.58.0 Biostrings_2.62.0 GenomeInfoDb_1.30.1
[7] XVector_0.34.0 IRanges_2.28.0 S4Vectors_0.32.4
[10] oligoClasses_1.56.0 GEOquery_2.62.2 Biobase_2.54.0
[13] BiocGenerics_0.40.0 edgeR_3.36.0 limma_3.50.1
[16] GDCRNATools_1.14.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.2
[3] RSQLite_2.2.12 AnnotationDbi_1.56.2
[5] htmlwidgets_1.5.4 grid_4.1.3
[7] BiocParallel_1.28.3 scatterpie_0.1.7
[9] munsell_0.5.0 preprocessCore_1.56.0
[11] codetools_0.2-18 DT_0.22
[13] colorspace_2.0-3 GOSemSim_2.20.0
[15] filelock_1.0.2 knitr_1.38
[17] ggsignif_0.6.3 DOSE_3.20.1
[19] pathview_1.34.0 MatrixGenerics_1.6.0
[21] KEGGgraph_1.54.0 GenomeInfoDbData_1.2.7
[23] KMsurv_0.1-5 polyclip_1.10-0
[25] bit64_4.0.5 farver_2.1.0
[27] downloader_0.4 vctrs_0.4.0
[29] treeio_1.18.1 generics_0.1.2
[31] xfun_0.30 BiocFileCache_2.2.1
[33] affxparser_1.66.0 R6_2.5.1
[35] graphlayouts_0.8.0 locfit_1.5-9.5
[37] bitops_1.0-7 cachem_1.0.6
[39] fgsea_1.20.0 gridGraphics_0.5-1
[41] DelayedArray_0.20.0 assertthat_0.2.1
[43] promises_1.2.0.1 scales_1.1.1
[45] ggraph_2.0.5 enrichplot_1.14.2
[47] gtable_0.3.0 tidygraph_1.2.1
[49] rlang_1.0.2 genefilter_1.76.0
[51] splines_4.1.3 rstatix_0.7.0
[53] lazyeval_0.2.2 broom_0.7.12
[55] BiocManager_1.30.16 reshape2_1.4.4
[57] abind_1.4-5 backports_1.4.1
[59] httpuv_1.6.5 qvalue_2.26.0
[61] clusterProfiler_4.2.2 tools_4.1.3
[63] ggplotify_0.1.0 ggplot2_3.3.5
[65] affyio_1.64.0 ellipsis_0.3.2
[67] gplots_3.1.1 ff_4.0.5
[69] RColorBrewer_1.1-3 Rcpp_1.0.8.3
[71] plyr_1.8.7 progress_1.2.2
[73] zlibbioc_1.40.0 purrr_0.3.4
[75] RCurl_1.98-1.6 prettyunits_1.1.1
[77] ggpubr_0.4.0 viridis_0.6.2
[79] zoo_1.8-9 SummarizedExperiment_1.24.0
[81] ggrepel_0.9.1 magrittr_2.0.3
[83] data.table_1.14.2 DO.db_2.9
[85] survminer_0.4.9 matrixStats_0.61.0
[87] hms_1.1.1 patchwork_1.1.1
[89] mime_0.12 xtable_1.8-4
[91] XML_3.99-0.9 gridExtra_2.3
[93] compiler_4.1.3 biomaRt_2.50.3
[95] KernSmooth_2.23-20 crayon_1.5.1
[97] shadowtext_0.1.1 htmltools_0.5.2
[99] ggfun_0.0.6 later_1.3.0
[101] tzdb_0.3.0 tidyr_1.2.0
[103] geneplotter_1.72.0 aplot_0.1.3
[105] DBI_1.1.2 tweenr_1.0.2
[107] dbplyr_2.1.1 MASS_7.3-56
[109] rappdirs_0.3.3 Matrix_1.4-1
[111] car_3.0-12 readr_2.1.2
[113] cli_3.2.0 parallel_4.1.3
[115] igraph_1.3.0 GenomicRanges_1.46.1
[117] pkgconfig_2.0.3 km.ci_0.5-6
[119] xml2_1.3.3 foreach_1.5.2
[121] ggtree_3.2.1 annotate_1.72.0
[123] yulab.utils_0.0.4 digest_0.6.29
[125] graph_1.72.0 cellranger_1.1.0
[127] fastmatch_1.1-3 survMisc_0.5.6
[129] tidytree_0.3.9 curl_4.3.2
[131] shiny_1.7.1 gtools_3.9.2
[133] rjson_0.2.21 lifecycle_1.0.1
[135] nlme_3.1-157 GenomicDataCommons_1.18.0
[137] jsonlite_1.8.0 carData_3.0-5
[139] viridisLite_0.4.0 fansi_1.0.3
[141] pillar_1.7.0 lattice_0.20-45
[143] KEGGREST_1.34.0 fastmap_1.1.0
[145] httr_1.4.2 survival_3.3-1
[147] GO.db_3.14.0 glue_1.6.2
[149] png_0.1-7 iterators_1.0.14
[151] bit_4.0.4 Rgraphviz_2.38.0
[153] ggforce_0.3.3 stringi_1.7.6
[155] blob_1.2.2 DESeq2_1.34.0
[157] org.Hs.eg.db_3.14.0 caTools_1.18.2
[159] memoise_2.0.1 dplyr_1.0.8
[161] ape_5.6-2
I don't know how to solve the problem. When I try to troubleshoot gdcRNADownload() by following its code line by line, R says that one of the inner functions, gdcGetURL(), is not found. So I can't tell where the error comes from, because I can't get to the URL containing the RNAseq data. It might even be a format problem with the downloaded data. I know this issue was reported before, but since there was no follow-through, I thought a new thread might bring a bit more attention. Sorry guys and thanks a lot for your help!
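For anyone who wants to step through the code the same way: a function that a package does not export can usually still be reached through the package namespace, which may be why gdcGetURL() is "not found" when called directly. This is only a minimal sketch, assuming gdcGetURL is an unexported helper inside GDCRNATools; get_internal is a hypothetical helper name made up for illustration, not part of any package.

```r
# Return a function from a package namespace even if it is not exported,
# or NULL if the package is not installed or the name does not exist there.
# NOTE: get_internal is a hypothetical helper written for this example.
get_internal <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NULL)
  ns <- asNamespace(pkg)
  if (!exists(name, envir = ns, inherits = FALSE)) return(NULL)
  get(name, envir = ns, inherits = FALSE)
}

# Assumption: gdcGetURL lives in the GDCRNATools namespace without being exported.
gdcGetURL <- get_internal("GDCRNATools", "gdcGetURL")

if (is.null(gdcGetURL)) {
  message("gdcGetURL not found in the GDCRNATools namespace")
} else {
  # Show the arguments the function expects before calling it by hand.
  print(args(gdcGetURL))
}
```

If the function is found, it can then be called with the same arguments gdcRNADownload() passes to it, so the generated URL can be inspected on its own.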
Josu