Zenodo
Tsuji, Jackson M.; Shaw, Nicolette A.; Nagashima, Sakiko; Venkiteswaran, Jason; Schiff, Sherry; Watanabe, Tomohiro; Fukui, Manabu; Hanada, Satoshi; Tank, Marcus; Neufeld, Josh D.
2023-12-06

Supplementary data files associated with Tsuji et al., 2023, "Anoxygenic phototrophic Chloroflexota member uses a Type I reaction center". These files are used by code in a corresponding GitHub repository (https://github.com/jmtsuji/Ca-Chlorohelix-allophototropha-RCI) that shows how the analyses presented in the paper were conducted.

Files included:

HPLC-based spectroscopy data:
- "...-HPLC-run1.tsv.gz" or "...-HPLC-run2.tsv.gz" -- hyperspectral data files, generated by a diode array detector, that are associated with the pigment analyses in the paper. See the corresponding GitHub repository for how these files are analyzed.

Supplementary data about "Ca. Chloroheliales"-associated RCI:
- I_TASSER_homology_models_full_output.tar.gz -- Gzipped tarball containing the full output from I-TASSER for homology models of key phototrophy-related genes encoded by 'Candidatus Chlorohelix allophototropha' and 'Candidatus Chloroheliales bin L227-5C'. After unpacking the tarball, view a summary of the I-TASSER output for each gene by opening the 'index.html' file in that gene's folder in a web browser.

Boreal Shield lake survey data:
- lake_survey_MAGs.tar.gz -- Gzipped tarball containing the full collection of 756 metagenome-assembled genomes (MAGs) recovered from the Boreal Shield lake survey, corresponding to those listed in Supplementary Data 3. The FastA nucleotide genome sequences, FastA nucleotide sequences of predicted protein-coding genes, FastA amino acid sequences of predicted proteins, and General Feature Format (GFF) files for all genomes are provided in the fna, ffn, faa, and gff subdirectories, respectively.
- lake_survey_MAGs_eggnog_annotations.tar.gz -- Gzipped tarball containing EggNOG annotations for all predicted proteins among the 756 MAGs recovered from the lake metagenome data.
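The HPLC files are gzipped tab-separated tables, so they can be read without decompressing them first. The following is a minimal sketch using only the Python standard library. It assumes (this is not stated in the description) that each file has a single header row followed by purely numeric rows, and the function names `parse_hplc_tsv` and `load_hplc_tsv` are hypothetical. For the authors' actual processing, see the GitHub repository linked above.

```python
import csv
import gzip
from typing import Iterable, List, Tuple


def parse_hplc_tsv(lines: Iterable[str]) -> Tuple[List[str], List[List[float]]]:
    """Split tab-separated text into a header row and rows of floats.

    ASSUMPTION (hypothetical layout): one header row, then numeric data rows.
    Blank lines are skipped.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


def load_hplc_tsv(path: str) -> Tuple[List[str], List[List[float]]]:
    """Read a gzipped HPLC table such as one of the '...-HPLC-run1.tsv.gz' files."""
    # Text mode with newline="" is the recommended way to feed csv.reader.
    with gzip.open(path, "rt", newline="") as handle:
        return parse_hplc_tsv(handle)
```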
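To check what the I-TASSER and MAG tarballs contain before unpacking them, the member names can be listed with Python's `tarfile` module. This sketch uses hypothetical helper names. It assumes that the MAG files sit directly inside directories named fna, ffn, faa, and gff (the subdirectory names come from the description; the exact nesting is an assumption) and that each gene folder in the I-TASSER output holds its own index.html. Any other page with that name would also be matched.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Subdirectory names taken from the dataset description.
MAG_SUBDIRS = ("fna", "ffn", "faa", "gff")


def summary_pages(names: Iterable[str]) -> List[str]:
    """Return member paths whose file name is 'index.html', sorted."""
    return sorted(n for n in names if PurePosixPath(n).name == "index.html")


def count_by_subdir(names: Iterable[str], subdirs=MAG_SUBDIRS) -> Dict[str, int]:
    """Count files whose immediate parent directory is one of `subdirs`.

    ASSUMPTION: genome files sit directly inside fna/, ffn/, faa/ and gff/.
    """
    counts: Counter = Counter()
    for name in names:
        parent = PurePosixPath(name).parent.name
        if parent in subdirs:
            counts[parent] += 1
    return dict(counts)


def file_members(tarball_path: str) -> List[str]:
    """List regular-file member names of a .tar.gz without extracting it."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]
```

For example, `count_by_subdir(file_members("lake_survey_MAGs.tar.gz"))` returns one count per subdirectory. If each genome has exactly one file per subdirectory, each count should be 756.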
Because proteins were pre-clustered prior to annotation, an "orf2gene" file inside lake_survey_MAGs_eggnog_annotations.tar.gz maps the gene clusters to the ORF IDs used for each genome. Two further lake survey files are provided:
- lake_survey_MAGs_featureCounts.tsv.gz -- Gzipped tab-separated table containing the mapping statistics of metatranscriptome reads on all protein-coding genes from the 756 MAGs recovered from the lake metagenome data.
- lake_survey_Ca_Chloroheliales_MAGs_info.tar.gz -- A subset of the information in the previous three files, specific to genome bins ELA319 and ELA729, which represent RCI-encoding "Ca. Chloroheliales" members.

Intermediate files involved in some of the genome assembly work in this paper:
- Capt_S15_sequencer_data_raw.tar.gz -- Gzipped tarball containing the raw Illumina MiSeq output data for the 'Candidatus Chlorohelix allophototropha' subculture 15 sequencing run. This was a read cloud sequencing run based on TELL-Seq technology. Indices can be parsed directly from the raw output data using the Tell-Read pipeline.
- scaffold.full.fasta.gz -- the assembled scaffolds generated using Tell-Read and Tell-Link from the above raw MiSeq output data.
- Ca_Chloroheliaceae_bin_L227_5C_prokka_ORFs.faa.gz -- predicted open reading frames (ORFs) from the curated genome of 'Candidatus Chloroheliales bin L227-5C'. These ORFs were predicted using prokka and were used for some of the analyses presented in the paper. Most analyses used the annotations available on NCBI (generated by PGAP) for this strain.

Changelog:
- v1.0.0: First version
- v1.0.1: Added uncurated genome bin files
- v2.0.0: Added environmental survey data
- v3.1.0: Added spectroscopy data and curated existing files

License: https://creativecommons.org/licenses/by/4.0/legalcode
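The following sketch assumes that lake_survey_MAGs_featureCounts.tsv.gz keeps the standard featureCounts layout. That layout (an assumption, not stated in the description) consists of `#` comment lines, then the columns Geneid, Chr, Start, End, Strand, and Length, followed by one read-count column per sample. Under that assumption, per-gene counts can be normalized within each metatranscriptome. Transcripts per million (TPM), computed from each gene's Length column, is used here only for illustration and is not necessarily the normalization used in the paper. The function names are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterable, List

# Standard featureCounts annotation columns that precede the per-sample counts
# (ASSUMPTION: the table keeps this layout).
ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]


def parse_featurecounts(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Return {sample: {gene_id: TPM}} from featureCounts-style text."""
    # Drop '#' comment lines (featureCounts writes its command line there).
    reader = csv.reader(
        (line for line in lines if not line.startswith("#")), delimiter="\t"
    )
    header = next(reader)
    n_annot = len(ANNOTATION_COLUMNS)
    if header[:n_annot] != ANNOTATION_COLUMNS:
        raise ValueError(f"unexpected header: {header[:n_annot]!r}")
    samples = header[n_annot:]

    lengths: Dict[str, int] = {}
    counts: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue
        gene_id = row[0]
        lengths[gene_id] = int(row[5])
        counts[gene_id] = [int(value) for value in row[n_annot:]]

    tpm: Dict[str, Dict[str, float]] = {}
    for i, sample in enumerate(samples):
        # Reads per kilobase of gene length, then rescale to sum to one million.
        rates = {g: counts[g][i] / (lengths[g] / 1000) for g in counts}
        total = sum(rates.values())
        tpm[sample] = {g: (r / total * 1e6 if total else 0.0) for g, r in rates.items()}
    return tpm


def load_featurecounts(path: str) -> Dict[str, Dict[str, float]]:
    """Read a gzipped featureCounts-style table."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_featurecounts(handle)
```

Normalizing by gene length matters here because a long gene collects more reads than a short one at the same expression level.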
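A common first check on an assembly such as scaffold.full.fasta.gz is its N50. To compute it, sort the scaffolds from longest to shortest, then report the length at which the running total first reaches half of the total assembly size. The standard-library sketch below computes it. The helper names are hypothetical, and no N50 value for this assembly is implied.

```python
import gzip
from typing import Iterable, List


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted text."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first '>'
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length at which the cumulative sum (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def scaffold_n50(path: str) -> int:
    """N50 of a gzipped FASTA file such as scaffold.full.fasta.gz."""
    with gzip.open(path, "rt") as handle:
        return n50(sequence_lengths(handle))
```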