AssemblyQC

AssemblyQC is a Nextflow pipeline that evaluates assembly quality with well-established tools and presents the results in a unified HTML report.

Reference:

Rashid, U., Wu, C., Shiller, J., Smith, K., Crowhurst, R., Davy, M., Chen, T.-H., Carvajal, I., Bailey, S., Thomson, S., & Deng, C.H. (2024). AssemblyQC: A Nextflow pipeline for reproducible reporting of assembly quality. Bioinformatics. DOI 10.1093/bioinformatics/btae477. GitHub https://github.com/Plant-Food-Research-Open/assemblyqc.

Pipeline Parameters

Only displaying parameters that differ from the pipeline defaults.

{
    "Input/output options": {
        "input": "https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/assemblysheetv2.csv",
        "outdir": "results",
        "tags": "plant-food-research-open,assemblyqc,test_full"
    },
    "General stats options": {
        "gfastats_skip": "false"
    },
    "NCBI FCS options": {
        "ncbi_fcs_adaptor_skip": "false",
        "ncbi_fcs_adaptor_empire": "euk",
        "ncbi_fcs_gx_skip": "false",
        "ncbi_fcs_gx_tax_id": "35717",
        "ncbi_fcs_gx_db_path": "/workspace/ComparativeDataSources/NCBI/FCS/GX/r2023-01-24"
    },
    "tidk options": {
        "tidk_skip": "false",
        "tidk_repeat_seq": "TTTGGG"
    },
    "BUSCO options": {
        "busco_skip": "false",
        "busco_mode": "genome",
        "busco_lineage_datasets": "fungi_odb10 hypocreales_odb10"
    },
    "LAI options": {
        "lai_skip": "false"
    },
    "Kraken 2 options": {
        "kraken2_skip": "false",
        "kraken2_db_path": "/workspace/ComparativeDataSources/kraken2db/k2_pluspfp_20240904"
    },
    "HiC options": {
        "hic": "SRR8238190"
    },
    "Merqury options": {
        "merqury_skip": "false"
    },
    "Synteny options": {
        "synteny_skip": "false",
        "synteny_mummer_skip": "false",
        "synteny_plotsr_skip": "false",
        "synteny_xref_assemblies": "https://raw.githubusercontent.com/plant-food-research-open/assemblyqc/dev/assets/xrefsheet.csv"
    },
    "Institutional config options": {
        "config_profile_name": "Plant&Food profile",
        "config_profile_description": "Plant&Food profile using SLURM in combination with Apptainer"
    },
    "Generic options": {
        "trace_report_suffix": "2025-09-23_12-35-07"
    },
    "Core Nextflow options": {
        "runName": "cheesy_roentgen",
        "containerEngine": "apptainer",
        "launchDir": "/powerplant/workspace/hrauxr/assemblyqc",
        "workDir": "/powerplant/workspace/hrauxr/assemblyqc/work",
        "projectDir": "/powerplant/workspace/hrauxr/assemblyqc",
        "userName": "hrauxr",
        "profile": "pfr,apptainer,test_full",
        "configFiles": "/workspace/hrauxr/.nextflow/config, /powerplant/workspace/hrauxr/assemblyqc/nextflow.config, /powerplant/workspace/hrauxr/assemblyqc/pfr/profile.config"
    }
}
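
The boolean flags in the listing above are serialised as strings, so one way to see which optional modules were switched on is to filter the `*_skip` keys. The sketch below is illustrative only: it copies a few entries from the listing into a Python string, and the helper name `enabled_modules` is hypothetical rather than part of AssemblyQC.

```python
import json

# Excerpt of the parameter listing above (values are strings in the report).
PARAMS_JSON = """
{
    "General stats options": {"gfastats_skip": "false"},
    "tidk options": {"tidk_skip": "false", "tidk_repeat_seq": "TTTGGG"},
    "LAI options": {"lai_skip": "false"},
    "Merqury options": {"merqury_skip": "false"}
}
"""


def enabled_modules(params):
    """Return module names whose '<name>_skip' parameter is the string 'false'."""
    enabled = []
    for options in params.values():
        for key, value in options.items():
            if key.endswith("_skip") and str(value).lower() == "false":
                enabled.append(key[: -len("_skip")])
    return sorted(enabled)


params = json.loads(PARAMS_JSON)
modules = enabled_modules(params)
print(modules)
```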

Pipeline Tools

The following is a non-exhaustive list of tools used to generate this report.

{
    "FCS-adaptor": "0.5.0",
    "KronaTools": "2.7.1",
    "LTR_FINDER_parallel": "v1.1",
    "LTR_HARVEST_parallel": "v1.1",
    "LTR_retriever": "v2.9.9",
    "Nextflow": "24.10.6",
    "assemblathon_stats": "github/PlantandFoodResearch/assemblathon2-analysis/a93cba2",
    "awk": "1.3.4 20200120",
    "biopython": "1.75",
    "busco": "5.8.3",
    "bwa": "0.7.18-r1243-dirty",
    "circos": "v0.69-8",
    "curl": "8.5.0",
    "dnadiff": "1.3",
    "fa-lint": "1.2.0",
    "fastp": "0.24.0",
    "fcs_gx": "0.5.5",
    "genometools": "1.6.5",
    "gfastats": "1.3.10",
    "gffread": "0.12.7",
    "grep": "(GNU grep) 3.4",
    "gunzip": "1.13",
    "hic_qc.py": "1.3.1",
    "hictk": "hictk-v2.1.4-bioconda",
    "juicebox_scripts": "0.1.0",
    "kraken2": "2.1.2",
    "lai": "beta3.2",
    "ltr_finder": "v1.07",
    "merqury": "1.3",
    "meryl": "1.4.1",
    "minimap2": "2.29-r1283",
    "nucmer": "4.0.0rc1",
    "pandas": "2.1.1",
    "perl": "5.32.1",
    "pigz": "2.6",
    "plant-food-research-open/assemblyqc": "v3.0.0",
    "plotly": "5.20.0",
    "plotsr": "1.1.1",
    "python": "3.10.2",
    "samblaster": "0.1.26",
    "samtools": "1.19.2",
    "sed": "(GNU sed) 4.7",
    "seqkit": "v2.9.0",
    "sratools": "3.1.0",
    "syri": "1.7.0",
    "tidk": "0.2.41",
    "ubuntu": "24.04.1l",
    "yahs": "1.2.2"
}

FCS-adaptor detects adaptor and vector contamination in genome sequences.

Reference:

https://github.com/ncbi/fcs

Version: 0.5.0

Summary

Assembly	Contaminated?
FI1	No

FCS-GX detects contamination from foreign organisms in genome sequences.

Reference:

Alexander Astashyn, Eric S Tvedte, Deacon Sweeney, Victor Sapojnikov, Nathan Bouk, Victor Joukov, Eyal Mozes, Pooja K Strope, Pape M Sylla, Lukas Wagner, Shelby L Bidwell, Karen Clark, Emily W Davis, Brian Smith-White, Wratko Hlavina, Kim D Pruitt, Valerie A Schneider, Terence D Murphy bioRxiv 2023.06.02.543519; doi: 10.1101/2023.06.02.543519, GitHub: https://github.com/ncbi/fcs

Version: 0.5.5

DB Version: 2023-01-24

Note:

This report dynamically loads '*.fcs.gx.krona.html' files from the 'ncbi_fcs_gx' folder under the output directory. These files should also be moved when moving the report's HTML file.

Summary

Assembly	Contaminated?
FI1	No

A script to calculate a basic set of metrics from a genome assembly.

Reference:

https://github.com/KorfLab/Assemblathon

Version: github/PlantandFoodResearch/assemblathon2-analysis/a93cba2

Warning:

Contig-related stats are based on the assumption that the assemblathon_stats_n_limit (100) parameter is specified correctly. If you are not certain of the value of the n_limit parameter, please ignore the contig-related stats.

FI1

Stat	Value
Assembly	GCA_003814445.1_ASM381444v1_genomic.fna
Number of scaffolds	8
Total size of scaffolds	35023690
Longest scaffold	7872678
Shortest scaffold	52960
Number of scaffolds > 1K nt	8
Percentage of scaffolds > 1K nt	100.0
Number of scaffolds > 10K nt	8
Percentage of scaffolds > 10K nt	100.0
Number of scaffolds > 100K nt	7
Percentage of scaffolds > 100K nt	87.5
Number of scaffolds > 1M nt	7
Percentage of scaffolds > 1M nt	87.5
Number of scaffolds > 10M nt	0
Percentage of scaffolds > 10M nt	0.0
Mean scaffold size	4377961
Median scaffold size	3434925
N50 scaffold length	6201951
L50 scaffold count	3
scaffold %A	28.15
scaffold %C	21.88
scaffold %G	21.83
scaffold %T	28.15
scaffold %N	0.0
scaffold %non-ACGTN	0.0
Number of scaffold non-ACGTN nt	0
Percentage of assembly in scaffolded contigs	0.0
Percentage of assembly in unscaffolded contigs	100.0
Average number of contigs per scaffold	1.0
Mean length of breaks (>=100Ns) between contigs in scaffold	0
Number of contigs	8
Number of contigs in scaffolds	0
Number of contigs not in scaffolds	8
Total size of contigs	35023690
Longest contig	7872678
Shortest contig	52960
Number of contigs > 1K nt	8
Percentage of contigs > 1K nt	100.0
Number of contigs > 10K nt	8
Percentage of contigs > 10K nt	100.0
Number of contigs > 100K nt	7
Percentage of contigs > 100K nt	87.5
Number of contigs > 1M nt	7
Percentage of contigs > 1M nt	87.5
Number of contigs > 10M nt	0
Percentage of contigs > 10M nt	0.0
Mean contig size	4377961
Median contig size	3434925
N50 contig length	6201951
L50 contig count	3
contig %A	28.15
contig %C	21.88
contig %G	21.83
contig %T	28.15
contig %N	0.0
contig %non-ACGTN	0.0
Number of contig non-ACGTN nt	0
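
For readers unfamiliar with the N50 and L50 rows above: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. A minimal sketch follows, using hypothetical toy lengths rather than the FI1 scaffolds; the function name `nx_lx` is an assumption made for illustration.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    fraction=0.5 gives N50/L50; fraction=0.9 gives N90/L90, and so on.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


# Hypothetical toy assembly, not taken from the report.
toy_lengths = [50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy_lengths)
print(n50, l50)
```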

A fast and exhaustive tool for summary statistics.

Reference:

Giulio Formenti, Linelle Abueg, Angelo Brajuka, Nadolina Brajuka, Cristóbal Gallardo-Alba, Alice Giani, Olivier Fedrigo, Erich D Jarvis, Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs, Bioinformatics, Volume 38, Issue 17, September 2022, Pages 4214–4216, 10.1093/bioinformatics/btac460

Version: 1.3.10

FI1

Stat	Value
Total scaffold length	35023690
Average scaffold length	4377961.25
Scaffold N50	6201951
Scaffold auN	5781567.55
Scaffold L50	3
Largest scaffold	7872678
Smallest scaffold	52960
# contigs	8
Total contig length	35023690
Average contig length	4377961.25
Contig N50	6201951
Contig auN	5781567.55
Contig L50	3
Largest contig	7872678
Smallest contig	52960
# gaps in scaffolds	0
Total gap length in scaffolds	0
Average gap length in scaffolds	0.00
Gap N50 in scaffolds	0
Gap auN in scaffolds	0.00
Gap L50 in scaffolds	0
Largest gap in scaffolds	0
Smallest gap in scaffolds	0
Base composition (A:C:G:T)	9857662:7662657:7645812:9857559
GC content %	43.71
# soft-masked bases	10431104
# segments	8
Total segment length	35023690
Average segment length	4377961.25
# gaps	0
# paths	8
Scaffold N10	7872678
Scaffold N20	7872678
Scaffold N30	7605136
Scaffold N40	7605136
Scaffold N50	6201951
Scaffold N60	6201951
Scaffold N70	3434925
Scaffold N80	3417637
Scaffold N90	3252422
Scaffold N100	52960
Scaffold L10	1
Scaffold L20	1
Scaffold L30	2
Scaffold L40	2
Scaffold L50	3
Scaffold L60	3
Scaffold L70	4
Scaffold L80	5
Scaffold L90	6
Scaffold L100	8
Contig N10	7872678
Contig N20	7872678
Contig N30	7605136
Contig N40	7605136
Contig N50	6201951
Contig N60	6201951
Contig N70	3434925
Contig N80	3417637
Contig N90	3252422
Contig N100	52960
Contig L10	1
Contig L20	1
Contig L30	2
Contig L40	2
Contig L50	3
Contig L60	3
Contig L70	4
Contig L80	5
Contig L90	6
Contig L100	8
Gap N10	0
Gap N20	0
Gap N30	0
Gap N40	0
Gap N50	0
Gap N60	0
Gap N70	0
Gap N80	0
Gap N90	0
Gap N100	0
Gap L10	0
Gap L20	0
Gap L30	0
Gap L40	0
Gap L50	0
Gap L60	0
Gap L70	0
Gap L80	0
Gap L90	0
Gap L100	0
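
The GC content row can be cross-checked against the base composition row of the same table. The sketch below parses the `A:C:G:T` string exactly as printed above; the function name `gc_percent` is hypothetical.

```python
def gc_percent(composition):
    """Compute GC % from an 'A:C:G:T' count string."""
    a, c, g, t = (int(field) for field in composition.split(":"))
    return (c + g) / (a + c + g + t) * 100


# Base composition (A:C:G:T) as reported for FI1.
fi1_composition = "9857662:7662657:7645812:9857559"
fi1_total = sum(int(field) for field in fi1_composition.split(":"))
fi1_gc = gc_percent(fi1_composition)
print(f"{fi1_total} bases, GC {fi1_gc:.2f}%")
```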

A tool to calculate a basic set of statistics about features contained in GFF3 files.

Reference:

Gremme G, Steinbiss S, Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):645-56. doi: 10.1109/TCBB.2013.68. PMID: 24091398.

Version: 1.6.5

FI1

Stat	Value
parsed genome node DAGs	7165
sequence regions	8 (total length: 35023690)
multi-features	5951
genes	7137
protein-coding genes	7034
mRNAs	7034
protein-coding mRNAs	7034
exons	20368
CDSs	20265
introns	13231
rRNAs	3
regions	8
tRNAs	98
transcripts	2
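
The feature counts above can be sanity-checked with simple arithmetic. The sketch below assumes that every gene carries exactly one child feature (mRNA, rRNA, tRNA or transcript), which the counts are consistent with but which the report itself does not state; the dictionary simply restates values from the table.

```python
# Values restated from the FI1 table above.
stats = {
    "genes": 7137,
    "protein-coding mRNAs": 7034,
    "rRNAs": 3,
    "tRNAs": 98,
    "transcripts": 2,
    "exons": 20368,
    "CDSs": 20265,
    "introns": 13231,
}

# One child feature per gene (assumption): children should sum to the gene count.
children = (
    stats["protein-coding mRNAs"]
    + stats["rRNAs"]
    + stats["tRNAs"]
    + stats["transcripts"]
)

# A transcript with n exons has n - 1 introns, so introns = exons - transcripts.
expected_introns = stats["exons"] - children

# Exons that are not CDSs should belong to the non-coding features.
non_coding_exons = stats["exons"] - stats["CDSs"]

print(children, expected_introns, non_coding_exons)
```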

BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs.

Reference:

Manni M., Berkeley M.R., Seppey M., Simao F.A., Zdobnov E.M. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. arXiv:2106.11799 [q-bio] [Internet]. Available from: arxiv.org/abs/2106.11799

Version: 5.8.3

Summary

Assembly	Lineage	Percentages
FI1	fungi_odb10	C:98.4%[S:97.9%,D:0.5%],F:0.1%,M:1.5%,n:758
FI1	hypocreales_odb10	C:96.3%[S:96.2%,D:0.1%],F:0.5%,M:3.2%,n:4494

Event	Value
Search Percentages	C:98.4%[S:97.9%,D:0.5%],F:0.1%,M:1.5%,n:758

Event	Frequency
Complete BUSCOs (C)	746
Complete and single-copy BUSCOs (S)	742
Complete and duplicated BUSCOs (D)	4
Fragmented BUSCOs (F)	1
Missing BUSCOs (M)	11
Total BUSCO groups searched	758

Parameter	Value
Version	5.8.3
Lineage created on	2024-01-08
mode	euk_genome_met
predictor	metaeuk

Dependency	Version
hmmsearch	3.4
bbtools	None
metaeuk	7.bba0d80
python	sys.version_info(major=3, minor=10, micro=16, releaselevel='final', serial=0)

Event	Value
Search Percentages	C:96.3%[S:96.2%,D:0.1%],F:0.5%,M:3.2%,n:4494

Event	Frequency
Complete BUSCOs (C)	4326
Complete and single-copy BUSCOs (S)	4322
Complete and duplicated BUSCOs (D)	4
Fragmented BUSCOs (F)	23
Missing BUSCOs (M)	145
Total BUSCO groups searched	4494

Parameter	Value
Version	5.8.3
Lineage created on	2024-01-08
mode	euk_genome_met
predictor	metaeuk

Dependency	Version
hmmsearch	3.4
bbtools	None
metaeuk	7.bba0d80
python	sys.version_info(major=3, minor=10, micro=16, releaselevel='final', serial=0)
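
The percentage strings in the summary are derived from the frequency tables: complete (C) is the sum of single-copy (S) and duplicated (D), and each class is divided by the total number of groups searched. A minimal sketch that rebuilds the string from the counts follows; `busco_summary` is a hypothetical helper, not a BUSCO function.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Rebuild a BUSCO one-line summary from the four class counts."""
    total = single + duplicated + fragmented + missing
    complete = single + duplicated

    def pct(count):
        return f"{count / total * 100:.1f}%"

    return (
        f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
        f"F:{pct(fragmented)},M:{pct(missing)},n:{total}"
    )


# Counts restated from the genome-mode frequency tables above.
fungi = busco_summary(single=742, duplicated=4, fragmented=1, missing=11)
hypocreales = busco_summary(single=4322, duplicated=4, fragmented=23, missing=145)
print(fungi)
print(hypocreales)
```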

BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. GFFREAD is used to obtain protein sequences from assembly FASTA and annotation GFF3 files.

Reference:

Manni M., Berkeley M.R., Seppey M., Simao F.A., Zdobnov E.M. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. arXiv:2106.11799 [q-bio] [Internet]. Available from: arxiv.org/abs/2106.11799

Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020 Apr 28;9:ISCB Comm J-304. doi: 10.12688/f1000research.23297.2. PMID: 32489650; PMCID: PMC7222033.

Version: 5.8.3 (BUSCO), 0.12.7 (GFFREAD)

Summary

Annotation	Lineage	Percentages
FI1	fungi_odb10	C:90.1%[S:89.6%,D:0.5%],F:0.7%,M:9.2%,n:758
FI1	hypocreales_odb10	C:87.5%[S:87.4%,D:0.1%],F:0.6%,M:11.9%,n:4494

Event	Value
Search Percentages	C:90.1%[S:89.6%,D:0.5%],F:0.7%,M:9.2%,n:758

Event	Frequency
Complete BUSCOs (C)	683
Complete and single-copy BUSCOs (S)	679
Complete and duplicated BUSCOs (D)	4
Fragmented BUSCOs (F)	5
Missing BUSCOs (M)	70
Total BUSCO groups searched	758

Parameter	Value
Version	5.8.3
Lineage created on	2024-01-08
mode	proteins
predictor	None

Dependency	Version
hmmsearch	3.4
python	sys.version_info(major=3, minor=10, micro=16, releaselevel='final', serial=0)

Event	Value
Search Percentages	C:87.5%[S:87.4%,D:0.1%],F:0.6%,M:11.9%,n:4494

Event	Frequency
Complete BUSCOs (C)	3931
Complete and single-copy BUSCOs (S)	3927
Complete and duplicated BUSCOs (D)	4
Fragmented BUSCOs (F)	26
Missing BUSCOs (M)	537
Total BUSCO groups searched	4494

Parameter	Value
Version	5.8.3
Lineage created on	2024-01-08
mode	proteins
predictor	None

Dependency	Version
hmmsearch	3.4
python	sys.version_info(major=3, minor=10, micro=16, releaselevel='final', serial=0)

A toolkit to identify and visualise telomeric repeats for the Darwin Tree of Life genomes.

Reference:

https://github.com/tolkit/telomeric-identifier

Version: 0.2.41

FI1: a posteriori sequence

Searched sequence: AACCCTAACCCTAACCCTAACCCT
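
The searched sequence is a tandem array of a short unit. The sketch below extracts that unit and its reverse complement and checks whether the result is a rotation of the widely reported TTAGGG telomeric motif. The helper names are hypothetical and the code is independent of tidk.

```python
def smallest_repeat_unit(seq):
    """Return the shortest unit whose tandem repetition reproduces seq."""
    for size in range(1, len(seq) + 1):
        if len(seq) % size == 0 and seq[:size] * (len(seq) // size) == seq:
            return seq[:size]
    return seq  # only reached for an empty string


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_rotation(a, b):
    """True if a is a circular rotation of b."""
    return len(a) == len(b) and a in b + b


searched = "AACCCTAACCCTAACCCTAACCCT"
unit = smallest_repeat_unit(searched)
unit_rc = reverse_complement(unit)
print(unit, unit_rc, is_rotation(unit_rc, "TTAGGG"))
```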

LTR Assembly Index (LAI) is a reference-free genome metric that evaluates assembly continuity using LTR retrotransposons (LTR-RTs). LTR-RTs are the predominant interspersed repeat and are poorly assembled in draft genomes. Because it corrects for LTR-RT amplification dynamics, LAI is independent of genome size, genomic LTR-RT content, and gene space evaluation metrics such as BUSCO.

Raw LAI = (Intact LTR element length / Total LTR sequence length) × 100
LAI = Raw LAI + 2.8138 × (94 − whole genome LTR identity)

The LAI is set to 0 when raw LAI = 0 or the adjustment produces a negative value.
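
The definition above translates directly into a few lines of arithmetic. The sketch below is not the LAI program itself: the function names are hypothetical, and because the whole genome LTR identity is not shown in this report, the identity value in the usage line is an assumed placeholder.

```python
def raw_lai(intact, total):
    """Raw LAI: intact LTR length as a percentage of total LTR length."""
    if total <= 0:
        return 0.0
    return intact / total * 100


def adjusted_lai(raw, ltr_identity):
    """Apply the identity correction; clamp to 0 as described above."""
    if raw == 0:
        return 0.0
    return max(raw + 2.8138 * (94 - ltr_identity), 0.0)


# Intact and Total as printed in the FI1 summary row (rounded proportions).
fi1_raw = raw_lai(0.0113, 0.2065)

# ASSUMPTION: 94.0 is a placeholder identity, not a value from the report.
fi1_example = adjusted_lai(fi1_raw, ltr_identity=94.0)
print(round(fi1_raw, 2), round(fi1_example, 2))
```

Recomputing from the rounded proportions gives a raw LAI close to, but not exactly, the reported 5.50, presumably because the report rounds the inputs for display.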

Reference:

Shujun Ou, Jinfeng Chen, Ning Jiang, Assessing genome assembly quality using the LTR Assembly Index (LAI), Nucleic Acids Research, Volume 46, Issue 21, 30 November 2018, Page e126, 10.1093/nar/gky730

Version: beta3.2

Summary

Assembly	Results
FI1	Intact: 0.0113, Total: 0.2065, Raw LAI: 5.50, LAI: 4.84

Kraken2 assigns taxonomic labels to sequencing reads for metagenomics projects.

Reference:

Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). 10.1186/s13059-019-1891-0

Version: 2.1.2

Note:

This report dynamically loads '*.kraken2.krona.html' files from the 'kraken2' folder under the output directory. These files should also be moved when moving the report's HTML file.

FI1

Hi-C contact mapping experiments measure the frequency of physical contact between loci in the genome. The resulting dataset, called a “contact map,” is represented using a two-dimensional heatmap where the intensity of each pixel indicates the frequency of contact between a pair of loci.
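
To make the contact map idea concrete, the sketch below bins pairs of loci into a symmetric count matrix, where each cell corresponds to one heatmap pixel. It is a toy illustration in pure Python with made-up coordinates; it is not how the pipeline builds its '.hic' file, and `contact_matrix` is a hypothetical name.

```python
def contact_matrix(contacts, genome_length, bin_size):
    """Count contacts between fixed-size bins along a single sequence.

    contacts: iterable of (position_a, position_b) pairs, 0-based.
    Returns a square list-of-lists; cell [i][j] is the contact count
    between bin i and bin j (symmetric).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


# Hypothetical contacts on a 100 bp toy sequence, 25 bp bins.
toy_contacts = [(5, 12), (7, 95), (50, 55), (90, 99), (3, 8)]
matrix = contact_matrix(toy_contacts, genome_length=100, bin_size=25)
for row in matrix:
    print(row)
```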

References:

fastp Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, September 2018, Pages i884–i890, 10.1093/bioinformatics/bty560

BWA Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv: 1303.3997.

SAMBLASTER Gregory G. Faust, Ira M. Hall, SAMBLASTER: fast duplicate marking and structural variant read extraction, Bioinformatics, Volume 30, Issue 17, September 2014, Pages 2503–2505, 10.1093/bioinformatics/btu314

SAMtools Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li, Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, 10.1093/gigascience/giab008

YaHS Chenxi Zhou, Shane A McCarthy, Richard Durbin, YaHS: yet another Hi-C scaffolding tool, Bioinformatics, Volume 39, Issue 1, January 2023, btac808, 10.1093/bioinformatics/btac808

hictk Roberto Rossini, Jonas Paulsen, hictk: blazing fast toolkit to work with .hic and .cool files, Bioinformatics, Volume 40, Issue 7, July 2024, btae408, 10.1093/bioinformatics/btae408

Juicebox.js Robinson JT, Turner D, Durand NC, Thorvaldsdóttir H, Mesirov JP, Aiden EL. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 2018 Feb 28;6(2):256-258.e1. 10.1016/j.cels.2018.01.001. Epub 2018 Feb 7. PMID: 29428417; PMCID: PMC6047755.

Version: 2.4.3

Notes:

The Hi-C contact map is only loaded when the report is opened through an HTTP(S) server and all the necessary permissions are in place. The contact map '.hic' file is stored in the 'hic' folder under the output directory. It can be visualized with Juicebox.
This report dynamically loads content from the 'hic' folder under the output directory including '*.hic', '*.html', and 'hicqc/'. These files and folders should also be moved when moving the report's HTML file.

FI1

HiC QC report

fastp log

Detecting adapter sequence for read1...
>Illumina TruSeq Adapter Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Detecting adapter sequence for read2...
>Illumina TruSeq Adapter Read 2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Read1 before filtering:
total reads: 26408294
total bases: 2112663520
Q20 bases: 2033771115(96.2657%)
Q30 bases: 2001190936(94.7236%)

Read2 before filtering:
total reads: 26408294
total bases: 2112663520
Q20 bases: 1978972764(93.6719%)
Q30 bases: 1933151999(91.5031%)

Read1 after filtering:
total reads: 25405615
total bases: 2031716126
Q20 bases: 1966130766(96.7719%)
Q30 bases: 1937083561(95.3422%)

Read2 after filtering:
total reads: 25405615
total bases: 2031715879
Q20 bases: 1935938955(95.2859%)
Q30 bases: 1897487240(93.3933%)

Filtering result:
reads passed filter: 50811230
reads failed due to low quality: 1584466
reads failed due to too many N: 22150
reads failed due to too short: 398742
reads with adapter trimmed: 723915
bases trimmed due to adapters: 37726206

Duplication rate: 7.78525%

Insert size peak (evaluated by paired-end reads): 129

JSON report: SRR8238190.fastp.json
HTML report: SRR8238190.fastp.html

fastp --in1 SRR8238190_1.fastq.gz --in2 SRR8238190_2.fastq.gz --out1 SRR8238190_1.fastp.fastq.gz --out2 SRR8238190_2.fastp.fastq.gz --json SRR8238190.fastp.json --html SRR8238190.fastp.html --failed_out SRR8238190.paired.fail.fastq.gz --unpaired1 SRR8238190_1.fail.fastq.gz --unpaired2 SRR8238190_2.fail.fastq.gz --thread 6 --detect_adapter_for_pe --qualified_quality_phred 20 --length_required 50
fastp v0.24.0, time used: 70 seconds
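
The filtering result in the log can be reconciled with the read counts: the reads that passed plus the reads that failed for each reason should equal the total number of input reads across both mates. A minimal sketch follows, parsing lines copied from the log above; `parse_counts` is a hypothetical helper and not part of fastp.

```python
import re

# Lines copied from the "Filtering result" block of the log above.
FILTERING_RESULT = """\
reads passed filter: 50811230
reads failed due to low quality: 1584466
reads failed due to too many N: 22150
reads failed due to too short: 398742
"""


def parse_counts(text):
    """Parse 'label: integer' lines into a dictionary."""
    counts = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(.+?):\s*(\d+)\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(FILTERING_RESULT)
reads_in = 2 * 26408294  # Read1 + Read2 before filtering
reads_accounted = sum(counts.values())
pass_rate = counts["reads passed filter"] / reads_in * 100
print(reads_in, reads_accounted, f"{pass_rate:.2f}%")
```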

Circos facilitates the identification and analysis of similarities and differences arising from comparisons of genomes. The genome-wide alignments are performed with MUMMER.

References:

Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., ... & Marra, M. A. (2009). Circos: an information aesthetic for comparative genomics. Genome research, 19(9), 1639-1645. 10.1101/gr.092759.109

Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018 Jan 26;14(1):e1005944. 10.1371/journal.pcbi.1005944

Versions: v0.69-8 (CIRCOS), 4.0.0rc1 (MUMMER)

Notes:

Alignments within a distance of 1000000 bp have been bundled together.
After bundling, any bundle smaller than 1000000 bp has been filtered out.
The sequence labels shown on the plot are based on the labelling file provided to the pipeline. These labels may or may not be the same as the sequence IDs in the corresponding FASTA files.
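
The two bundling notes above can be pictured with a one-dimensional sketch: neighbouring alignments are merged when the gap between them is within the distance threshold, and merged bundles below the size threshold are dropped. This is a deliberate simplification under stated assumptions. It looks at coordinates on a single sequence only, whereas bundling of genome-to-genome alignments must also consider the target coordinates, and it is not the pipeline's implementation. The name `bundle_alignments` and the toy intervals are hypothetical.

```python
def bundle_alignments(alignments, max_gap=1_000_000, min_size=1_000_000):
    """Merge nearby (start, end) intervals, then drop small bundles."""
    bundles = []
    for start, end in sorted(alignments):
        if bundles and start - bundles[-1][1] <= max_gap:
            # Close enough to the previous bundle: extend it.
            bundles[-1][1] = max(bundles[-1][1], end)
        else:
            bundles.append([start, end])
    # Keep only bundles that are at least min_size long.
    return [(s, e) for s, e in bundles if e - s >= min_size]


# Hypothetical alignment intervals on one reference sequence.
toy_alignments = [
    (0, 400_000),
    (600_000, 1_500_000),
    (4_000_000, 4_200_000),
    (9_000_000, 10_500_000),
]
bundles = bundle_alignments(toy_alignments)
print(bundles)
```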

FI1 : JAD : all

The genome-wide alignments are performed with MUMMER.

References:

Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., ... & Marra, M. A. (2009). Circos: an information aesthetic for comparative genomics. Genome research, 19(9), 1639-1645. https://doi.org/10.1101/gr.092759.109

Version: 4.0.0rc1 (MUMMER)

Notes:

Alignments within a distance of 1000000 bp have been bundled together.
After bundling, any bundle smaller than 1000000 bp has been filtered out.
The sequence labels shown on the plot are based on the labelling file provided to the pipeline. These labels may or may not be the same as the sequence IDs in the corresponding FASTA files.

FI1 : JAD : all

Plotsr generates high-quality visualisation of synteny and structural rearrangements between multiple genomes. For this, it uses the genomic structural annotations between multiple chromosome-level assemblies. The genome-wide alignments are performed with Minimap2.

References:

Goel M, Schneeberger K. 2022. plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics. 2022 May 13;38(10):2922-2926. doi: 10.1093/bioinformatics/btac196. PMID: 35561173; PMCID: PMC9113368.

Goel M, Sun H, Jiao WB, Schneeberger K. 2019. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019 Dec 16;20(1):277. doi: 10.1186/s13059-019-1911-0. PMID: 31842948; PMCID: PMC6913012.

Li H. 2021. New strategies to improve minimap2 alignment accuracy, Bioinformatics, Volume 37, Issue 23, December 2021, Pages 4572–4574, doi: 10.1093/bioinformatics/btab705

Versions: 1.1.1 (PLOTSR), 1.7.0 (SYRI), 2.29-r1283 (MINIMAP2)

Note:

This report dynamically loads '*.on.*.all/' folders from the 'synteny' folder under the output directory. These folders should also be moved when moving the report's HTML file.

Error: Syri failed to detect structural rearrangements for the following comparisons: TT_2021a with reference to JAD. This may be due to known Syri limitations. See: GitHub/Syri/Limitations

Sequence labels

Labels	JAD	TT_2021a	FI1
Chr1	JADWOS010000003.1	CP083245.1	CP031385.1
Chr2	JADWOS010000004.1	CP083246.1	CP031386.1
Chr3	JADWOS010000005.1	CP083247.1	CP031387.1
Chr4	JADWOS010000006.1	CP083248.1	CP031388.1
Chr5	JADWOS010000007.1	CP083249.1	CP031389.1
Chr6	JADWOS010000008.1	CP083250.1	CP031390.1
Chr7	JADWOS010000009.1	CP083251.1	CP031391.1

Genome assembly projects often have Illumina whole-genome sequencing reads available for the assembled individual. The k-mer spectrum of this read set can be used to evaluate assembly quality independently, without the need for a high-quality reference. Merqury provides a set of tools for this purpose.

References:

Rhie, A., Walenz, B.P., Koren, S. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245 (2020). doi: 10.1186/s13059-020-02134-9

Version: 1.3

FI1

Completeness stats

Assembly	Region	Found	Total	% Covered
FI1	all	26653235	26743412	99.6628

Consensus quality QV stats

Assembly	No Support	Total	QV	Error rate
FI1	3468	35023530	53.2648	4.71542e-06
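
The columns of the two tables above are linked by short formulas: completeness is the found fraction of reliable read k-mers, and QV is the Phred-scaled error rate, QV = −10 × log10(E). The reported QV matches the last column when that value is read as a fraction rather than a percentage. The sketch below also includes the k-mer based error estimate described in the Merqury paper, E = 1 − (1 − unsupported / total)^(1/k). The k-mer size is not stated in this report, so `k=21` in the usage lines is an assumption; all function names are hypothetical.

```python
import math


def completeness_percent(found, total):
    """Share of reliable read k-mers that are present in the assembly."""
    return found / total * 100


def qv_from_error_rate(error_rate):
    """Phred-scaled consensus quality from a per-base error rate."""
    return -10 * math.log10(error_rate)


def error_rate_from_kmers(no_support, total, k):
    """Per-base error rate from assembly k-mers lacking read support."""
    return 1 - (1 - no_support / total) ** (1 / k)


# Values restated from the FI1 rows above.
fi1_completeness = completeness_percent(26653235, 26743412)
fi1_qv = qv_from_error_rate(4.71542e-06)

# ASSUMPTION: k=21 is not given in the report; it is used here for illustration.
fi1_error = error_rate_from_kmers(3468, 35023530, k=21)
print(f"{fi1_completeness:.4f} {fi1_qv:.4f} {fi1_error:.5e}")
```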

Spectra-asm

FI1 spectra-cn