Getting genome assembly data using NCBI Datasets command line tools

The objective of this notebook is to demonstrate how to use the NCBI Datasets command line tools to explore and download genome assembly sequences and metadata.

Getting started

First, we'll download the Datasets command line tools and grant them execute permissions. Datasets has two command line tools:

  • The datasets tool is used to query and download sequence, annotation and metadata for all domains of life.
  • The dataformat tool is used to convert metadata downloaded from NCBI Datasets from JSON lines format to other formats.
In [1]:
%%bash
printf "Downloading CLI tools...\n"
for app in datasets dataformat
do
    curl --silent --remote-name "https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/${app}"
    chmod +x ${app}
    printf "[size: %s] %s v%s\n" $(du --human-readable ${app}) $(./${app} version)
done
Downloading CLI tools...
[size: 11M] datasets v11.7.0
[size: 13M] dataformat v11.7.0

We'll also download the command line tool jq to parse the datasets JSON Lines data reports into a readable format.

In [1]:
%%bash
curl --silent --location --output jq 'https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64'
chmod +x jq
printf "Downloaded %s" $(./jq --version)
Downloaded jq-1.6

Getting help

To get help with a tool or any of its subcommands, specify --help after the command:

In [1]:
!./datasets --help
datasets is a command-line tool that is used to query and download biological sequence data
across all domains of life from NCBI databases.

Refer to NCBI's [command line start](https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-start) documentation for information about getting started with the command-line tools.

Usage
  datasets [command]

Data Retrieval Commands
  summary              print a summary of a gene or genome dataset
  download             download a gene, genome or coronavirus dataset as a zip file
  rehydrate            rehydrate a downloaded, dehydrated dataset

Miscellaneous Commands
  completion           generate autocompletion scripts
  version              print the version of this client and exit
  help                 Help about any command

Flags
  -h, --help   help for datasets

Use datasets help <command> for detailed help about a command.

Getting genome metadata

To begin, we'll use the Datasets summary genome command to explore all the available RefSeq genomes for a group of organisms.

Genome summaries can be accessed in four ways:

  • accession: an NCBI Assembly accession
  • organism: an organism or a taxonomic group name
  • taxid: an NCBI Taxonomy identifier, at any level
  • BioProject: an NCBI BioProject accession

In this example, we'll view metadata for all Crustacea genome assemblies using the taxon name. Additionally, we'll limit our search to genomes annotated by NCBI's RefSeq group using the --refseq flag. To make the JSON output easy to read, we'll pipe it through the command line JSON parser jq.

In [1]:
!./datasets summary genome taxon Crustacea --refseq | ./jq .
{
  "assemblies": [
    {
      "assembly": {
        "annotation_metadata": {
          "file": [
            {
              "estimated_size": "8160265",
              "type": "GENOME_GFF"
            },
            {
              "estimated_size": "60912986",
              "type": "GENOME_GBFF"
            },
            {
              "estimated_size": "15723551",
              "type": "RNA_FASTA"
            },
            {
              "estimated_size": "5579866",
              "type": "PROT_FASTA"
            },
            {
              "estimated_size": "6762708",
              "type": "GENOME_GTF"
            }
          ],
          "name": "NCBI Annotation Release 100",
          "release_date": "Mar 16, 2020",
          "release_number": "100",
          "report_url": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Daphnia_magna/100/",
          "source": "NCBI"
        },
        "assembly_accession": "GCF_003990815.1",
        "assembly_category": "representative genome",
        "assembly_level": "Chromosome",
        "bioproject_lineages": [
          {
            "bioprojects": [
              {
                "accession": "PRJNA490418",
                "title": "Daphnia magna strain:SK Genome sequencing and assembly"
              }
            ]
          }
        ],
        "chromosomes": [
          "LG1",
          "LG2",
          "LG3",
          "LG4",
          "LG5",
          "LG6",
          "LG7",
          "LG8",
          "LG9",
          "LG10",
          "Un",
          "MT"
        ],
        "contig_n50": 14466,
        "display_name": "ASM399081v1",
        "estimated_size": "131337948",
        "org": {
          "assembly_counts": {
            "node": 3,
            "subtree": 3
          },
          "key": "35525",
          "parent_tax_id": "6668",
          "rank": "SPECIES",
          "sci_name": "Daphnia magna",
          "sex": "pooled male and female",
          "strain": "SK",
          "tax_id": "35525",
          "title": "Daphnia magna"
        },
        "seq_length": "122937721",
        "submission_date": "2019-01-07"
      }
    },
    {
      "assembly": {
        "annotation_metadata": {
          "file": [
            {
              "estimated_size": "10610954",
              "type": "GENOME_GFF"
            },
            {
              "estimated_size": "175309680",
              "type": "GENOME_GBFF"
            },
            {
              "estimated_size": "15496064",
              "type": "RNA_FASTA"
            },
            {
              "estimated_size": "6791917",
              "type": "PROT_FASTA"
            },
            {
              "estimated_size": "9576030",
              "type": "GENOME_GTF"
            }
          ],
          "name": "NCBI Annotation Release 100",
          "release_date": "Dec 21, 2017",
          "release_number": "100",
          "report_url": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Eurytemora_affinis/100/",
          "source": "NCBI"
        },
        "assembly_accession": "GCF_000591075.1",
        "assembly_category": "representative genome",
        "assembly_level": "Scaffold",
        "bioproject_lineages": [
          {
            "bioprojects": [
              {
                "accession": "PRJNA203087",
                "parent_accessions": [
                  "PRJNA163973"
                ],
                "title": "Eurytemora affinis strain:Atlantic clade Genome sequencing and assembly"
              },
              {
                "accession": "PRJNA163973",
                "parent_accessions": [
                  "PRJNA163993"
                ],
                "title": "i5k Arthropod Genome Pilot Project"
              },
              {
                "accession": "PRJNA163993",
                "title": "i5k initiative"
              }
            ]
          }
        ],
        "chromosomes": [
          "Un"
        ],
        "contig_n50": 67724,
        "display_name": "Eaff_2.0",
        "estimated_size": "324965786",
        "org": {
          "assembly_counts": {
            "node": 2,
            "subtree": 2
          },
          "key": "88015",
          "parent_tax_id": "88014",
          "rank": "SPECIES",
          "sci_name": "Eurytemora affinis",
          "strain": "Atlantic clade",
          "tax_id": "88015",
          "title": "Eurytemora affinis"
        },
        "seq_length": "389032277",
        "submission_date": "2017-12-12"
      }
    },
    {
      "assembly": {
        "annotation_metadata": {
          "file": [
            {
              "estimated_size": "8126944",
              "type": "GENOME_GFF"
            },
            {
              "estimated_size": "330164727",
              "type": "GENOME_GBFF"
            },
            {
              "estimated_size": "13113008",
              "type": "RNA_FASTA"
            },
            {
              "estimated_size": "5472117",
              "type": "PROT_FASTA"
            },
            {
              "estimated_size": "7246018",
              "type": "GENOME_GTF"
            }
          ],
          "name": "NCBI Annotation Release 100",
          "release_date": "Nov 04, 2020",
          "release_number": "100",
          "report_url": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Pollicipes_pollicipes/100/",
          "source": "NCBI"
        },
        "assembly_accession": "GCF_011947565.2",
        "assembly_category": "representative genome",
        "assembly_level": "Scaffold",
        "bioproject_lineages": [
          {
            "bioprojects": [
              {
                "accession": "PRJNA614970",
                "parent_accessions": [
                  "PRJNA649812"
                ],
                "title": "Pollicipes pollicipes isolate:AB1234 Genome sequencing and assembly"
              },
              {
                "accession": "PRJNA649812",
                "parent_accessions": [
                  "PRJNA533106"
                ],
                "title": "The Global Invertebrate Genomics Alliance (GIGA) genomes and transcriptomes"
              },
              {
                "accession": "PRJNA533106",
                "title": "Earth BioGenome Project (EBP)"
              }
            ]
          }
        ],
        "chromosomes": [
          "Un"
        ],
        "contig_n50": 109725,
        "display_name": "Ppol_2",
        "estimated_size": "597159620",
        "org": {
          "assembly_counts": {
            "node": 2,
            "subtree": 2
          },
          "isolate": "AB1234",
          "key": "41117",
          "merged_tax_ids": [
            "223993"
          ],
          "parent_tax_id": "36136",
          "rank": "SPECIES",
          "sci_name": "Pollicipes pollicipes",
          "tax_id": "41117",
          "title": "Pollicipes pollicipes"
        },
        "seq_length": "770089732",
        "submission_date": "2020-10-27"
      }
    },
    {
      "assembly": {
        "annotation_metadata": {
          "file": [
            {
              "estimated_size": "11302958",
              "type": "GENOME_GFF"
            },
            {
              "estimated_size": "828679130",
              "type": "GENOME_GBFF"
            },
            {
              "estimated_size": "18240609",
              "type": "RNA_FASTA"
            },
            {
              "estimated_size": "7235079",
              "type": "PROT_FASTA"
            },
            {
              "estimated_size": "8792038",
              "type": "GENOME_GTF"
            }
          ],
          "name": "NCBI Annotation Release 100",
          "release_date": "Nov 19, 2020",
          "release_number": "100",
          "report_url": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Penaeus_monodon/100/",
          "source": "NCBI"
        },
        "assembly_accession": "GCF_015228065.1",
        "assembly_category": "representative genome",
        "assembly_level": "Chromosome",
        "bioproject_lineages": [
          {
            "bioprojects": [
              {
                "accession": "PRJNA611030",
                "title": "Genomic sequences of Penaeus monodon"
              }
            ]
          }
        ],
        "chromosomes": [
          "1",
          "2",
          "3",
          "4",
          "5",
          "6",
          "7",
          "8",
          "9",
          "10",
          "11",
          "12",
          "13",
          "14",
          "15",
          "16",
          "17",
          "18",
          "19",
          "20",
          "21",
          "22",
          "23",
          "24",
          "25",
          "26",
          "27",
          "28",
          "29",
          "30",
          "31",
          "32",
          "33",
          "34",
          "35",
          "36",
          "37",
          "38",
          "39",
          "40",
          "41",
          "42",
          "43",
          "44",
          "Un",
          "MT"
        ],
        "contig_n50": 45084,
        "display_name": "NSTDA_Pmon_1",
        "estimated_size": "1407275500",
        "org": {
          "assembly_counts": {
            "node": 4,
            "subtree": 4
          },
          "common_name": "black tiger shrimp",
          "isolate": "SGIC_2016",
          "key": "6687",
          "parent_tax_id": "133894",
          "rank": "SPECIES",
          "sci_name": "Penaeus monodon",
          "tax_id": "6687",
          "title": "black tiger shrimp"
        },
        "seq_length": "2394331783",
        "submission_date": "2020-11-05"
      }
    },
    {
      "assembly": {
        "annotation_metadata": {
          "file": [
            {
              "estimated_size": "10526977",
              "type": "GENOME_GFF"
            },
            {
              "estimated_size": "618319281",
              "type": "GENOME_GBFF"
            },
            {
              "estimated_size": "18152621",
              "type": "RNA_FASTA"
            },
            {
              "estimated_size": "7313578",
              "type": "PROT_FASTA"
            },
            {
              "estimated_size": "8491987",
              "type": "GENOME_GTF"
            }
          ],
          "name": "NCBI Annotation Release 100",
          "release_date": "Dec 07, 2018",
          "release_number": "100",
          "report_url": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Penaeus_vannamei/100/",
          "source": "NCBI"
        },
        "assembly_accession": "GCF_003789085.1",
        "assembly_category": "representative genome",
        "assembly_level": "Scaffold",
        "bioproject_lineages": [
          {
            "bioprojects": [
              {
                "accession": "PRJNA438564",
                "title": "Penaeus vannamei breed:Keihai No. 1 Genome sequencing and assembly"
              }
            ]
          }
        ],
        "chromosomes": [
          "Un",
          "MT"
        ],
        "contig_n50": 86864,
        "display_name": "ASM378908v1",
        "estimated_size": "1534800868",
        "org": {
          "assembly_counts": {
            "node": 3,
            "subtree": 3
          },
          "breed": "Kehai No.1",
          "common_name": "Pacific white shrimp",
          "key": "6689",
          "merged_tax_ids": [
            "583111"
          ],
          "parent_tax_id": "133894",
          "rank": "SPECIES",
          "sci_name": "Penaeus vannamei",
          "sex": "male",
          "tax_id": "6689",
          "title": "Pacific white shrimp"
        },
        "seq_length": "1663565311",
        "submission_date": "2018-11-16"
      }
    },
    {
      "assembly": {
        "annotation_metadata": {
          "file": [
            {
              "estimated_size": "6482984",
              "type": "GENOME_GFF"
            },
            {
              "estimated_size": "249136834",
              "type": "GENOME_GBFF"
            },
            {
              "estimated_size": "15862279",
              "type": "RNA_FASTA"
            },
            {
              "estimated_size": "6535574",
              "type": "PROT_FASTA"
            },
            {
              "estimated_size": "5624279",
              "type": "GENOME_GTF"
            }
          ],
          "name": "NCBI Annotation Release 100",
          "release_date": "Sep 13, 2016",
          "release_number": "100",
          "report_url": "https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Hyalella_azteca/100/",
          "source": "NCBI"
        },
        "assembly_accession": "GCF_000764305.1",
        "assembly_category": "representative genome",
        "assembly_level": "Scaffold",
        "bioproject_lineages": [
          {
            "bioprojects": [
              {
                "accession": "PRJNA243935",
                "parent_accessions": [
                  "PRJNA163973"
                ],
                "title": "Hyalella azteca isolate:HAZT.00-mixed Genome sequencing and assembly"
              },
              {
                "accession": "PRJNA163973",
                "parent_accessions": [
                  "PRJNA163993"
                ],
                "title": "i5k Arthropod Genome Pilot Project"
              },
              {
                "accession": "PRJNA163993",
                "title": "i5k initiative"
              }
            ]
          }
        ],
        "chromosomes": [
          "Un"
        ],
        "contig_n50": 114415,
        "display_name": "Hazt_2.0",
        "estimated_size": "442600481",
        "org": {
          "assembly_counts": {
            "node": 2,
            "subtree": 2
          },
          "isolate": "HAZT.00-mixed",
          "key": "294128",
          "parent_tax_id": "199487",
          "rank": "SPECIES",
          "sci_name": "Hyalella azteca",
          "tax_id": "294128",
          "title": "Hyalella azteca"
        },
        "seq_length": "550885727",
        "submission_date": "2016-07-20"
      }
    }
  ],
  "total_count": 6
}
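
The nested JSON above can also be flattened from Python in the same notebook. The following is a minimal sketch, not part of the Datasets tooling: it assumes the summary output keeps the structure shown above (an "assemblies" list whose items hold an "assembly" object), and the function name summarize_assemblies and the trimmed SAMPLE_SUMMARY string are illustrative only. Note that seq_length arrives as a string, so it is converted to an integer before sorting.

```python
import json

# Trimmed copy of two records from the summary output above (illustrative only).
SAMPLE_SUMMARY = """
{
  "assemblies": [
    {"assembly": {"assembly_accession": "GCF_003990815.1",
                  "assembly_level": "Chromosome",
                  "contig_n50": 14466,
                  "org": {"sci_name": "Daphnia magna"},
                  "seq_length": "122937721"}},
    {"assembly": {"assembly_accession": "GCF_003789085.1",
                  "assembly_level": "Scaffold",
                  "contig_n50": 86864,
                  "org": {"sci_name": "Penaeus vannamei"},
                  "seq_length": "1663565311"}}
  ]
}
"""


def summarize_assemblies(summary):
    """Flatten each assembly record into a small dict of selected fields."""
    rows = []
    for entry in summary.get("assemblies", []):
        assembly = entry["assembly"]
        rows.append({
            "accession": assembly["assembly_accession"],
            "organism": assembly["org"]["sci_name"],
            "level": assembly["assembly_level"],
            "contig_n50": assembly["contig_n50"],
            # seq_length is a JSON string in the summary, so convert it.
            "seq_length": int(assembly["seq_length"]),
        })
    return rows


rows = summarize_assemblies(json.loads(SAMPLE_SUMMARY))

# Print the assemblies from largest to smallest total sequence length.
for row in sorted(rows, key=lambda r: r["seq_length"], reverse=True):
    print(f'{row["accession"]}\t{row["organism"]}\t{row["level"]}\t{row["seq_length"]}')
```

To run this against live data, the JSON text printed by the datasets summary command would need to be captured first (for example with subprocess.run) and passed to json.loads in place of SAMPLE_SUMMARY.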

If you only want the count of available RefSeq (GCF) genomes that fall under a particular taxon name, use the --refseq flag and set --limit to NONE:

In [1]:
!./datasets summary genome taxon crustacea --refseq --limit NONE
{"total_count":6}

Downloading genome assembly sequence and metadata

In this section, we'll show you how to download a genome data package for one of the crustacean genomes using the datasets download genome command. Genome data packages can be retrieved in four ways:

  • accession: an NCBI Assembly accession
  • organism: an organism or a taxonomic group name
  • taxid: an NCBI Taxonomy identifier, at any level
  • BioProject: an NCBI BioProject accession

The default genome data package includes the following data (when available):

  • genomic sequence (genomic.fna)
  • transcript sequences (rna.fna)
  • protein sequences (protein.faa)
  • annotation in GFF3 format (genomic.gff)
  • a data report containing genome assembly and annotation metadata (assembly_data_report.jsonl)
  • a sequence report listing the nucleotide sequences that comprise the genome assembly (sequence_report.jsonl)

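In this example, we'll download the Datasets genome data package for Penaeus vannamei. Because we query by taxon name, the package contains every available assembly for the species, including the RefSeq representative genome. For the purposes of this demonstration, we will redirect all messages from the datasets command to datasets.log.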

In [1]:
!./datasets download genome taxon "penaeus vannamei" --filename pacific_white_shrimp.zip >datasets.log 2>&1
!printf "Downloaded:\n%s" "$(du --human-readable pacific_white_shrimp.zip)"
Downloaded:
901M	pacific_white_shrimp.zip

Converting the Datasets assembly data report to tabular format

The Datasets genome assembly data report can be converted to tabular format using the dataformat tool. In this step, we'll use the --help flag to view the data fields available for conversion:

In [1]:
!./dataformat tsv genome --help
Convert Genome Assembly Data Report into TSV format.

Refer to NCBI's [command line start](https://www.ncbi.nlm.nih.gov/datasets/docs/command-line-start) documentation for information about getting started with the command-line tools.

Usage
  dataformat tsv genome [flags]

Examples
  dataformat tsv genome --inputfile human/ncbi_dataset/data/assembly_data_report.jsonl
  dataformat tsv genome --package human.zip

Flags
      --fields strings     comma-separated list of fields
                               - annotinfo-featcount-gene-non-coding
                               - annotinfo-featcount-gene-other
                               - annotinfo-featcount-gene-protein-coding
                               - annotinfo-featcount-gene-pseudogene
                               - annotinfo-featcount-gene-total
                               - annotinfo-name
                               - annotinfo-release-date
                               - annotinfo-report-url
                               - annotinfo-source
                               - assminfo-bioproject-lineage-accession
                               - assminfo-bioproject-lineage-parent-accession
                               - assminfo-bioproject-lineage-parent-accessions
                               - assminfo-bioproject-lineage-title
                               - assminfo-biosample-accession
                               - assminfo-description
                               - assminfo-genbank-assm-accession
                               - assminfo-level
                               - assminfo-linked-assm
                               - assminfo-name
                               - assminfo-refseq-assm-accession
                               - assminfo-refseq-category
                               - assminfo-sequencing-tech
                               - assminfo-submission-date
                               - assminfo-submitter
                               - assminfo-type
                               - assminfo-ucsc-assm-name
                               - assmstats-contig-l50
                               - assmstats-contig-n50
                               - assmstats-gaps-between-scaffolds-count
                               - assmstats-number-of-component-sequences
                               - assmstats-number-of-contigs
                               - assmstats-number-of-scaffolds
                               - assmstats-scaffold-l50
                               - assmstats-scaffold-n50
                               - assmstats-total-number-of-chromosomes
                               - assmstats-total-sequence-len
                               - assmstats-total-ungapped-len
                               - breed
                               - common-name
                               - cultivar
                               - ecotype
                               - isolate
                               - organelle-assembly-name
                               - organelle-bioproject-accessions
                               - organelle-description
                               - organelle-infraspecific-name
                               - organelle-submitter
                               - organelle-total-seq-length
                               - organism-name
                               - sex
                               - strain
                               - tax-id
                               - wgs-contigs-url
                               - wgs-project-accession
                               - wgs-url
  -h, --help               help for genome
      --inputfile string   input file
      --package string     datasets package (zip archive), inputfile parameter is relative to the root path inside the archive



Global Flags
      --elide-header   Do not output header

Let's look at the catalog inside the package, using jq to convert the JSON into an easy-to-read CSV listing of file paths and file types.

In [1]:
!./dataformat catalog --package pacific_white_shrimp.zip 2>/dev/null | ./jq -r '.assemblies[] | .files[] | [.filePath, .fileType] | @csv'
"GCA_003730335.1/GCA_003730335.1_ASM373033v1_genomic.fna","GENOMIC_NUCLEOTIDE_FASTA"
"GCA_003730335.1/sequence_report.jsonl","SEQUENCE_REPORT"
"GCA_003789085.1/GCA_003789085.1_ASM378908v1_genomic.fna","GENOMIC_NUCLEOTIDE_FASTA"
"GCA_003789085.1/genomic.gff","GFF3"
"GCA_003789085.1/protein.faa","PROTEIN_FASTA"
"GCA_003789085.1/sequence_report.jsonl","SEQUENCE_REPORT"
"GCF_003789085.1/GCF_003789085.1_ASM378908v1_genomic.fna","GENOMIC_NUCLEOTIDE_FASTA"
"GCF_003789085.1/genomic.gff","GFF3"
"GCF_003789085.1/protein.faa","PROTEIN_FASTA"
"GCF_003789085.1/rna.fna","RNA_NUCLEOTIDE_FASTA"
"GCF_003789085.1/sequence_report.jsonl","SEQUENCE_REPORT"
"assembly_data_report.jsonl","DATA_REPORT"

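To see at a glance which file types each assembly contributes, the CSV listing above can be grouped by accession. This is a rough sketch that works on a copy of that listing: the helper name file_types_by_assembly and the "(package level)" label for files outside any assembly directory are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Copy of the catalog listing printed above.
CATALOG_CSV = '''"GCA_003730335.1/GCA_003730335.1_ASM373033v1_genomic.fna","GENOMIC_NUCLEOTIDE_FASTA"
"GCA_003730335.1/sequence_report.jsonl","SEQUENCE_REPORT"
"GCA_003789085.1/GCA_003789085.1_ASM378908v1_genomic.fna","GENOMIC_NUCLEOTIDE_FASTA"
"GCA_003789085.1/genomic.gff","GFF3"
"GCA_003789085.1/protein.faa","PROTEIN_FASTA"
"GCA_003789085.1/sequence_report.jsonl","SEQUENCE_REPORT"
"GCF_003789085.1/GCF_003789085.1_ASM378908v1_genomic.fna","GENOMIC_NUCLEOTIDE_FASTA"
"GCF_003789085.1/genomic.gff","GFF3"
"GCF_003789085.1/protein.faa","PROTEIN_FASTA"
"GCF_003789085.1/rna.fna","RNA_NUCLEOTIDE_FASTA"
"GCF_003789085.1/sequence_report.jsonl","SEQUENCE_REPORT"
"assembly_data_report.jsonl","DATA_REPORT"'''


def file_types_by_assembly(csv_text):
    """Group file types by the leading directory (the assembly accession)."""
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text.strip())):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        path, file_type = row
        accession, separator, _ = path.partition("/")
        # Files without a directory part belong to the package as a whole.
        key = accession if separator else "(package level)"
        grouped[key].append(file_type)
    return dict(grouped)


grouped = file_types_by_assembly(CATALOG_CSV)
for accession, file_types in grouped.items():
    print(accession, ", ".join(file_types), sep="\t")
```

In this listing only the RefSeq assembly (GCF_003789085.1) includes an RNA FASTA file, and GCA_003730335.1 has no annotation files at all.
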
Now we'll use the dataformat tool to convert a selected set of data fields, specified with the --fields flag, into TSV format.

In [1]:
!./dataformat tsv genome --package pacific_white_shrimp.zip --fields assminfo-name,assminfo-refseq-assm-accession,assminfo-genbank-assm-accession,assminfo-refseq-category,assmstats-number-of-contigs,assmstats-number-of-scaffolds
Assembly Name	Assembly RefSeq Accession	Assembly GenBank Accession	Assembly Refseq Dategory	Assembly Stats Number of Contigs	Assembly Stats Number of Scaffolds
ASM373033v1	na	GCA_003730335.1	na	19584	19584
ASM378908v1	GCF_003789085.1	GCA_003789085.1	representative genome	33019	4682
ASM378908v1	GCF_003789085.1	GCA_003789085.1	representative genome	33019	4682
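
The table above lists ASM378908v1 twice, presumably because the package holds both the GenBank (GCA) and RefSeq (GCF) copies of that assembly and the selected fields are identical for the pair. One way to collapse such repeats after loading the TSV into Python is sketched below. The function name unique_assemblies is hypothetical, and the sample table keeps only four of the six columns for brevity.

```python
import csv
import io

# Subset of the columns from the table above, joined with real tab characters.
SAMPLE_ROWS = [
    ["Assembly Name", "Assembly GenBank Accession",
     "Assembly Stats Number of Contigs", "Assembly Stats Number of Scaffolds"],
    ["ASM373033v1", "GCA_003730335.1", "19584", "19584"],
    ["ASM378908v1", "GCA_003789085.1", "33019", "4682"],
    ["ASM378908v1", "GCA_003789085.1", "33019", "4682"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def unique_assemblies(tsv_text):
    """Parse a TSV report and drop rows that repeat an earlier row exactly."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = set()
    unique = []
    for record in reader:
        key = tuple(record.items())
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


records = unique_assemblies(SAMPLE_TSV)
for record in records:
    contigs = int(record["Assembly Stats Number of Contigs"])
    scaffolds = int(record["Assembly Stats Number of Scaffolds"])
    print(record["Assembly Name"], contigs, scaffolds, sep="\t")
```

If you need to keep the GenBank and RefSeq rows distinct, deduplicate on a narrower key instead of the whole row.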

Next, we can list the first 30 FASTA deflines for the ASM378908v1 RefSeq assembly:

In [1]:
!unzip -q -c pacific_white_shrimp.zip ncbi_dataset/data/GCF_003789085.1/GCF_003789085.1_ASM378908v1_genomic.fna | grep --max-count=30 '^>'
>NW_020868286.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1, whole genome shotgun sequence
>NW_020868287.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_10, whole genome shotgun sequence
>NW_020868288.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_100, whole genome shotgun sequence
>NW_020868289.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1000, whole genome shotgun sequence
>NW_020868290.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1001, whole genome shotgun sequence
>NW_020868291.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1002, whole genome shotgun sequence
>NW_020868292.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1003, whole genome shotgun sequence
>NW_020868293.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1004, whole genome shotgun sequence
>NW_020868294.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1005, whole genome shotgun sequence
>NW_020868295.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1006, whole genome shotgun sequence
>NW_020868296.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1007, whole genome shotgun sequence
>NW_020868297.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1008, whole genome shotgun sequence
>NW_020868298.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1009, whole genome shotgun sequence
>NW_020868299.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_101, whole genome shotgun sequence
>NW_020868300.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1010, whole genome shotgun sequence
>NW_020868301.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1011, whole genome shotgun sequence
>NW_020868302.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1012, whole genome shotgun sequence
>NW_020868303.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1013, whole genome shotgun sequence
>NW_020868304.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1014, whole genome shotgun sequence
>NW_020868305.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1015, whole genome shotgun sequence
>NW_020868306.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1016, whole genome shotgun sequence
>NW_020868307.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1017, whole genome shotgun sequence
>NW_020868308.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1018, whole genome shotgun sequence
>NW_020868309.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1019, whole genome shotgun sequence
>NW_020868310.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_102, whole genome shotgun sequence
>NW_020868311.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1020, whole genome shotgun sequence
>NW_020868312.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1021, whole genome shotgun sequence
>NW_020868313.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1022, whole genome shotgun sequence
>NW_020868314.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1023, whole genome shotgun sequence
>NW_020868315.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, ASM378908v1 LVANscaffold_1024, whole genome shotgun sequence
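
The same deflines can be split into accession and description from Python. This is a minimal sketch assuming the usual FASTA convention that the accession is the first whitespace-delimited token after ">". The names parse_defline and iter_deflines are illustrative, and the sequence lines in the sample are placeholder bases, not real data from the assembly.

```python
def parse_defline(line):
    """Split a FASTA defline into (accession, description)."""
    accession, _, description = line[1:].strip().partition(" ")
    return accession, description


def iter_deflines(lines, limit=None):
    """Yield parsed deflines from an iterable of FASTA lines, up to limit."""
    count = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        yield parse_defline(line)
        count += 1
        if limit is not None and count >= limit:
            return


# Two deflines copied from the output above; the bases are placeholders.
SAMPLE_FASTA = [
    ">NW_020868286.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, "
    "ASM378908v1 LVANscaffold_1, whole genome shotgun sequence\n",
    "ACGTACGTACGT\n",
    ">NW_020868287.1 Penaeus vannamei breed Kehai No.1 unplaced genomic scaffold, "
    "ASM378908v1 LVANscaffold_10, whole genome shotgun sequence\n",
    "TTGACCGTAAGC\n",
]

for accession, description in iter_deflines(SAMPLE_FASTA, limit=30):
    print(accession, description, sep="\t")
```

Because iter_deflines accepts any iterable of text lines, it could also be fed a file handle opened from the zip archive (for example zipfile.ZipFile.open wrapped in io.TextIOWrapper) so that the FASTA file never has to be extracted to disk.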