The bam file does not contain cell and umi barcodes appropriately formatted.(use SRP129388 human week 10 fetal forebrain dataset in your paper ) #185

Open
@songxiaoning

Hi, I'm using the human week 10 fetal forebrain dataset (SRP129388) from your paper to run velocyto. I downloaded SRR6470906.sra and SRR6470907.sra and used fastq-dump --split-3 --gzip to get the FASTQ files. fastq-dump produced a single FASTQ file per run, SRR6470906.fastq.gz and SRR6470907.fastq.gz. I renamed them to SRR6470906_S1_L001_R1_001.fastq.gz and SRR6470907_S1_L001_R1_001.fastq.gz, then ran cellranger to get the output. My command is:

cellranger count --localcores=12 --id=SRR6470906_output --transcriptome=/data/cellranger-hg19/refdata-cellranger-hg19-3.0.0 --fastqs=./ --sample=SRR6470906

Finally, I ran velocyto on the 10x data:

/opt/python3.7/bin/velocyto run10x  SRR6470906_output /data1/database/GTF/gencode.v27lift37.chr_hg19_annotation.gtf

But I get this error:

Traceback (most recent call last):
  File "/opt/python3.7/bin/velocyto", line 11, in <module>
    load_entry_point('velocyto==0.17.17', 'console_scripts', 'velocyto')()
  File "/opt/python3.7/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/python3.7/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/python3.7/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/python3.7/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/python3.7/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/python3.7/lib/python3.7/site-packages/velocyto/commands/run10x.py", line 115, in run10x
    samtools_memory=samtools_memory, dump=dump, loom_numeric_dtype=dtype, verbose=verbose, additional_ca=additional_ca)
  File "/opt/python3.7/lib/python3.7/site-packages/velocyto/commands/_run.py", line 159, in _run
    exincounter.peek(bamfile[0])
  File "/opt/python3.7/lib/python3.7/site-packages/velocyto/counter.py", line 158, in peek
    raise IOError("The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.")
OSError: The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.
Usage: velocyto run10x [OPTIONS] SAMPLEFOLDER GTFFILE
Try "velocyto run10x --help" for help.

A record from the BAM file in SRR6470906_output looks like this:

SRR6470906.238819510	272	1	10066	1	57M	*	0	0	CTACCCCTACCCCTACCCCTACCCCAACCCCTAACCCTAACCCAACCCTAACCCTAA	<...A<<..<G<<.<.G<..<.G<..<.G<..<.GG<...<G.GGIIGGGGIGGGGG	NH:i:3	HI:i:2	AS:i:44	nM:i:6	RE:A:I	CR:Z:AAGCAGTGGTATCAAC	CY:Z:GGAAGGGIGGGGGGGG	UR:Z:GCAGAGTACA	UY:Z:GGIGGGGGGG	UB:Z:GCAGAGTACA	RG:Z:SRR6470906_output:MissingLibrary:1::

I have checked my commands and I really don't know what is happening. Why doesn't the BAM file have cell and UMI barcodes? Can you help me? Thanks a lot!
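One way to see which barcode tags a record actually carries is to split out its optional fields. The following is a minimal sketch, not velocyto's own code: `sam_tags` is a hypothetical helper, the record is the one from the post above with sequence and quality replaced by `*`, and the choice of `CB` and `UB` as the tags to look for is an assumption based on the corrected-barcode tags that Cell Ranger normally writes.

```python
def sam_tags(sam_line):
    """Return the optional fields of one SAM record as {TAG: value}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns of a SAM record are mandatory
        tag, _type, value = field.split(":", 2)  # TAG:TYPE:VALUE, value may contain ':'
        tags[tag] = value
    return tags


# Record from the post above; SEQ and QUAL are shortened to "*" for readability.
record = "\t".join([
    "SRR6470906.238819510", "272", "1", "10066", "1", "57M", "*", "0", "0",
    "*", "*",
    "NH:i:3", "HI:i:2", "AS:i:44", "nM:i:6", "RE:A:I",
    "CR:Z:AAGCAGTGGTATCAAC", "CY:Z:GGAAGGGIGGGGGGGG",
    "UR:Z:GCAGAGTACA", "UY:Z:GGIGGGGGGG", "UB:Z:GCAGAGTACA",
    "RG:Z:SRR6470906_output:MissingLibrary:1::",
])

tags = sam_tags(record)
# Assumption: the tool expects the corrected cell barcode (CB) and UMI (UB).
missing = [t for t in ("CB", "UB") if t not in tags]
print("present:", sorted(tags))
print("missing:", missing)
```

For this record the raw cell barcode (`CR`) is present but the corrected one (`CB`) is not, which would be consistent with the warning that no cell barcode was found.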

Activity

gioelelm (Member) commented on Apr 1, 2019

songxiaoning (Author) commented on Apr 1, 2019

@gioelelm Thanks for your reply. But I don't think so, because the log shows that velocyto checked the first 5000 or so entries of the BAM file:

2019-03-26 09:52:24,693 - WARNING - Not found cell and umi barcode in entry 0 of the bam file
2019-03-26 09:52:24,693 - WARNING - Not found cell and umi barcode in entry 1 of the bam file
2019-03-26 09:52:24,693 - WARNING - Not found cell and umi barcode in entry 2 of the bam file
2019-03-26 09:52:24,693 - WARNING - Not found cell and umi barcode in entry 3 of the bam file
2019-03-26 09:52:24,693 - WARNING - Not found cell and umi barcode in entry 4 of the bam file
2019-03-26 09:52:24,694 - WARNING - Not found cell and umi barcode in entry 5 of the bam file
.....
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 4994 of the bam file
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 4995 of the bam file
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 4996 of the bam file
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 4997 of the bam file
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 4998 of the bam file
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 4999 of the bam file
2019-03-26 09:52:25,056 - WARNING - Not found cell and umi barcode in entry 5000 of the bam file

I am using cellranger 2.1.1 (from cellranger --version) and velocyto 0.17.17. The Cell Ranger output directory contains:

_cmdline   _finalstate              _invocation  _log        outs   SC_RNA_COUNTER_CS  SRR6470906_output.mri.tgz  _timestamp  _vdrkill  _versions
_filelist  _finalstate._truncated_  _jobmode     _mrosource  _perf  _sitecheck         _tags                      _uuid       

The outs directory contains:

analysis       filtered_gene_bc_matrices        metrics_summary.csv  possorted_genome_bam.bam      raw_gene_bc_matrices        web_summary.html
cloupe.cloupe  filtered_gene_bc_matrices_h5.h5  molecule_info.h5     possorted_genome_bam.bam.bai  raw_gene_bc_matrices_h5.h5

And the first records of possorted_genome_bam.bam look like this:

SRR6470906.238819510	272	1	10066	1	57M	*	0	0	CTACCCCTACCCCTACCCCTACCCCAACCCCTAACCCTAACCCAACCCTAACCCTAA	<...A<<..<G<<.<.G<..<.G<..<.G<..<.GG<...<G.GGIIGGGGIGGGGG	NH:i:3	HI:i:2	AS:i:44	nM:i:6	RE:A:I	CR:Z:AAGCAGTGGTATCAAC	CY:Z:GGAAGGGIGGGGGGGG	UR:Z:GCAGAGTACA	UY:Z:GGIGGGGGGG	UB:Z:GCAGAGTACA	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.213712499	272	1	10178	0	57M	*	0	0	CCTACCCCTACCCCCAACCCTACCCCAAACCCTAACCCTAACCCTAACCCTAACCCC	G<.<.GGA<A.GA<.<.<.<<<.A<<.<.GA..<..A<..<AG<.A.GG<.A.AGG<	NH:i:8	HI:i:2	AS:i:46	nM:i:5	RE:A:I	CR:Z:GGTTCGGGTTCGGGTT	CY:Z:GA....AA.....A..	UR:Z:CGGGTTCGGG	UY:Z:..AGAA.<AG	UB:Z:CGGGTTCGGG	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.169244430	272	1	10263	1	7S50M	*	0	0	CACTAACACCCTAACCCTAACCCTAACCCTACCCCCAACCCCAACCCCAACCCCAAC	..<..<.GGA..A<.AA.GAAG<.GA<A<.<..<..A<.IGGGAGGGGG<.A.A..A	NH:i:3	HI:i:3	AS:i:47	nM:i:1	RE:A:I	CR:Z:GAGGGTGAGGGTAGGG	CY:Z:AAGGGIGGGGGAGGGI	UR:Z:TTAGGGGTTA	UY:Z:GGIGGGG<G<	UB:Z:TTAGGGGTTA	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.20440046	272	1	10265	1	57M	*	0	0	CCTAACCCTAACCCTAACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCC	GGG<.GGG..<GGG.GGGGGAGGGGGGG<IGGGGGGGGGGGIGGGGGGGIIGGIIIG	NH:i:3	HI:i:3	AS:i:56	nM:i:0	RE:A:I	CR:Z:AAGCAGTGGTATCAAC	CY:Z:GGAAGGIGGGGGGGGG	UR:Z:GCAGAGTACA	UY:Z:GGGGGGAGGG	UB:Z:GCAGAGTACA	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.3	272	1	11679	1	57M	*	0	0	CCTGGAGATTCTTATTAGTGATTTGGACTGGGGCCTGGCCATGTGTATTTTTTTAAA	GIIIIIGIIIGGIIIGGGIIIIGGIIGIIIGIIIGIGGGGIIIIIIIIIIGIGIIII	NH:i:3	HI:i:3	AS:i:54	nM:i:1	RE:A:I	CR:Z:CATTCTCAACACCGGC	CY:Z:GGGGGIIGIIIIIGGA	UR:Z:CATGCAGCAA	UY:Z:GGGGGGIGGI	UB:Z:CATGCAGCAA	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.130137451	272	1	11794	0	57M	*	0	0	GACTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCA	GIGIIIIIIIIIIGGGGIIIGGIGIGIIIGGGGIIGIGGIIGIGGGGGGGGGIGGGG	NH:i:6	HI:i:5	AS:i:54	nM:i:1	RE:A:I	CR:Z:GAAAAGGCTGACGGCA	CY:Z:GGGAGIGGGGIGGAGG	UR:Z:AGTTAACAAA	UY:Z:II<GGGGGGG	UB:Z:AGTTAACAAA	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.130137450	272	1	11797	0	57M	*	0	0	TTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTC	GGGGGGGGGIIGGGIGIIIGIIGIIGIIIGIGIIIGGIIGIIIIGGIGGIIIGGIGG	NH:i:6	HI:i:5	AS:i:56	nM:i:0	RE:A:I	CR:Z:AAAGAAAAGGCTGACG	CY:Z:GGGAGIGGGGGGGGIG	UR:Z:GCAAGTTAAC	UY:Z:IIIGGIIIII	UB:Z:GCAAGTTAAC	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.47996069	16	1	11810	0	57M	*	0	0	CCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTT	GGGGIIIIIIGIGIIGGIIGGGIGIIGIGIIGIIGIGGIIIIGIIGGGGGIGIIGIG	NH:i:6	HI:i:1	AS:i:56	nM:i:0	RE:A:I	CR:Z:GAAAGAAGAGGTCAAA	CY:Z:GGGGGGIGIGGGGGGI	UR:Z:GAAAAGGCTG	UY:Z:IGGGGGGIII	UB:Z:GAAAAGGCTG	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.4	272	1	11838	1	57M	*	0	0	GCACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTGCCGTCAGCCTTTTCTTT	GGGIIGGGIIGIIGGGAGGIIIIGGIIGIGGGGGGGGIGGIIGIIIGGGGIGGGGGG	NH:i:3	HI:i:3	AS:i:56	nM:i:0	RE:A:I	CR:Z:GCTAAGAGACAGCAAA	CY:Z:GAAGGIIIGIIIGGGG	UR:Z:TACACATGAA	UY:Z:GGGIGGIIII	UB:Z:TACACATGAA	RG:Z:SRR6470906_output:MissingLibrary:1::
SRR6470906.5	272	1	11844	1	57M	*	0	0	GGTATCATTCACCATTTTTCTTTTCGTTAACTTGCCGTCAGCCTTTTCTTTGACCTC	IIIIIIIIIIGIIIGIIGGIIIIGGIIIGGIIIIIIIIIIIGIGIIIIIGIIIIGII	NH:i:3	HI:i:3	AS:i:56	nM:i:0	RE:A:I	CR:Z:GTCTGGGCTAAGAGAC	CY:Z:GGAGGGGGGIGGIGII	UR:Z:AGCAAATACA	UY:Z:IIIIGGIIII	UB:Z:AGCAAATACA	RG:Z:SRR6470906_output:MissingLibrary:1::

10x data usually comes with both R1 and R2 FASTQ files, but here fastq-dump produced only one FASTQ file. Is something wrong?
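A quick check for this situation is to group the FASTQ files by sample and see which read types are present. The sketch below assumes the usual Cell Ranger naming pattern `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`; `missing_reads` and the regular expression are hypothetical helpers written for illustration, and the file names are the renamed ones from the first post.

```python
import re
from collections import defaultdict

# Assumed Cell Ranger naming: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R1|R2|I1|I2)_001\.fastq\.gz$"
)


def missing_reads(filenames, required=("R1", "R2")):
    """Return {sample: [missing read types]} for samples lacking a required read."""
    found = defaultdict(set)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:  # names that do not follow the pattern are ignored
            found[match.group("sample")].add(match.group("read"))
    report = {}
    for sample, reads in found.items():
        absent = sorted(set(required) - reads)
        if absent:
            report[sample] = absent
    return report


files = [
    "SRR6470906_S1_L001_R1_001.fastq.gz",
    "SRR6470907_S1_L001_R1_001.fastq.gz",
]
print(missing_reads(files))
```

With only the renamed R1 files present, both samples are reported as lacking R2. In 10x 3' libraries the barcode and UMI are read from R1 and the cDNA from R2, so a single file per run cannot supply both.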

gioelelm (Member) commented on Apr 1, 2019

songxiaoning (Author) commented on Apr 2, 2019

@gioelelm Thanks. I have downloaded the BAM file, but velocyto run10x requires a Cell Ranger output folder, like this:

velocyto run10x  SRR6470906_output chr_hg19_annotation.gtf

With only the BAM file I still can't run velocyto. Could you please update the SRA file?
Thanks a lot for your help.

gioelelm (Member) commented on Apr 2, 2019

songxiaoning (Author) commented on Apr 2, 2019

@gioelelm Thanks! It works with velocyto run. But I still need the count matrix to do clustering with pagoda...
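Because `velocyto run` takes a BAM file directly rather than a Cell Ranger folder, the invocation can be assembled roughly as below. This is only a sketch: `downloaded.bam` and `velocyto_out` are hypothetical placeholders, `velocyto_run_command` is an illustrative helper, and the `-o` (output folder) and `-b` (barcode list) options are used as described in the velocyto CLI documentation; check `velocyto run --help` for the installed version.

```python
import shlex


def velocyto_run_command(bam, gtf, output_dir, barcodes=None):
    """Build (but do not execute) a `velocyto run` command line."""
    cmd = ["velocyto", "run", "-o", output_dir]
    if barcodes is not None:
        cmd += ["-b", barcodes]  # optional file listing the valid cell barcodes
    cmd += [bam, gtf]  # positional arguments: BAM file first, then the GTF
    return cmd


# Placeholder file names; substitute the real paths.
cmd = velocyto_run_command(
    bam="downloaded.bam",
    gtf="chr_hg19_annotation.gtf",
    output_dir="velocyto_out",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```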

gioelelm (Member) commented on Apr 2, 2019

songxiaoning (Author) commented on Apr 2, 2019

@gioelelm I mean the matrix that contains the UMI counts for every barcode. Then I can use it for clustering.
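To make the request concrete, a UMI count matrix holds, for each cell barcode and gene, the number of distinct UMIs observed. The sketch below illustrates that definition only; it is not how velocyto or Cell Ranger compute it. `umi_counts` is a hypothetical helper, the records are made-up tag dictionaries, and the use of `CB`, `UB`, and `GX` (gene ID) tags is an assumption about Cell Ranger style BAM output.

```python
from collections import defaultdict


def umi_counts(records):
    """Count distinct UMIs per (cell barcode, gene).

    records: iterable of dicts mapping BAM tag names to values.
    Returns {cell_barcode: {gene: number_of_unique_umis}}.
    """
    seen = defaultdict(set)
    for tags in records:
        cell, umi, gene = tags.get("CB"), tags.get("UB"), tags.get("GX")
        if cell and umi and gene:  # skip reads without a full set of tags
            seen[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Synthetic records for illustration only.
records = [
    {"CB": "AAAC-1", "UB": "GCAGAGTACA", "GX": "gene1"},
    {"CB": "AAAC-1", "UB": "GCAGAGTACA", "GX": "gene1"},  # duplicate UMI, counted once
    {"CB": "AAAC-1", "UB": "TTAGGGGTTA", "GX": "gene1"},
    {"CB": "CCCG-1", "UB": "CATGCAGCAA", "GX": "gene2"},
    {"UB": "AGTTAACAAA", "GX": "gene2"},  # no cell barcode, skipped
]
print(umi_counts(records))
```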

gioelelm (Member) commented on Apr 2, 2019

songxiaoning (Author) commented on Apr 2, 2019

@gioelelm Yes and thanks!

shaaaarpy commented on Mar 16, 2020

@songxiaoning Could you please tell me your machine configuration? I also have a 20 GB BAM file like yours, and my velocyto run command is terminated with the message "Killed" while creating the loom file.
Also, if I sort my BAM file, I get a "UMI not found" error.
Thank you

denvercal1234GitHub commented on Nov 15, 2021

@shaaaarpy -- Did you resolve this memory issue?

          The bam file does not contain cell and umi barcodes appropriately formatted.(use SRP129388 human week 10 fetal forebrain dataset in your paper ) · Issue #185 · velocyto-team/velocyto.py