Paired reads have different names #228

Closed
@MichaelVirology

Hi,

I ran into the "paired reads have different names" error with some of my sequencing data. The data are paired-end (PE) reads generated on a MiSeq.

The bwa commands I used were as follows:

$ bwa index ref.fasta
$ bwa mem ref.fasta read1.fastq read2.fastq -v 2 > map.sam

It terminated prematurely with the error message:

[M::mem_pestat] several lines of information...
[mem_sam_pe] paired reads have different names: "MISEQ-Sample1:1:2106:12181:2146", "MISEQ-Sample1:1:2106:12181:21461"

bwa mem mapping failed.
Problem mapping with bwa mem.
Problem mapping to the reference in ref.fasta. Quitting.

I used grep to show a few lines of the input fastq files:

$ grep -n -A 3 MISEQ-Sample1:1:2106:12181:2146 read1.fastq 
5578609:@MISEQ-Sample1:1:2106:12181:2146/1
5578610-ATGCTGCAATTATAAGAGAGGTTGAGATTATCATTGCCAAAACTGATAGTGCTATTTGTGCTATAGATTTTAAATTTAATTTGTATAAACAAGAGGATATTACAATGAGATGATTAAGAGTATCCCAGGTCTTTTCTAGAGTCCCGGCAGTGCGTTGATTCTTGTTTTTGGACATTGTTGCATTTGCCCCCCCCAGATCGGAGAGCACACGTCTGAACTCCAGTCACTCGCCACAATCTCGTATGCCGTCTTCTGCTTGAAAAAA
5578611-+
5578612-CCCCCGGGGGGGGGGGGFGEGGGGGGGGGGGGGGGGGGGGGGGFGGFGGFFFGGGFFGFEGGGGGGGFGFGGEGGGGGGGGGGFGGCGGGGGGGGGGGGGGGGGGGGGFGGGGGFFGGGGGGCGFGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGG>8CFGGDGCDFE:E=FGGGG1>FCEGGGG
--
5771065:@MISEQ-Sample1:1:2106:12181:21461/1
5771066-GCTGGCTTGTTGTTCTGTGTTGGAGTAGAGGTTGTGCTTTTGGTTTGTGCTGTTGTATGGTGTGTTTCTGATTTTGTATTGGGTGATATTGTGGCTGAGTTTGTGTGGATTGGTGGTGTGGCTGTGGGTTGTTCGGATGGGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGCCACAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACAAACACATAATATAAATACCACTGTGTCATCTGTTAGATGCAA
5771067-+
5771068-CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@FGGGGGGGGGGEGGGGGGGGGGEGCGGGGGGG8:12***2**+2+2+++2++++22+25++2++++++3+0+30++3+

$ grep -n -A 3 MISEQ-Sample1:1:2106:12181:2146 read2.fastq 
5578609:@MISEQ-Sample1:1:2106:12181:2146/2
5578610-GGGGGGGGCAAATGCAACAATGTCCAAAAACAAGAATCAACGCACTGCCGGGACTCTAGAAAAGACCTGGGATACTCTTAATCATCTCATTGTAATATCCTCTTGTTTATACAAATTAAATTTAAAATCTATAGCACAAATAGCACTATCAGTTTTGGCAATGATAATCTCAACCTCTCTTATAATTGCAGCATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCTAGGACGGTGTAGAGCCCGGTGGTCGCCGTGGCAGTA
5578611-+
5578612-CCCCCGGGGGGGGGGGGGGGGGGFGGFFGGGFGGGGGGGGFGGGGGGFGCEGGGFFFGGGFGCFFGGGGDFGGGFGGDGGGDFGFG?FFFFGGGGGFCGGGGGGGGCFGGGGGGFGGGGGGGGGGGFGGGGGGFFGGGGFEFFFGFGGGGGGGGGGFEGGGCGFFGGGFFFGGGGGGGGGGGGGGGGGGGGGGGFFCGGGGGGGGGGGGGC?*2CGGGGGFGC*02<CDECC097E3**2<**2**/:8DC**85)/./)0.*8*
--
5771065:@MISEQ-Sample1:1:2106:12181:21461/2
5771066-GGCCCATCCGAACAACCCACAGCCACACCACCAATCCACACAAACTCAGCCACAATATCACCCAATACAAAATCAGAAACACACCATACAACAGCACAAACCAAAAGCACAACCTCTACTCCAACACAGAACAACAAGCCAGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCTAGGACGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAACCACAACACCACAACAAACCAAGCACCGGACTAACACC
5771067-+
5771068-CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCFGGGGGGGGFGGGGFGGGGGGGGGGGGGGGGGFGGCGGGGGGFGGGGGEF@EFGGGFFGGGGCGCCFGFGG>@CC:EECEE?8CCFGGGCFGGGEECE*2*:****1*/9C***1*****1*/.9***)*)1.)2*)*6<

It seems to me that bwa is unable to differentiate MISEQ-Sample1:1:2106:12181:2146 and MISEQ-Sample1:1:2106:12181:21461, which differ only by one extra trailing digit.

I tried modifying the read name from MISEQ-Sample1:1:2106:12181:21461 to MISEQ-Sample1:1:2106:12181:21463, and bwa terminated again with the same error, but for different reads:

[mem_sam_pe] paired reads have different names: "MISEQ-Sample1:1:1108:19211:1173", "MISEQ-Sample1:1:1108:19211:11731"

bwa mem mapping failed.
Problem mapping with bwa mem.
Problem mapping to the reference in ref.fasta. Quitting.

I think this might be a bug. Could you please look into it?

Many thanks,
Michael
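
Note: grep matches substrings, so searching for the shorter name also returns the longer one. In the grep output above both names sit on the same line numbers in each file, so those two records look in step. A record-by-record comparison of the whole files would show whether any pair is out of step. The following is only a minimal sketch: `first_name_mismatch` is a hypothetical helper, and it assumes uncompressed FASTQ files with exactly four lines per record and no tab characters in the headers.

```shell
# first_name_mismatch R1.fastq R2.fastq
# Hypothetical helper: prints "<record number> <name in R1> <name in R2>"
# for the first record whose names differ, and prints nothing if all match.
first_name_mismatch() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {                    # header lines only
            a = $1; b = $2
            sub(/ .*$/, "", a);     sub(/ .*$/, "", b)      # drop comment after first space
            sub(/\/[12]$/, "", a);  sub(/\/[12]$/, "", b)   # drop trailing /1 or /2
            if (a != b) {
                rec = (NR + 3) / 4       # 1-based record number
                print rec, a, b
                exit
            }
        }'
}
```

Usage would be `first_name_mismatch read1.fastq read2.fastq`. An empty result means the names agree at every position.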

qiaowei-vvjoe commented on Sep 22, 2019

Hi @MichaelVirology
Have you fixed your problem?
I think I have run into the same problem. I ran:
$ nohup ~/software/bwa-0.7.17/bwa mem ~/reference/hg38/hg38bwaidx E1_input.fq.gz E1_pulldown.fq.gz 1>E1.sam 2>E1.bwa.align.log &
and it printed:

nohup: ignoring input
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 200000 sequences (10000000 bp)...
[M::process] read 200000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 0, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "CL100056099L2C001R002_24", "CL100056099L2C001R002_61"

I also checked the reads named in the error:
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_24"
paste: paste: No such file or directory
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_24"
@CL100056099L2C001R002_24
$ paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_24"
@CL100056099L2C001R002_2406

$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_61"
paste: paste: No such file or directory
$ paste <(gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_61"
@CL100056099L2C001R002_615
$ paste <(gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1) | grep -m 1 -F "CL100056099L2C001R002_61"
@CL100056099L2C001R002_61

It seems the reads at the same position do not have the same name in the R1 and R2 FASTQ files. According to https://www.biostars.org/p/239535/, such data cannot be trusted. Is there any other way to fix this? Thanks so much for any help.
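
Note: `grep -F` matches substrings, which is why the search for `CL100056099L2C001R002_24` above returned `@CL100056099L2C001R002_2406`. A minimal sketch of an exact-name lookup follows. `find_read_exact` is a hypothetical helper, and it assumes four-line FASTQ records whose name is the first whitespace-delimited token of the header.

```shell
# find_read_exact NAME FILE.fastq
# Hypothetical helper: prints "<line number>: <header>" for the first record
# whose name equals NAME exactly (after dropping "@" and a trailing /1 or /2).
find_read_exact() {
    awk -v name="$1" '
        NR % 4 == 1 {                    # header lines only
            id = substr($1, 2)           # first token without the leading "@"
            sub(/\/[12]$/, "", id)       # drop trailing /1 or /2
            if (id == name) { print NR ": " $0; exit }
        }' "$2"
}
```

For gzipped input, piping `gunzip -c` into the function with `-` as the file name should work with common awk implementations, for example `gunzip -c E1_input.fq.gz | find_read_exact CL100056099L2C001R002_24 -`.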

tomaskopsa commented on Sep 24, 2019

If bwa terminates with this error, it usually means that the paired reads are not in the same order in the two FASTQ files.

From BBTools Repair Guide:
With paired reads in 2 files, the first read in file 1 must be the mate of the first read in file 2, etc. For paired reads in a single interleaved file, the second read is the mate of the first read, and the 4th read is the mate of the 3rd read, etc.

There are a few solutions that worked for me.

  1. Sort the FASTQ files
  • BBTools repair.sh is supposed to do that, but it seems unable to fix large datasets because it loads the whole file into RAM.
  • FASTQ-SORT is much better for large FASTQ files. It buffers to the hard drive instead of RAM, and it successfully sorted a ~150 GB file.
  2. Use another aligner
  • Bowtie 2 does not abort when mates are out of order, but in my experience aligning disordered FASTQ files with Bowtie 2 yields fewer mapped reads.
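
Note: for files that contain the same set of reads in a different order, sorting both by name can be sketched with standard tools. This is not the FASTQ-SORT or repair.sh command line. `sort_fastq_by_name` is a hypothetical helper, and it assumes four-line records with no tab characters. If the two files do not contain the same reads, sorting alone will not bring them into step.

```shell
# sort_fastq_by_name < in.fastq > out.fastq
# Hypothetical helper: joins each 4-line record onto one tab-separated line,
# sorts by the header field in byte order, then splits the records again.
sort_fastq_by_name() {
    paste - - - - |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
        tr '\t' '\n'
}
```

Running it once per file, for example `sort_fastq_by_name < read1.fastq > read1.sorted.fastq` and the same for read 2, should give both files the same record order as long as the names differ only in a trailing /1 or /2.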

qiaowei-vvjoe commented on Sep 25, 2019

@tomaskopsa
Hi Tomas, thank you so much for sharing.
In my case, I checked the reads:

$ gunzip -c E1_input.fq.gz | paste - - - - | cut -f 1| head
@CL100056099L2C001R002_24
@CL100056099L2C001R002_40
@CL100056099L2C001R002_45
@CL100056099L2C001R002_73
@CL100056099L2C001R002_74
@CL100056099L2C001R002_81
@CL100056099L2C001R002_90
@CL100056099L2C001R002_91
@CL100056099L2C001R002_95
@CL100056099L2C001R002_103
$ gunzip -c E1_pulldown.fq.gz | paste - - - - | cut -f 1| head
@CL100056099L2C001R002_61
@CL100056099L2C001R002_115
@CL100056099L2C001R002_154
@CL100056099L2C001R002_218
@CL100056099L2C001R002_228
@CL100056099L2C001R002_253
@CL100056099L2C001R002_255
@CL100056099L2C001R002_269
@CL100056099L2C001R044_184179
@CL100056099L2C001R002_305

The read names seem to be completely different, so maybe I cannot use BBTools repair.sh to restore the order. I have also read:
http://seqanswers.com/forums/showthread.php?t=46538
https://www.biostars.org/p/254155/
https://www.biostars.org/p/160701/
https://www.biostars.org/p/176230/
so I guess the disorder may have been introduced during dumping or trimming, since my PI gave me these already-trimmed data. Following the last reference, I added -p to my bwa mem command:

$ nohup ~/software/bwa-0.7.17/bwa mem -M -R "@RG\tID:E1\tSM:E1\tLB:ATACseq\tPL:Illumina" ~/reference/hg38/hg38bwaidx -p E1_input.fq.gz E1_pulldown.fq.gz 1>E1.sam 2>E1.bwa.align.log &

Although bwa mem now runs successfully, I am not sure about the alignment quality, so I have also decided to test with Bowtie 2. Thank you again for sharing your Bowtie 2 experience.

tomaskopsa commented on Sep 26, 2019

Hi @qiaowei-vvjoe
are you sure your FASTQ files contain paired reads?

qiaowei-vvjoe commented on Sep 26, 2019

Hi @tomaskopsa
Frankly speaking, I am not sure. I just got these FASTQ files from my PI and ran FastQC on them, which showed no adapters. I do not know how the libraries were prepared and trimmed, and my PI may not know either. I only have this document, analysis.docx, which describes them. Could you please give me some advice on how to proceed with the analysis? Thank you so much in advance.

nnbuainain commented on Oct 9, 2019

I get the exact same error in my paired-end bwa runs for only four of 75 samples sequenced in the same batch.

When I run BBTools' repair.sh, all reads end up in the singleton file and the fixed R1 and R2 files are empty. I have run these samples before and they worked fine. Does anyone know what this BBTools result means?

Executing jgi.SplitPairsAndSingles [rp, in1=INPA_A1990_Tun_och_fer-READ1.fastq, in2=INPA_A1990_Tun_och_fer-READ2.fastq, out1=INPA_A1990_Tun_och_fer-READ1-fixed.fastq, out2=INPA_A1990_Tun_och_fer-READ2-fixed.fastq, outs=INPA_A1990_Tun_och_fer-singletons-repair.fastq, -Xmx12G]

Set INTERLEAVED to false
Started output stream.

Input: 1937896 reads 281176225 bases.
Result: 1937896 reads (100.00%) 281176225 bases (100.00%)
Pairs: 0 reads (0.00%) 0 bases (0.00%)
Singletons: 1937896 reads (100.00%) 281176225 bases (100.00%)

Time: 14.529 seconds.
Reads Processed: 1937k 133.39k reads/sec
Bases Processed: 281m 19.35m bases/sec
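
Note: "Pairs: 0 reads" suggests that no read name in one file was matched to a name in the other. One way to check this independently is to count the names the two files have in common. The following is only a sketch: `read_names` and `shared_name_count` are hypothetical helpers, and they assume uncompressed four-line FASTQ records whose name is the first whitespace-delimited token of the header.

```shell
# read_names FILE.fastq
# Hypothetical helper: prints one name per record, without the leading "@"
# and without a trailing /1 or /2.
read_names() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1"
}

# shared_name_count R1.fastq R2.fastq
# Hypothetical helper: prints how many distinct names occur in both files.
shared_name_count() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    read_names "$1" | LC_ALL=C sort -u > "$tmp1"
    read_names "$2" | LC_ALL=C sort -u > "$tmp2"
    LC_ALL=C comm -12 "$tmp1" "$tmp2" | wc -l | tr -d ' '   # lines common to both lists
    rm -f "$tmp1" "$tmp2"
}
```

A count of 0 would mean the names follow different conventions in the two files or the files do not belong together. A count close to the number of reads would point to a naming detail that the matching step does not normalise.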

Rohit-Satyam commented on Jul 10, 2020

Use BBMap instead; BBTools is no longer available.

https://anaconda.org/bioconda/bbmap

Ofsm commented on Oct 20, 2022

Hi @tomaskopsa, could you please let me know the command you used to sort with FASTQ-SORT? I'm working on WGS read mapping, and several of my raw read files show the same "Paired reads have different names" issue.

Thanks for your reply

elsemikk commented on Feb 23, 2023

Is there a way to disable bwa from throwing an error when the names of read 1 and read 2 do not match? I am working with a lot of data from the NCBI SRA, and in many cases the reads have names like "HWI-1KL117:327:C6CF1ACXX:8:1101:1319:1990_forward" and "HWI-1KL117:327:C6CF1ACXX:8:1101:1319:1990_reverse". The reads really are correctly paired, it is just that they are named with "_forward" and "_reverse". I'm hoping there is a way of disabling the error, since it would take a lot of computation to go through and strip the "_forward" and "_reverse" tags from hundreds of gigabytes of reads.

jmarshall (Contributor) commented on Feb 23, 2023

Downstream analysis programs will expect paired reads to have identical QNAME values. So something is going to have to strip those _forward and _reverse suffixes.

Your choices would be:

  • Encourage SRA to produce more standard FASTQ files.

  • BWA already strips /1 and /2. So you can patch your local version of BWA to also strip these two suffixes.

  • Write a script to strip them or convert them to /1 and /2. If this is organised as a streaming filter in front of bwa's input, the added load will be trivial.
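
Note: the third option can be sketched as a small awk filter. `strip_direction_suffix` is a hypothetical helper name, and the sketch assumes four-line FASTQ records in which `_forward` or `_reverse` ends the read name, that is, it comes directly before the first whitespace or the end of the header line.

```shell
# strip_direction_suffix < in.fastq > out.fastq
# Hypothetical helper: removes a trailing _forward or _reverse from the read
# name on each header line and passes every other line through unchanged.
strip_direction_suffix() {
    awk 'NR % 4 == 1 {
             n = match($0, /[ \t]/)                     # first whitespace, 0 if none
             name = (n ? substr($0, 1, n - 1) : $0)     # read name
             rest = (n ? substr($0, n) : "")            # optional comment
             sub(/_(forward|reverse)$/, "", name)
             $0 = name rest
         }
         { print }'
}
```

In bash, the filter can be placed in front of bwa with process substitution, along the lines of `bwa mem ref.fasta <(strip_direction_suffix < R1.fastq) <(strip_direction_suffix < R2.fastq) > out.sam`, so no cleaned copy of the reads needs to be written to disk. The file names here are placeholders.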

elsemikk commented on Feb 23, 2023

Thanks! Good to know, frustrating that the data comes in a non-standard format. Good idea about stripping them while streaming to bwa, I'll do that.

jingydz commented on Mar 16, 2023

Checking SRR19880797
[mem_sam_pe] paired reads have different names: "SRR19880797.5358018", "SRR19880797.10839728"
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 9982028
[main_samview] truncated file.
Mapping failed

samtools view -h ./SRR19880797/SRR19880797.bam |less +9982028 -SN

(reference) line 3370: SRR19880797.1 65 chr8 143932417 60 100M chr22 20819568 0 TGGCGGTCATGTTGGTGTTGCGGTCGCTCCAGTCGAAGCCCACCTCCTCCTCCTCCTTCTCATTCAGCCACATTAGCTCCTTAGTGGCGGTTGCCACAAA FFFGFCFFGFFFGFFFFFFFFFFFGGFFGFFFEFFDFGFFFFFFGFFFFFGFGGFGGFF@GGGFGFFFFFBFFGAFFGFFFGGDAGFGDG+@d9@/?=59 NM:i:1 MD:Z:90C9 AS:i:95 XS:i:23 RG:Z:SRR19880797
(reference) line 3371: SRR19880797.1 129 chr22 20819568 60 100M chr8 143932417 0 AGAGGGATTTTCTTCGCAGGGGAGCTTAACAGGGTCTTTCTCCTCTGCTCTTTCCCCAGTAGCCCAGGCCCACCTGAGAGATGCTGGACACACTGCTGGT GFDFFFFFF;FFF9FFFFGFFFBFFFFGFFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFEFFFFFFFFF>FFFGFFFEFFFFFFFF@FFDFFFFEECG: NM:i:0 MD:Z:100 AS:i:100 XS:i:20 RG:Z:SRR19880797

(line before the error) line 9982027: SRR19880797.5023186 81 chr22 22643043 0 100M chr3 126500710 0 AGCGAGGTGACCTGGGCTGAGTCCTGGGAATGGGAAGAGGTGGCAGGAAGGGGATCTGAGGAGGAGAACAGGGGGCCTGGTGGTCTGTGCTTCTTCCCAG FF;AFG@GGFFDGGFFE>DFFFFFFGGFGFGFGFFEFFGFEFGGGFFGGFGGFGFEEGGEGFGFF>GGGFFFFFGGFFFGGFFEGFFGFFFFFFFFFFGG NM:i:0 MD:Z:100 AS:i:100 XS:i:100 RG:Z:SRR19880797
(error line) line 9982028: SRR19880797.5023186 161 chr3 126500710 60 100M chr22 22643043 0 TCCTTGAACACAGCAGGGTTGGAGGCCATGAGGCTCTGGGCCTCCGTGAAGCTGAGCTGCACAGGGTAGTAGCCGCCATTGAACGGGTTGTGGCAGGATG FFFFFGDFFGFFEFFFFFFEFF@FFFFFEFFFFFFFFFFFFFGFDGFGF;FFGGEGFFGEFFFFFFFF>FFFFFFEFFFFGFFDFFG@FF<FFFDF=@fg NM:i:0 MD:Z:100 AS:i:100 XS:i:0 RG:Z:SRR19880797

less SRR19880797_sort.1.fastp.fastq.gz
(reference) @SRR19880797.5023185 5023185/1
CTGTGGCCCTGTGCCAAACCTGGAGCAGCTGCCTTTAGAGGCCAGGAGGGCTACTTCCCGTTTCCTGAGCACTGTCCCTCTGTCTGCAGGAGTGCTGCTG
+
FF@FFFFFFFFGFFFFFFFGAFFEFFFFFFGFFFFGCFFGFEFGGF>GGFFFFGGFFGFFFGGBGGFGGGFFGGFFGFFFBGFGFFFFFDFFFGFC
(error) @SRR19880797.5023186 5023186/1
CTGGGAAGAAGCACAGACCACCAGGCCCCCTGTTCTCCTCCTCAGATCCCCTTCCTGCCACCTCTTCCCATTCCCAGGACTCAGCCCAGGTCACCTCGCT
+
GGFFFFFFFFFFGFFGEFFGGFFFGGFFFFFGGG>FFGFGEGGEEFGFGGFGGFFGGGFEFGFFEFFGFGFGFGGFFFFFFD>EFFGGDFFGG@GFA;FF
(reference) @SRR19880797.5023187 5023187/1
AGGACACGGTACAAAAGGGCAGCCAGGCAGGGTTGGAAGGTGGGGTCTGAGGGGTTTCCACCTGCCCTCTCCCATCCTTCCAGGTTTTGGCGGCAGATGG
+
F?FFFFGFF/FGFGFF>FFFFFFFFFFFFFFFFFFFFEFFFFEFFD@FFCFFEFFFFGGFFFFDFDFFFGFFFFFFFFFEFFGFBFGFFECFF:DFBFFF

less SRR19880797_sort.2.fastp.fastq.gz
(reference) @SRR19880797.5023185 5023185/2
GGCTGGCCCAGCGCCAGCGTCGGAGCGCCGGCCCCCTCCCCGGGCCGCCCCCACCCAACCAGACCCTCCAGCGCGTGCCACCGGACCTCGTGTCCTAGAC
+
)<;7@CDB1B:AA=DCB3AE;D?>C:5=61?469@19+9*7&&>A'@4;)9&8?>&8E3>76*='(BB,>&<&2EC'4;?=9.4>+5
(error) @SRR19880797.5023186 5023186/2
TCCTTGAACACAGCAGGGTTGGAGGCCATGAGGCTCTGGGCCTCCGTGAAGCTGAGCTGCACAGGGTAGTAGCCGCCATTGAACGGGTTGTGGCAGGATG
+
FFFFFGDFFGFFEFFFFFFEFF@FFFFFEFFFFFFFFFFFFFGFDGFGF;FFGGEGFFGEFFFFFFFF>FFFFFFEFFFFGFFDFFG@FF<FFFDF=@fg
(reference) @SRR19880797.5023187 5023187/2
AAATTCCACAAGAGGGTCATTAAGTGTGATAGTGGAAATGCCCTAACCTCCACCCTTACTTCTCAAATATTCTAGCTATTGGAGATAAAGTACCATATAC
+
GFFFFFGFF?FGFFFFFFFGFGFGFGFFFFFGFGFFFFFGFFFGFFFFFF>GFFFFFFFFFFFGGFFFGFFFFCEFFGGFGFFFFFFFFFFFFGFGGFFF

What's wrong with my file?

haydenshinn commented on Aug 8, 2023

@jingydz did you ever discover what was wrong with your file? I'm having a similar issue

Paired reads have different names · Issue #228 · lh3/bwa