Description
Dear Alicia,
I am currently running analyses with NucleoATAC on my ATAC-seq data.
After several QC, it seems to me my ATAC-libraries were of reasonable quality. But since it is the first time i'am working on this kind of data, I want to check twice :-)
Could you tell me please what is your feeling about the NFR profile attached ?
Compared to the Figure S4 of the NucleoATAC paper, it looks pretty similar, with good enrichment values and a good profiling of the double helix related waves.
Nevertheless, for fragment longer than 100bp the observed distribution of fragments length looks definitely more flat, while I know from Bioanalyzer profile I managed to get a really good enrichment of mononucleosomes. What did I do wrong ? Maybe the peaks I called with MACS2 are too much noisy ?
I also noticed a drop at around 90bp in the genome-wide distribution of fragment length. This is further confirmed by the Bionalyzer profile but I didn't manage to explain why my library is depleted in these specific length.
Thanks a lot for your advice.
++
Pef
Here is the profile showing the genome-wide distribution of fragment length :
Activity
AliciaSchep commentedon Oct 9, 2015
The drop at 90 bp is indeed strange. What is the read length in the sequencing? I am guessing it is 100bp (NucleoATAC subtracts 8 from the fragment length to get the transposition center-to-center distance) and that there might be some artifact with how the aligner deals with fragments where the forward read and reverse read are exact reverse complements of each other. I know that Bowtie (1 but not 2) has some issue with those reads. Given the very specific drop to 0 and the fact that it seems to coincide potentially with sequencing read length
NucleoATAC will look at the fragment size distribution within the peaks that are called, thus the fragment size distribution may be a bit different from the global distribution. Are you using broad peaks? I would recommend either using broad peaks or extending the narrow peaks somewhat so that the nucleosomes flanking a nucleosome free region are captured.
It is not necessarily that anything is wrong with your sample. Some samples have more mono-nucleosome spanning fragments than others, and it does not always correlate well with how much enrichment of fragments are observed in open chromatin regions. Trying to optimize for both a lot ofnucleosome-spanning fragments and good enrichment in open chromatin regions can be difficult. However, with less fragments spanning mono-nucleosomes it is harder to call nucleosome positions, especially if sequencing is not very deep.
PFRoux commentedon Oct 12, 2015
Dear Alicia,
Thanks a lot for your answer. Indeed, I am working with 100bp PE libraries.
Trying to explore your hypothesis (an artifact related to the aligner) I've tried several mapping approaches using different sets of parameters with Bowtie2.
Actually the profiles I've uploaded above the other day were generated after alignement with Bowtie2 2.2.3 using default parameters, except the maximum insert size (-X) set to 2000bp. I've just finished another round of analysis adding the --dovetail option to handle potential cases with the mates seemingly extending "past" each other (which, I suppose, could arise after trimming in the case of ATAC libraries). And here is the fragment length distribution profile I get (for same library as for the profile above).
Now I've still got this depletion of fragment around 90bp in length, but I've got now an enrichment of fragment around 91 bp !
May I ask you what were the parameters you used for the alignment with Bowtie2 for the NucleoATAC paper ? And what was the version of Bowtie2 you used ? This is now quite clear to me that artifacts in the mapping are generating this atypical fragment length distribution profile. And your advices could definitely help me to figure out what is happening and how to solve this issue.
Many thanks !
Pef
PFRoux commentedon Oct 13, 2015
Dear Alicia,
I've ran additional analysis, including a mapping with BWA.
And it confirms that the issue I'am facing is related to the way Bowtie2 deals with fragments where the forward read and reverse read are exact reverse complements of each other.
Using BWA and Picard CollectInsertSizeMetrics here is the profile I get.

My problem now is that NucleoATAC is not properly working using PE .bam files generated with BWA.
Is it expected ?
Do you have additional advices to help me ?
Many thanks again,
Pef
AliciaSchep commentedon Oct 13, 2015
Hi Pef,
We avoid the issue with Bowtie2 by the behavior of our adapter trimmer. Our adapter trimmer looks for whether forward and reverse reads are reverse complements and uses that as basis of trimming adapter. The script removes 1 extra base, even in cases of no adapter itself being trimmed if the forward and reverse reads are perfect reverse complements. One way to adjust this problem might be to trim a base pair at end of all reads after adapter trimming.
The issue with the BWA alignment is that there are some clipped alignments that lead to fragments less than 8 bp. I had not expected such fragments from atac-seq (should not really be possible from Nextera; definitely artifacts), so there was an issue that fragments that short cause an error. I have fixed this bug in the latest code. You can try uninstalling nucleoatac, pulling the new code from the repository, reinstalling. Hopefully that bug will be fixed. Alternatively, filtering out fragments less than 9 bp should fix issue; really fragments shorter than 30 bp or so should probably be removed as those are likely artifacts as shown by Adey et al. in original Nextera sequencing paper.
-Alicia
PFRoux commentedon Oct 14, 2015
Hi Alicia,
Many thanks for your precious advice. I suspected that the alignment with Bowtie2 was the main problem, by non properly dealing with fragments where the forward read and reverse read are exact reverse complements of each other. And I was thinking about a solution to by-pass this issue. But the one you proposed sounds faster and easier to implement.
After a few tests on a subset of my raw sequences, it seems that trimming a base pair at the end of all reads after adapter trimming did the job pretty well. I will see what happen on the global data-set and let you know, but I am confident.
Thanks a lot !
Pef
PFRoux commentedon Oct 15, 2015
Hi Alicia,
Just to let you know that it worked perfectly.
Thanks a lot again,
Pef
AliciaSchep commentedon Oct 15, 2015
Great, glad I could help!