% of reads unmapped: too short is HUGE

I am mapping several libraries with STAR and am very satisfied with the speed and results, but for some libraries the number of reads left unmapped as "too short" is huge. As an example, here is one of the log files:

```
                                 Started job on |    Jul 07 14:50:01
                             Started mapping on |   Jul 07 14:52:00
                                    Finished on |   Jul 07 15:00:20
       Mapping speed, Million of reads per hour |   774.76

                          Number of input reads |   107605440
                      Average input read length |   98
                                    UNIQUE READS:
                   Uniquely mapped reads number |   10642081
                        Uniquely mapped reads % |   9.89%
                          Average mapped length |   97.98
                       Number of splices: Total |   1652808
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   175313
                       Number of splices: GC/AG |   24635
                       Number of splices: AT/AC |   14999
               Number of splices: Non-canonical |   1437861
                      Mismatch rate per base, % |   2.07%
                         Deletion rate per base |   0.00%
                        Deletion average length |   1.62
                        Insertion rate per base |   0.02%
                       Insertion average length |   1.90
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   7076476
             % of reads mapped to multiple loci |   6.58%
        Number of reads mapped to too many loci |   1614447
             % of reads mapped to too many loci |   1.50%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   12.33%
                 % of reads unmapped: too short |   62.64%
                     % of reads unmapped: other |   7.06%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%

```

Here more than half of the reads (62.64%) are classified as "too short" and left unmapped. Several forum threads suggest setting the `--sjdbOverhang` parameter according to read length, but I have no annotation for this genome, and STAR does not accept this parameter without a GTF file.
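As a quick sanity check on the log above: its read-fate percentages (unique, multi-mapped, too many loci, and the three unmapped categories) add up to 100%, so the "too short" figure is a share of all input reads. A minimal Python sketch, assuming the `label | value` layout shown in the log; the helpers `parse_star_log` and `percent` are hypothetical names used only for illustration:

```python
# Excerpt of the Log.final.out shown above (only the read-fate lines).
LOG = """\
                          Number of input reads |   107605440
                        Uniquely mapped reads % |   9.89%
             % of reads mapped to multiple loci |   6.58%
             % of reads mapped to too many loci |   1.50%
       % of reads unmapped: too many mismatches |   12.33%
                 % of reads unmapped: too short |   62.64%
                     % of reads unmapped: other |   7.06%
"""


def parse_star_log(text):
    """Parse 'label | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def percent(value):
    """Convert a string such as '62.64%' to a float."""
    return float(value.rstrip("%"))


stats = parse_star_log(LOG)
fates = [
    "Uniquely mapped reads %",
    "% of reads mapped to multiple loci",
    "% of reads mapped to too many loci",
    "% of reads unmapped: too many mismatches",
    "% of reads unmapped: too short",
    "% of reads unmapped: other",
]
total = sum(percent(stats[k]) for k in fates)
too_short = percent(stats["% of reads unmapped: too short"])
n_reads = int(stats["Number of input reads"])

print(f"read-fate categories sum to {total:.2f}%")
print(f"'too short' reads: about {round(n_reads * too_short / 100):,}")
```

The same parser can be pointed at the `Log.final.out` of each library to compare the "too short" share across collections.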

My reads are paired-end. The library with the shortest reads is 2×50 bp with a 4 kb insert size, and the collection above is 2×98 bp with a 20 kb insert size.
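For reference, the STAR manual gives a default of 0.66 for both `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread`, and for paired-end reads both are normalized to the sum of the two mates' lengths. The sketch below only does that arithmetic for the two libraries described here; it illustrates one possible cause rather than diagnosing this run. `min_matched_bases` is a hypothetical helper, and the 0.66 value assumes the defaults were left unchanged:

```python
# Assumption: STAR's default --outFilterMatchNminOverLread of 0.66, which is
# normalized to the sum of both mates' lengths for paired-end reads.
MATCH_FRACTION = 0.66


def min_matched_bases(mate_lengths, fraction=MATCH_FRACTION):
    """Minimum matched bases an alignment needs under the assumed filter."""
    return fraction * sum(mate_lengths)


libraries = [("2x50 bp library", (50, 50)), ("2x98 bp library", (98, 98))]
for name, mates in libraries:
    need = min_matched_bases(mates)
    single_mate = mates[0]
    verdict = "can" if single_mate >= need else "cannot"
    print(f"{name}: needs >= {need:.1f} matched bases; "
          f"one mate alone ({single_mate} bp) {verdict} pass")
```

If the two mates of a pair cannot be aligned together (for example, because they are too far apart to be placed as one pair), an alignment of a single mate covers at most half of the summed length, which is below the 0.66 fraction, so such a pair would end up in the "too short" category.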

I also tried mapping with a lower quality-score threshold and a higher number of allowed mismatches, but it seems the reads are filtered out for length at an early step, before or during mapping.

Is there any solution to this problem? I have seen many other users report the same issue without a concrete workaround.

I should also mention that the library is clean: I checked it for contamination and low-quality sequences.

Thanks a lot


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

% of reads unmapped: too short is HUGE #169

Activity

alexdobin commented on Jul 12, 2016

bostanict commented on Jul 28, 2016

Gig77 commented on Dec 15, 2016

koenvandenberge commented on Mar 23, 2017

Gig77 commented on Mar 23, 2017

colinwxl commented on Apr 12, 2018

alexdobin commented on Apr 17, 2018

skchronicles commented on Jun 8, 2023

katlande commented on Nov 7, 2023
