custom de novo library failing - batch failures and "Comparison failed. Retrying with larger minmatch (10)" #124

Closed
kcastellano13 opened this issue Aug 13, 2021 · 4 comments
@kcastellano13

Hi,
I am having trouble getting RepeatMasker to run with a custom de novo library built by combining repeats from multiple programs (RepeatModeler plus a few others). I keep getting batch failures and "WARNING: Comparison failed. Retrying with larger minmatch (10)" (see error logs below). It always fails on different batches, but I have pulled out some of them and nothing looks off to me. I built the library with the same method for the genome of a sister species and it ran with no problem. This genome is slightly larger (900 Mb genome size) but less fragmented than the sister species. I should mention that both have a high repeat content (both ~68% with the RepeatModeler de novo library; the sister species is ~80% when masked with my custom de novo library).
For troubleshooting so far I have: 1) cut the headers down to < 50 characters (I classified with RepeatClassifier, so all headers have a classification); 2) split the genome and run RepeatMasker on ~200 sequences; 3) used the flag "-frag 1000" on the full genome and on one of the split files. None of these worked. I was able to run it successfully on one contig, and I was able to run RepeatMasker successfully on this genome with the de novo library from RepeatModeler only, but I need to mask with my custom library. I attached the full error logs from my most recent run. Any help would be greatly appreciated!
Kate
combinedLib_split0_3234109.out.txt
combinedLib_split0_3234109.err.txt
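
For readers retracing the first troubleshooting step above (shortening library headers), here is a minimal Python sketch. It assumes the library IDs follow the `name#Class/Family` convention produced by RepeatClassifier. The function names and the 49-character limit are illustrative choices rather than anything taken from the thread. Truncation can make two IDs collide, so the sketch raises an error in that case instead of silently writing duplicates.

```python
from typing import Iterable, Iterator


def shorten_header(header: str, max_len: int = 49) -> str:
    """Shorten a FASTA ID of the assumed form name#Class/Family.

    Only the name part is truncated; the '#Class/Family' suffix is kept
    intact so the classification survives.
    """
    parts = header.split()
    if not parts:
        raise ValueError("empty FASTA header")
    seq_id = parts[0]  # drop any free-text description after the ID
    name, sep, classification = seq_id.partition("#")
    suffix = sep + classification
    keep = max_len - len(suffix)
    if keep < 1:
        raise ValueError(f"classification too long to fit in {max_len} chars: {seq_id}")
    return name[:keep] + suffix


def shorten_fasta_headers(lines: Iterable[str], max_len: int = 49) -> Iterator[str]:
    """Yield FASTA lines with shortened headers; sequence lines pass through."""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new_id = shorten_header(line[1:], max_len)
            if new_id in seen:
                # Truncation made two IDs identical; fail loudly.
                raise ValueError(f"duplicate ID after shortening: {new_id}")
            seen.add(new_id)
            yield ">" + new_id
        else:
            yield line
```

To apply it, open the library FASTA, pass the file object to `shorten_fasta_headers`, and write each yielded line plus a newline to a new file.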

@jebrosen
Member

Hi, sorry to hear you are having this issue.

To get some additional information about the error that happened, could you re-run one of those commands listed for the "engine parameters" in a failed run - and keep the full output in a file? For example,

.../cross_match -alignments -gap_init -30 -ins_gap_ext -6 -del_gap_ext -5 -minmatch 10 -minscore 225 -bandwidth 14 -masklevel 101 -matrix .../20p39g.matrix .../...batch-25.masked .../...library.fa.classified >cm_output.txt 2>&1

The cm_output.txt file should contain a more detailed error, which may solve the problem or at least point in the right direction.

This problem might be specific to cross_match; another search engine may work on this particular file if that is an option for you.
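
If it is more convenient to script the re-run, the same capture of stdout and stderr into one file can be sketched in Python. This is only an illustration: `run_and_capture` is a hypothetical helper, and the command passed to it should be the exact argument list copied from the "engine parameters" line of the failed run (the elided paths above are not filled in here).

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_and_capture(cmd: Sequence[str], log_path: str) -> int:
    """Run cmd, writing stdout and stderr to log_path (like '>file 2>&1').

    Returns the exit code instead of raising, so a failing engine run
    still leaves its full output in the log for inspection.
    """
    with Path(log_path).open("w") as log:
        result = subprocess.run(
            list(cmd),
            stdout=log,
            stderr=subprocess.STDOUT,  # merge stderr into the same file
            check=False,
        )
    return result.returncode
```

A non-zero return value together with the contents of the log file is usually the most useful thing to attach to a report.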

@kcastellano13
Author

Hi Jeb,

Thanks for responding so quickly! I attached the cm_output.txt file for you to see. It does look like an issue with cross_match, where it is getting a score discrepancy for some reason. I tried rmblast as the search engine on one of my split files, with and without the -frag 1000 flag, and both runs completed successfully, so I think that was the problem and I should be okay moving forward.

Thank you again!
cm_output.txt
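
For anyone landing here with the same error, the workaround described above (switching the search engine to RMBlast) can be sketched as follows. It relies on RepeatMasker's `-engine`, `-lib`, and `-frag` options. The helper `build_repeatmasker_cmd` and the file names in the usage lines are hypothetical placeholders, not the files from this issue.

```python
import shlex
from typing import List, Optional


def build_repeatmasker_cmd(
    genome: str,
    library: str,
    engine: str = "rmblast",
    frag: Optional[int] = None,
) -> List[str]:
    """Assemble a RepeatMasker command line using a custom library."""
    cmd = ["RepeatMasker", "-engine", engine, "-lib", library]
    if frag is not None:
        # Optional fragment size, e.g. 1000 as tried in this thread.
        cmd += ["-frag", str(frag)]
    cmd.append(genome)
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own split file and library.
    cmd = build_repeatmasker_cmd("split0.fa", "combinedLib.fa", frag=1000)
    print(shlex.join(cmd))  # print the command for review before running it
```

The sketch only builds and prints the command, which lets you review it first and then hand the list to `subprocess.run` once it looks right.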

@jebrosen added the bug label Aug 18, 2021
@jebrosen
Member

That is strange. I suspect either a bug in cross_match or a misleading error message for input data it can't accept for some reason. Unfortunately, it looks like the issue might be pretty deep in one of the underlying algorithms. If you are willing and able to provide us with the batch-25.masked and library.fa.classified files (attached, or via email to help@repeatmasker.org), we and/or the cross_match developers may be able to find or troubleshoot the problem more specifically.

Either way, I am glad to hear that RMBlast worked!

@huangxiaoyun1123

I'm having the same problem. How can I fix it? Thank you!

4 participants