-
Notifications
You must be signed in to change notification settings - Fork 58
Add support for SQLite #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Single quotes for strings, CURRENT_TIMESTAMP instead of now(), and INSERT INTO.
Thanks, Nathan! That’s amazing!
I’ll aim to go through this in the next couple of weeks.
In the meantime, if you’d like to share your fork with the group, that would be perfectly fine.
…-Brian
(by iPhone)
On Oct 26, 2017, at 4:06 PM, Nathan T. Weeks ***@***.***> wrote:
PASA is difficult to run on an HPC cluster due to the MySQL requirement. As an alternative to rearchitecting PASA to completely remove the dependency on a relational database, adding support for SQLite could involve much less development effort and provide enough capability/compatibility to run on most such systems (assuming, e.g., that the SQLite database isn't placed on a Lustre file system that has been mounted with noflock to disable file locking).
To get the ball rolling, I've done a minimal SQLite port that at least seems to be enough for run_sample_data.pl to complete without error. To use SQLite, in the pasa_conf/conf.txt file, set:
MYSQLSERVER=SQLite
Then the MYSQLDATABASE parameter is used as a pathname to an SQLite database. A few caveats:
The results of run_sample_data.pl haven't been compared with the results vs. when MySQL is used
Web UI is completely untested
The schema has been minimally translated from MySQL, with a few perhaps-unnecessary CHECK constraints in lieu of (unsupported) unsigned integers in random places.
You can view, comment on, or merge this pull request online at:
#43
Commit Summary
ANSI SQL changes
Initial SQLite support
File Changes
M PerlLib/Ath1_cdnas.pm (4)
M PerlLib/Mysql_connect.pm (9)
M cgi-bin/PerlLib/Ath1_cdnas.pm (4)
M cgi-bin/check_symmetry.dbi (4)
M docs/AnnotationPlugins.txt (2)
M pasa_conf/pasa.CONFIG.template (3)
A schema/cdna_alignment_sqliteschema (460)
M scripts/Annotation_store_preloader.dbi (4)
M scripts/Launch_PASA_pipeline.pl (12)
M scripts/Load_Current_Gene_Annotations.dbi (2)
M scripts/PASA_transcripts_and_assemblies_to_GFF3.dbi (2)
M scripts/alignment_assembly_to_gene_models.dbi (2)
M scripts/assembly_db_loader.dbi (10)
M scripts/assign_clusters_by_gene_intergene_overlap.dbi (4)
M scripts/assign_clusters_by_stringent_alignment_overlap.dbi (4)
M scripts/build_comprehensive_transcriptome.dbi (10)
M scripts/build_comprehensive_transcriptome.tabix.dbi (6)
M scripts/cDNA_annotation_comparer.dbi (18)
M scripts/cDNA_annotation_updater.dbi (4)
M scripts/classify_alt_splice_as_UTR_or_protein.dbi (6)
M scripts/classify_alt_splice_isoforms.dbi (6)
M scripts/classify_alt_splice_isoforms_per_subcluster.dbi (6)
M scripts/compute_gene_coverage_by_incorporated_PASA_assemblies.dbi (2)
M scripts/dump_annot_store.dbi (4)
M scripts/ensure_single_valid_alignment_per_cdna_per_cluster.pl (6)
M scripts/find_alternate_internal_exons.dbi (2)
M scripts/import_custom_alignments.dbi (2)
M scripts/import_spliced_alignments.dbi (2)
M scripts/invalidate_single_exon_ESTs.dbi (2)
M scripts/polyA_site_transcript_mapper.dbi (2)
M scripts/populate_alignments_via_btab.dbi (8)
M scripts/populate_ath1_cdnas.dbi (2)
M scripts/populate_cdna_clusters.dbi (4)
M scripts/populate_mysql_assembly_alignment_field.dbi (2)
M scripts/populate_mysql_assembly_sequence_field.dbi (2)
M scripts/reassign_clusters_via_valid_align_coords.dbi (4)
M scripts/reset_to_prior_to_assembly_build.dbi (2)
M scripts/subcluster_builder.dbi (2)
M scripts/subcluster_loader.dbi (6)
M scripts/update_fli_status.dbi (2)
M scripts/upload_transcript_data.dbi (2)
M scripts/validate_alignments_in_db.dbi (2)
Patch Links:
https://github.com/PASApipeline/PASApipeline/pull/43.patch
https://github.com/PASApipeline/PASApipeline/pull/43.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Use mostly SQLite column type affinities for clarity. Use CHECK constraints to emulate unsigned integers, tinyint(1) that had effectively functioned as booleans, and datetime columns. Use WITHOUT ROWID optimization (https://sqlite.org/withoutrowid.html) where it looks like it should work. Use NUMERIC for most columns instead of FLOAT (REAL) to store values precisely.
OK, will do. Before advertising more broadly, I've done a second pass to the schema to implement a few optimizations (e.g., WITHOUT ROWID) and emulate additional data types with CHECK constraints where it seemed reasonable to do so. I didn't add foreign key constraints, but that might be a good idea for data integrity (e.g., catch software bugs). |
This is brilliant, I've hit the same obstacle on our HPC system due to the requirement for MySQL and was looking into doing this myself until I spotted your pull request. I see it hasn't been merged yet, is that likely to happen any time soon? |
Sorry guys... Best to just use Nathan's fork for now for the sqlite
solution. I've got way too much on my plate right now.
…On Wed, Jan 17, 2018 at 10:43 AM, Liam Gretton ***@***.***> wrote:
This is brilliant, I've hit the same obstacle on our HPC system due to the
requirement for MySQL and was looking into doing this myself until I
spotted your pull request. I see it hasn't been merged yet, is that likely
to happen any time soon?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#43 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHMVX5oB4ashIU-1argHReWx0kzAKEjBks5tLcDLgaJpZM4QIJF2>
.
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Nathan,
Maybe we could just add some documentation to the pasa page to point them
to your fork for the sqlite mods. At some point, we can aim to merge
efforts...
Or, if you want direct access to the PASA codebase, I could make that
happen too, but it would require that you field inquiries and take on some
project ownership for it... Happy to discuss further.
best,
~b
On Wed, Jan 17, 2018 at 5:40 PM, Brian Haas <bhaas@broadinstitute.org>
wrote:
… Sorry guys... Best to just use Nathan's fork for now for the sqlite
solution. I've got way too much on my plate right now.
On Wed, Jan 17, 2018 at 10:43 AM, Liam Gretton ***@***.***>
wrote:
> This is brilliant, I've hit the same obstacle on our HPC system due to
> the requirement for MySQL and was looking into doing this myself until I
> spotted your pull request. I see it hasn't been merged yet, is that likely
> to happen any time soon?
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#43 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AHMVX5oB4ashIU-1argHReWx0kzAKEjBks5tLcDLgaJpZM4QIJF2>
> .
>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Hi @brianjohnhaas , I think this would be more visible in the main repo, though (if you prefer) maybe initially in a feature branch. I'm certainly willing to watch the repo & submitting pull requests "from afar" in response to relevant issues/suggestions; however, if you'd rather consider adding me as a developer (with the understanding that my scope would be somewhat limited), that's fine with me as well. |
Hi Nathan
I've added you as a developer on the pasa project and the documentation
repo. I trust you'll know the best way to move forward on this.
welcome to the PASA team. :-)
~brian
…On Wed, Jan 17, 2018 at 8:45 PM, Nathan T. Weeks ***@***.***> wrote:
Hi @brianjohnhaas <https://github.com/brianjohnhaas> , I think this would
be more visible in the main repo, though (if you prefer) maybe initially in
a feature branch. I'm certainly willing to watch the repo & submitting pull
requests "from afar" in response to relevant issues/suggestions; however,
if you'd rather consider adding me as a developer (with the understanding
that my scope would be somewhat limited), that's fine with me as well.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#43 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHMVX-Yo9dEs00LLvqsqP7yQqWDFu4WXks5tLk3mgaJpZM4QIJF2>
.
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Thanks, Brian---I'll try my best not to mess things up! ;) |
Closed in favor of #60 |
PASA is difficult to run on an HPC cluster due to the MySQL requirement. As an alternative to rearchitecting PASA to completely remove the dependency on a relational database, adding support for SQLite could involve much less development effort and provide enough capability/compatibility to run on most such systems (assuming, e.g., that the SQLite database isn't placed on a Lustre file system that has been mounted with noflock to disable file locking).
To get the ball rolling, I've done a minimal SQLite port that at least seems to be enough for run_sample_data.pl to complete without error. To use SQLite, in the pasa_conf/conf.txt file, set:
Then the MYSQLDATABASE parameter is used as a pathname to an SQLite database. A few caveats: