Question
I come from SQLite and I'm trying Hive now. I need to store 50,000+ JSON objects. Are you sure it's OK to leave the box open and keep all of them in RAM?
@leisim What if I split the entries into different files? Will a Hive box open faster than an SQLite query, and what's a reasonable limit for the number of entries per box so that the app maintains 60 fps?
@VadimOsovsky Splitting the data helps if you don't need all the entries at the same time. There are two problems with too many entries:
RAM usage: Normal boxes keep all keys and values in memory. Lazy boxes keep only the keys. But 50,000 keys still add up.
CPU: When a box is being opened, all of its entries have to be read and decoded. Your UI will freeze with a huge number of entries.
will a hive box open faster than an sqlite query
It will depend on the SQLite query. If you query all the entries, Hive will probably be faster. If you have a condition and an index, SQLite might win.
what's the reasonable limit for the number of entries
This obviously depends on your device but I recommend keeping it below 1000 (5000 max) for best performance. In some cases 10,000 entries might work as well.
That being said, SQLite might also have problems with your huge number of entries. May I ask why you need to store them on the device and can't rely on a backend?
@leisim Sure, the use case is that I have a mail app that needs to download all user emails to work offline. In SQL I can keep all of them in one table and query them depending on the account and folder (like inbox) selected. I could create one box per folder, but then I would get up to 50 boxes just for emails.
I have a similar requirement where almost 30k entities have to be stored. In addition, the entities must be sorted and filtered according to an integer property and must be searchable by a search string.
I have chosen an approach where I use 2 boxes in an isolate. A LazyBox that contains the entities and a Box that contains the "index" as key and the entity ID as value.
A key contains the integer property and the searchable text (e.g. 050_searchabletext).
Those keys are then filtered in a query and the associated entities are read in parallel from the LazyBox.
Some performance measurements (iPhone 5!):
Write: All entities (~4 sec) & index values (~1.3 sec) -> ~ 4.3 sec
Opening the two boxes: ~1.3 sec
Query limited to the first 100 matches + transport to main isolate: ~110 ms
Of course I also tried SQLite using moor. Since it took a long time to write the entities in the isolate, I did not pursue this approach any further.
@VadimOsovsky Not sure if you have found a solution, but as this is still open I thought I would chime in. We had similar issues in our app: we needed to handle lists containing 100K+ items. We ended up using Hive for smaller app-related entries such as user settings, then SQLite (moor) for the larger stuff.
@dave-trudes Very interesting approach. I experimented with isolates during development of Hive but for smaller amounts of data, the overhead of transferring data between isolates has been too big.
I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.
We ended up using Hive for smaller app related entries such as user settings etc, then SQLite (moor) for the larger stuff.
@leisim Our data structure is very nested and contains many other entities, so it is inevitable for me to parse it in an isolate (parsing takes almost 5 sec on an iPhone 5).
As already mentioned, I also tried SQLite via moor, but in the end you generate a huge number of SQL insert statements that take many times longer to execute than writing to a LazyBox. (Imagine 30k entities, each with at least 10 sub-entities -> 300k SQL statements...)
That is where hive, in the current flutter database world, really shines to me 👏.
The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.
...the overhead of transferring data between isolates has been too big.
I forgot to mention in the previous post that I also invoke the request in the isolate - this way I save the transport of the request response.
Sending back a maximum of 100 entities to the main isolate is negligible in my case.
I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.
I'm afraid that we won't get support for that in the near future. On the one hand, it is easier to implement the isolate approach on all platforms, and on the other hand, web workers also do not support shared memory.
The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.
I would love sharding.
Maybe allow for a sharded SortedLazyBox (say, one that only allows auto-increment keys or something similar) that only loads into memory a shard map with (start_index, end_index)->shard.
@leisim It would be really nice if you could also write basic queries. Is Hive's internal structure at all similar to MongoDB's? Loading all 10k entries into RAM to evaluate a where statement takes too much time, I think.
But maybe sharding or something similar could be an option
I thought about that too but I did not find a performant solution on how to find the correct storage position of a specific key / index. Do you have an idea or could you elaborate what you had in mind?
Sending back a maximum of 100 entities to the main isolate is negligible in my case.
Unfortunately, sending objects is not supported by dart2js, so we would need to encode to binary, send over SendPort and decode on the main isolate. I'm not sure whether this would be worth it.
I'm afraid that we won't get support for that in the near future.
Unfortunately I think you are right. They are working on other things currently.
it would really be nice if you could also write basic queries
Yeah it would be. I've been experimenting for quite some time now. I have not found a solution for something like indices to improve query performance, so the queries would work just like List.filter(). Executing them for the first time would be expensive (but reasonably fast for <5000 entries). Listening to queries would be very efficient.
If someone has an idea or a link to a resource of an alternative, that would be very helpful.
@leisim
Yes, I do. I'm happy to draft up a PR (maybe in about 1.5 weeks because I don't have time for it right now, and that's when I have to implement something similar for my app anyways).
Want to open a feature-request and assign it to me?
Activity
simc commented on Jan 5, 2020
50,000 will be too much for a normal box and maybe even for a lazy box. In your case, I would recommend using SQLite.
VadimOsovsky commented on Jan 5, 2020
@leisim What if I split the entries into different files? Will a Hive box open faster than an SQLite query, and what's a reasonable limit for the number of entries per box so that the app maintains 60 fps?
simc commented on Jan 6, 2020
@VadimOsovsky Splitting the data helps if you don't need all the entries at the same time. There are two problems with too many entries:
RAM usage: Normal boxes keep all keys and values in memory. Lazy boxes keep only the keys. But 50,000 keys still add up.
CPU: When a box is being opened, all of its entries have to be read and decoded. Your UI will freeze with a huge number of entries.
It will depend on the SQLite query. If you query all the entries, Hive will probably be faster. If you have a condition and an index, SQLite might win.
This obviously depends on your device but I recommend keeping it below 1000 (5000 max) for best performance. In some cases 10,000 entries might work as well.
That being said, SQLite might also have problems with your huge number of entries. May I ask why you need to store them on the device and can't rely on a backend?
VadimOsovsky commented on Jan 6, 2020
@leisim Sure, the use case is that I have a mail app that needs to download all user emails to work offline. In SQL I can keep all of them in one table and query them depending on the account and folder (like inbox) selected. I could create one box per folder, but then I would get up to 50 boxes just for emails.
simc commented on Jan 6, 2020
I think you should stick to SQLite then for your use case.
dave-trudes commented on Jan 13, 2020
I have a similar requirement where almost 30k entities have to be stored. In addition, the entities must be sorted and filtered according to an integer property and must be searchable by a search string.
I have chosen an approach where I use 2 boxes in an isolate. A LazyBox that contains the entities and a Box that contains the "index" as key and the entity ID as value.
A key contains the integer property and the searchable text (e.g. 050_searchabletext).
Those keys are then filtered in a query and the associated entities are read in parallel from the LazyBox.
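Some performance measurements (iPhone 5!):
Write: All entities (~4 sec) & index values (~1.3 sec) -> ~ 4.3 sec
Opening the two boxes: ~1.3 sec
Query limited to the first 100 matches + transport to main isolate: ~110 ms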
Of course I also tried SQLite using moor. Since it took a long time to write the entities in the isolate, I did not pursue this approach any further.
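The two-box idea described in this comment can be sketched roughly as follows. This is only an illustration, not dave-trudes's actual code: it uses plain Dart collections instead of Hive so that it runs on its own. `FakeLazyStore`, `indexKey` and `query` are hypothetical names, the three-digit zero padding is an assumption taken from the `050_searchabletext` example, and the sketch assumes every rank/text pair is unique, since identical keys would overwrite each other.

```dart
import 'dart:async';

/// In-memory stand-in for the LazyBox of entities (an assumption, not Hive):
/// values are fetched asynchronously by ID, as a lazy store would do.
class FakeLazyStore {
  final Map<int, Map<String, Object>> _data;
  FakeLazyStore(this._data);

  Future<Map<String, Object>?> get(int id) async => _data[id];
}

/// Builds an index key such as "050_searchabletext". The integer property is
/// zero-padded to three digits so that string order matches numeric order
/// (this only holds for values from 0 to 999).
String indexKey(int rank, String text) =>
    '${rank.toString().padLeft(3, '0')}_${text.toLowerCase()}';

/// Filters the index keys, keeps the first [limit] matches in key order,
/// and reads the matching entities in parallel.
Future<List<Map<String, Object>>> query(
  Map<String, int> index,
  FakeLazyStore entities, {
  required int maxRank,
  required String search,
  int limit = 100,
}) async {
  final needle = search.toLowerCase();
  final keys = index.keys.where((key) {
    final sep = key.indexOf('_');
    final rank = int.parse(key.substring(0, sep));
    return rank <= maxRank && key.substring(sep + 1).contains(needle);
  }).toList()
    ..sort();
  final ids = keys.take(limit).map((key) => index[key]!);
  final found = await Future.wait(ids.map(entities.get));
  // Drop IDs that are in the index but missing from the entity store.
  return found.whereType<Map<String, Object>>().toList();
}

Future<void> main() async {
  final entities = FakeLazyStore({
    1: {'id': 1, 'title': 'Quarterly report'},
    2: {'id': 2, 'title': 'Report draft'},
    3: {'id': 3, 'title': 'Holiday photos'},
  });
  final index = {
    indexKey(50, 'Quarterly report'): 1,
    indexKey(7, 'Report draft'): 2,
    indexKey(20, 'Holiday photos'): 3,
  };
  final hits = await query(index, entities, maxRank: 60, search: 'report');
  print(hits.map((e) => e['title']).toList());
}
```

Only the small index has to be scanned per query, and the `limit` keeps the number of entity reads (and the amount of data sent back to the main isolate) bounded.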
DevonJerothe commented on Jan 16, 2020
@VadimOsovsky Not sure if you have found a solution, but as this is still open I thought I would chime in. We had similar issues in our app: we needed to handle lists containing 100K+ items. We ended up using Hive for smaller app-related entries such as user settings, then SQLite (moor) for the larger stuff.
simc commented on Jan 16, 2020
@dave-trudes Very interesting approach. I experimented with isolates during development of Hive but for smaller amounts of data, the overhead of transferring data between isolates has been too big.
I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.
That is what I would recommend generally.
dave-trudes commented on Jan 17, 2020
@leisim Our data structure is very nested and contains many other entities, so it is inevitable for me to parse it in an isolate (parsing takes almost 5 sec on an iPhone 5).
As already mentioned, I also tried SQLite via moor, but in the end you generate a huge number of SQL insert statements that take many times longer to execute than writing to a LazyBox. (Imagine 30k entities, each with at least 10 sub-entities -> 300k SQL statements...)
That is where hive, in the current flutter database world, really shines to me 👏.
The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.
I forgot to mention in the previous post that I also invoke the request in the isolate - this way I save the transport of the request response.
Sending back a maximum of 100 entities to the main isolate is negligible in my case.
I'm afraid that we won't get support for that in the near future. On the one hand, it is easier to implement the isolate approach on all platforms, and on the other hand, web workers also do not support shared memory.
bolasim commented on Jan 17, 2020
I would love sharding.
Maybe allow for a sharded SortedLazyBox (say, one that only allows auto-increment keys or something similar) that only loads into memory a shard map with (start_index, end_index)->shard.
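One way to picture the (start_index, end_index)->shard map is a sorted list of non-overlapping key ranges that is searched with binary search. The sketch below is an assumption about how such a lookup could work, not an existing Hive feature: `ShardRange`, `findShard` and the shard names are hypothetical, and keys are assumed to be auto-increment integers.

```dart
/// One shard covers the inclusive key range [start, end].
class ShardRange {
  final int start;
  final int end;
  final String shard; // hypothetical name of the box or file for this range
  const ShardRange(this.start, this.end, this.shard);
}

/// Returns the shard whose range contains [key], or null if none does.
/// [ranges] must be sorted by start and must not overlap.
String? findShard(List<ShardRange> ranges, int key) {
  var lo = 0;
  var hi = ranges.length - 1;
  while (lo <= hi) {
    final mid = lo + ((hi - lo) >> 1);
    final range = ranges[mid];
    if (key < range.start) {
      hi = mid - 1; // key lies in an earlier shard
    } else if (key > range.end) {
      lo = mid + 1; // key lies in a later shard
    } else {
      return range.shard;
    }
  }
  return null;
}

void main() {
  const ranges = [
    ShardRange(0, 999, 'mail_0'),
    ShardRange(1000, 1999, 'mail_1'),
    ShardRange(2000, 2999, 'mail_2'),
  ];
  print(findShard(ranges, 1500)); // the range 1000..1999 contains 1500
  print(findShard(ranges, 5000)); // no range contains 5000
}
```

With this layout only the shard map stays in memory, so the cost of opening grows with the number of shards rather than the number of keys.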
VadimOsovsky commented on Jan 18, 2020
@leisim It would be really nice if you could also write basic queries. Is Hive's internal structure at all similar to MongoDB's? Loading all 10k entries into RAM to evaluate a where statement takes too much time, I think.
simc commented on Jan 18, 2020
@dave-trudes
I thought about that too but I did not find a performant solution on how to find the correct storage position of a specific key / index. Do you have an idea or could you elaborate what you had in mind?
Unfortunately, sending objects is not supported by dart2js, so we would need to encode to binary, send over SendPort and decode on the main isolate. I'm not sure whether this would be worth it.
Unfortunately I think you are right. They are working on other things currently.
@re-bola
Do you have an idea how to implement it?
@VadimOsovsky
Yeah it would be. I've been experimenting for quite some time now. I have not found a solution for something like indices to improve query performance, so the queries would work just like List.filter(). Executing them for the first time would be expensive (but reasonably fast for <5000 entries). Listening to queries would be very efficient.
If someone has an idea or a link to a resource of an alternative, that would be very helpful.
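The "encode to binary, send, decode" round trip mentioned earlier in this comment might look roughly like this. It is a minimal sketch under stated assumptions: JSON plus UTF-8 stands in for whatever binary format Hive would really use, `encodeEntities` and `decodeEntities` are hypothetical helpers, and the actual send over a SendPort is left out so that the example runs anywhere.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Encodes a batch of entities into a single byte buffer.
/// JSON + UTF-8 is an assumption made for illustration only.
Uint8List encodeEntities(List<Map<String, dynamic>> entities) =>
    Uint8List.fromList(utf8.encode(jsonEncode(entities)));

/// Decodes a byte buffer produced by [encodeEntities].
List<Map<String, dynamic>> decodeEntities(Uint8List bytes) =>
    (jsonDecode(utf8.decode(bytes)) as List).cast<Map<String, dynamic>>();

void main() {
  final batch = [
    {'id': 1, 'subject': 'Hello'},
    {'id': 2, 'subject': 'Grüße'},
  ];
  // In a real setup the bytes would be sent over a SendPort here.
  final bytes = encodeEntities(batch);
  final restored = decodeEntities(bytes);
  print('${bytes.length} bytes, ${restored.length} entities restored');
}
```

Both encoding and decoding cost CPU time on their respective isolates, which is the overhead being weighed above.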
bolasim commented on Jan 18, 2020
@leisim
Yes, I do. I'm happy to draft up a PR (maybe in about 1.5 weeks because I don't have time for it right now, and that's when I have to implement something similar for my app anyways).
Want to open a feature-request and assign it to me?
simc commented on Jan 18, 2020
That would be amazing!
I'd rather keep the discussion in a single place (this issue)
galdazbiz commented on Nov 8, 2022
Has this changed in 2 years, or is it still the same?