
Leaving a box open with tens of thousands of objects? #170

Open · VadimOsovsky opened this issue

Question
I'm coming from SQLite and trying Hive now. I need to store 50,000+ JSON objects. Are you sure it's OK to leave the box open and keep all of them in RAM?


simc (Member) commented on Jan 5, 2020

50,000 entries will be too much for a normal box, and maybe even for a lazy box. In your case, I would recommend using SQLite.

VadimOsovsky (Author) commented on Jan 5, 2020

@leisim What if I split the entries into different files? Will a Hive box open faster than an SQLite query, and what's a reasonable limit for the number of entries per box so that the app maintains 60 fps?

simc (Member) commented on Jan 6, 2020

@VadimOsovsky Splitting the data helps if you don't need all the entries at the same time. There are two problems with too many entries:

  1. RAM usage: Normal boxes keep all keys and values in memory; lazy boxes keep only the keys. But 50,000 keys still add up.

  2. CPU: When a box is opened, all of its entries have to be read and decoded. Your UI will freeze if there is a huge number of entries.
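The two costs above can be made concrete with a toy model. This is a Python sketch of an append-only key/value log, and the record layout and names are hypothetical, not Hive's actual file format or API. Opening either kind of box requires one full pass over the file. A "normal" box keeps every decoded value. A "lazy" box keeps only a key to file-offset map and decodes a value when it is read.

```python
import io
import json
import struct

# Toy record layout (hypothetical, NOT Hive's real file format):
# [u32 key length][key bytes][u32 value length][JSON value bytes]


def append_record(log: io.BytesIO, key: str, value) -> None:
    """Append one key/value record; updates are new records, never in-place edits."""
    k = key.encode("utf-8")
    v = json.dumps(value).encode("utf-8")
    log.seek(0, io.SEEK_END)
    log.write(struct.pack("<I", len(k)) + k + struct.pack("<I", len(v)) + v)


def scan(log: io.BytesIO):
    """Yield (key, value_offset, value_length) for every record in the file.

    This full pass over the file is the O(n) work done when a box is opened."""
    data = log.getvalue()
    pos = 0
    while pos < len(data):
        (klen,) = struct.unpack_from("<I", data, pos)
        key = data[pos + 4:pos + 4 + klen].decode("utf-8")
        pos += 4 + klen
        (vlen,) = struct.unpack_from("<I", data, pos)
        pos += 4
        yield key, pos, vlen
        pos += vlen


def open_normal_box(log: io.BytesIO) -> dict:
    """Decode every value up front: all keys AND values stay in RAM.

    Later records overwrite earlier ones, so the newest value wins."""
    data = log.getvalue()
    return {k: json.loads(data[off:off + n]) for k, off, n in scan(log)}


class LazyBox:
    """Keep only key -> (offset, length) in RAM; decode a value on read."""

    def __init__(self, log: io.BytesIO):
        self._log = log
        # Still one in-memory entry per key, which is why 50,000 keys add up.
        self._index = {k: (off, n) for k, off, n in scan(log)}

    def get(self, key: str):
        off, n = self._index[key]
        self._log.seek(off)
        return json.loads(self._log.read(n))
```

In this model, a lazy box trades RAM for a disk read on every `get`. Neither variant avoids the initial scan, and that scan is what can block the UI.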

will a hive box open faster than an sqlite query

It will depend on the SQLite query. If you query all the entries, Hive will probably be faster. If you have a condition and an index, SQLite might win.

what's the reasonable limit for the number of entries

This obviously depends on the device, but I recommend keeping it below 1,000 entries (5,000 max) for best performance. In some cases, 10,000 entries might work as well.

That being said, SQLite might also have problems with your huge number of entries. May I ask why you need to store them on the device and can't rely on a backend?

VadimOsovsky (Author) commented on Jan 6, 2020

@leisim Sure. The use case is that I have a mail app that needs to download all of the user's emails to work offline. In SQL I can keep all of them in one table and query them by the selected account and folder (like the inbox). I could create one box per folder, but then I would end up with up to 50 boxes just for emails.
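The single-table layout described here can be sketched with Python's built-in `sqlite3` module. The thread itself uses moor from Dart, and the table and column names below are hypothetical. A composite index on `(account, folder)` lets a folder query seek directly to the matching rows instead of loading every email. This is the "condition and an index" case where SQLite might beat opening a whole box.

```python
import sqlite3


def create_email_schema(conn: sqlite3.Connection) -> None:
    """One table for all accounts and folders (hypothetical column names)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS emails (
               id      INTEGER PRIMARY KEY,
               account TEXT NOT NULL,
               folder  TEXT NOT NULL,
               subject TEXT,
               body    TEXT
           )"""
    )
    # Composite index so WHERE account = ? AND folder = ? can seek
    # directly to the matching rows instead of scanning the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_emails_account_folder "
        "ON emails (account, folder)"
    )


def emails_in_folder(conn: sqlite3.Connection, account: str, folder: str, limit: int = 50):
    """Return up to `limit` (id, subject) rows for one folder, highest id first."""
    cur = conn.execute(
        "SELECT id, subject FROM emails "
        "WHERE account = ? AND folder = ? "
        "ORDER BY id DESC LIMIT ?",
        (account, folder, limit),
    )
    return cur.fetchall()
```

Only the requested page of rows is materialized in memory, however many emails the table holds.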

simc (Member) commented on Jan 6, 2020

I think you should stick to SQLite then for your use case.

dave-trudes commented on Jan 13, 2020

I have a similar requirement where almost 30k entities have to be stored. In addition, the entities must be sorted and filtered according to an integer property and must be searchable by a search string.

I chose an approach that uses two boxes inside an isolate: a LazyBox that contains the entities, and a Box that maps an "index" key to the entity ID.
Each index key combines the integer property and the searchable text (e.g. 050_searchabletext).
Those keys are filtered in a query, and the associated entities are then read from the LazyBox in parallel.
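Below is a minimal Python sketch of the index-key idea. Plain dicts stand in for the Box and the LazyBox. The field names `priority` and `title`, the 3-digit padding, and the lowercase substring match are assumptions for illustration. Zero-padding the integer means sorting the keys as strings also sorts them by the integer property. The query filters the small index first and fetches only the matching entities.

```python
from typing import Dict, List


def index_key(priority: int, text: str, width: int = 3) -> str:
    """Build a key such as '050_searchabletext'.

    Zero-padding the integer makes lexicographic key order equal numeric order
    (as long as values fit in `width` digits)."""
    return f"{priority:0{width}d}_{text.lower()}"


def build_index(entities: Dict[str, dict]) -> Dict[str, str]:
    """Index 'box': index key -> entity id. Assumes (priority, title) pairs are unique."""
    return {index_key(e["priority"], e["title"]): eid for eid, e in entities.items()}


def query(index: Dict[str, str], entity_store: Dict[str, dict], search: str,
          min_priority: int = 0, limit: int = 100) -> List[dict]:
    """Filter on the small index first, then fetch only the matching entities."""
    needle = search.lower()
    results: List[dict] = []
    for key in sorted(index):  # keys sort by the zero-padded integer first
        prio_str, _, text = key.partition("_")
        if int(prio_str) < min_priority or needle not in text:
            continue
        results.append(entity_store[index[key]])  # stands in for a LazyBox read
        if len(results) >= limit:
            break
    return results
```

If two entities can share the same integer and text, a real scheme would need a tie-breaker in the key (for example, appending the entity ID). Otherwise one index entry silently overwrites the other.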

Some performance measurements (iPhone 5!):

  • Write: All entities (~4 sec) & index values (~1.3 sec) -> ~4.3 sec
  • Opening the two boxes: ~1.3 sec
  • Query limited to the first 100 matches + transport to main isolate: ~110 ms

Of course I also tried SQLite using moor. Since it took a long time to write the entities in the isolate, I did not pursue this approach any further.

DevonJerothe commented on Jan 16, 2020

@VadimOsovsky Not sure if you have found a solution, but as this is still open I thought I would chime in. We had similar issues in our app: we needed to handle lists containing 100K+ items. We ended up using Hive for smaller app-related entries such as user settings, and SQLite (moor) for the larger stuff.

simc (Member) commented on Jan 16, 2020

@dave-trudes Very interesting approach. I experimented with isolates during the development of Hive, but for smaller amounts of data the overhead of transferring data between isolates was too big.

I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.

We ended up using Hive for smaller app related entries such as user settings etc, then SQLite (moor) for the larger stuff.

That is what I would recommend generally.

dave-trudes commented on Jan 17, 2020

@leisim Our data structure is deeply nested and contains many other entities, so I have no choice but to parse it in an isolate (parsing takes almost 5 sec on an iPhone 5).

As already mentioned, I also tried SQLite via moor, but in the end you generate a huge number of SQL insert statements that take many times longer to execute than writing to a LazyBox. (Imagine 30k entities, each with at least 10 sub-entities -> 300k SQL statements...)
That is where Hive, in the current Flutter database world, really shines for me 👏.
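The thread did not test one common mitigation for bulk SQL writes: run all inserts inside a single transaction and reuse one statement, rather than committing each insert separately. The following hedged sketch uses Python's `sqlite3` with a hypothetical parent/child schema. It shows the technique only and does not measure how it would compare with a LazyBox on the device above.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: each entity has several sub-entities.
Entity = Tuple[int, str, List[str]]  # (entity_id, name, sub_entity_names)


def create_entity_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE sub_entity ("
        "entity_id INTEGER NOT NULL REFERENCES entity(id), name TEXT)"
    )


def bulk_insert(conn: sqlite3.Connection, entities: Iterable[Entity]) -> None:
    """Insert all rows in ONE transaction, reusing one statement per table.

    `with conn` commits once when the block succeeds and rolls everything
    back if any insert raises, instead of paying for a commit per statement."""
    entities = list(entities)
    with conn:
        conn.executemany(
            "INSERT INTO entity (id, name) VALUES (?, ?)",
            [(eid, name) for eid, name, _ in entities],
        )
        conn.executemany(
            "INSERT INTO sub_entity (entity_id, name) VALUES (?, ?)",
            [(eid, sub) for eid, _, subs in entities for sub in subs],
        )
```

The number of rows written is unchanged, so whether this closes the gap on a slow device would still need to be measured.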

The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.

...the overhead of transferring data between isolates has been too big.

I forgot to mention in my previous post that I also make the request inside the isolate; this way I avoid transferring the request's response between isolates.
Sending back at most 100 entities to the main isolate is negligible in my case.

I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.

I'm afraid that we won't get support for that in the near future. On the one hand, it is easier to implement the isolate approach on all platforms, and on the other hand, web workers also do not support shared memory.

bolasim commented on Jan 17, 2020

The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.

I would love sharding.
Maybe allow for a sharded SortedLazyBox (which, say, only allows auto-increment keys or something similar) that only loads a shard map of (start_index, end_index) -> shard into memory.
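Here is a minimal Python sketch of the shard-map idea, assuming strictly increasing integer keys and shards capped at a fixed number of keys. The `ShardMap` class and the shard naming are hypothetical, not a Hive API. Only the list of key ranges is kept in memory. Because the ranges are sorted and non-overlapping, a binary search over their start keys finds the one shard that could hold a given key in O(log s) for s shards.

```python
import bisect
from typing import List, Optional


class ShardMap:
    """In-memory map of (start_key, end_key) -> shard name.

    Assumes auto-increment integer keys, so new keys only ever go into the
    newest shard and the ranges stay sorted and non-overlapping. Only this
    small map is kept in RAM, not every key."""

    def __init__(self, shard_size: int):
        self.shard_size = shard_size
        self._starts: List[int] = []  # first key of each shard, ascending
        self._ranges: List[list] = []  # [start_key, end_key, count, shard_name]

    def assign(self, key: int) -> str:
        """Choose the shard for a newly added key, starting a new shard when full."""
        if self._ranges and key <= self._ranges[-1][1]:
            raise ValueError("keys must be strictly increasing")
        if not self._ranges or self._ranges[-1][2] >= self.shard_size:
            name = f"shard_{len(self._ranges):04d}"  # hypothetical shard file name
            self._starts.append(key)
            self._ranges.append([key, key, 0, name])
        last = self._ranges[-1]
        last[1] = key
        last[2] += 1
        return last[3]

    def locate(self, key: int) -> Optional[str]:
        """Binary-search the range starts: O(log s) for s shards."""
        i = bisect.bisect_right(self._starts, key) - 1
        if i < 0 or key > self._ranges[i][1]:
            return None  # key is outside every known range
        return self._ranges[i][3]  # the only shard that could contain the key
```

Opening such a box would then mean loading just this map and opening a shard on demand. The auto-increment restriction keeps the map this simple, because arbitrary keys could land in any range.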

VadimOsovsky (Author) commented on Jan 18, 2020

@leisim It would be really nice if you could also write basic queries. Is Hive's internal structure at all similar to MongoDB's? Loading all 10k entries into RAM just to run a where statement takes too much time, I think.

simc (Member) commented on Jan 18, 2020

@dave-trudes

But maybe sharding or something similar could be an option

I thought about that too, but I did not find a performant way to find the correct storage position of a specific key / index. Do you have an idea, or could you elaborate on what you had in mind?

Sending back of max 100 entities to the main isolate is negligible in my case.

Unfortunately, sending objects between isolates is not supported by dart2js, so we would need to encode them to binary, send them over a SendPort, and decode them on the main isolate. I'm not sure whether this would be worth it.

I'm afraid that we won't get support for that in the near future.

Unfortunately I think you are right. They are working on other things currently.

@re-bola

Maybe allow for a sharded SortedLazyBox

Do you have an idea how to implement it?

@VadimOsovsky

it would really be nice if you could also write basic queries

Yeah, it would be. I've been experimenting for quite some time now. I have not found a solution for something like indices to improve query performance, so queries would work just like List.where(): executing them for the first time would be expensive (but reasonably fast for <5,000 entries). Listening to queries would be very efficient.
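This Python sketch shows why an unindexed query is expensive to run once but cheap to keep up to date. The `LiveQuery` class and its `on_put`/`on_delete` hooks are hypothetical and do not reflect Hive's API. The first evaluation calls the predicate once per entry. After that, each write re-checks only the one entry that changed.

```python
from typing import Any, Callable, Dict, Hashable


class LiveQuery:
    """Hypothetical live query over a key/value store."""

    def __init__(self, entries: Dict[Hashable, Any], predicate: Callable[[Any], bool]):
        self.predicate = predicate
        # Initial evaluation: one predicate call per entry (the expensive part).
        self.matches = {k: v for k, v in entries.items() if predicate(v)}

    def on_put(self, key: Hashable, value: Any) -> None:
        """Called after a write: only the changed entry is re-checked."""
        if self.predicate(value):
            self.matches[key] = value
        else:
            self.matches.pop(key, None)

    def on_delete(self, key: Hashable) -> None:
        """Called after a delete: no predicate call needed."""
        self.matches.pop(key, None)
```

The store would have to invoke these hooks on every write. That is the "listening" part, and its cost does not grow with the size of the box.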

If someone has an idea or a link to a resource about an alternative, that would be very helpful.

bolasim commented on Jan 18, 2020

@leisim
Yes, I do. I'm happy to draft up a PR (maybe in about 1.5 weeks because I don't have time for it right now, and that's when I have to implement something similar for my app anyways).

Want to open a feature-request and assign it to me?

simc (Member) commented on Jan 18, 2020

Yes, I do. I'm happy to draft up a PR

That would be amazing!

Want to open a feature-request and assign it to me?

I'd rather keep the discussion in a single place (this issue).

galdazbiz commented on Nov 8, 2022

Has this changed in the past 2 years, or is it still the same?


Metadata

Labels: question (Further information is requested)

Participants: @galdazbiz, @dave-trudes, @bolasim, @DevonJerothe, @simc

Repository: isar/hive (Issue #170)