leaving a box open with tens of thousands of objects open? #170

Open
VadimOsovsky opened this issue Jan 5, 2020 · 15 comments
Labels
question Further information is requested

Comments

@VadimOsovsky

Question
I come from SQLite and I'm trying Hive now. I need to store 50,000+ JSON objects. Is it really OK to leave the box open and keep all of them in RAM?

VadimOsovsky added the question label Jan 5, 2020
@simc
Member

simc commented Jan 5, 2020

50,000 will be too much for a normal box and maybe even for a lazy box. In your case, I would recommend using SQLite.

@VadimOsovsky
Author

VadimOsovsky commented Jan 5, 2020

@leisim What if I split the entries into different files? Will a Hive box open faster than an SQLite query, and what's the reasonable limit for the number of entries per box so that the app maintains 60 fps?

@simc
Member

simc commented Jan 6, 2020

@VadimOsovsky Splitting the data helps if you don't need all the entries at the same time. There are two problems with too many entries:

  1. RAM usage: Normal boxes keep all keys and values in memory. Lazy boxes keep only the keys, but 50,000 keys still add up.

  2. CPU: When a box is being opened, all of its entries have to be read and decoded. Your UI will freeze with a huge number of entries.
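
The difference between the two box types can be modelled with a small sketch. It is written in Python so that it runs with only the standard library, and it is not Hive's code or file format: `write_records`, `scan`, `EagerBox` and `LazyBox` are hypothetical names, and the length-prefixed JSON framing is an assumption made for illustration. Both classes walk the whole buffer when they are opened (the CPU cost), but only the eager one keeps the values afterwards (the RAM cost).

```python
import io
import json
import struct


def write_records(entries):
    """Serialize {key: value} into one buffer of length-prefixed JSON frames."""
    buf = io.BytesIO()
    for key, value in entries.items():
        frame = json.dumps([key, value]).encode("utf-8")
        buf.write(struct.pack("<I", len(frame)))  # 4-byte little-endian length
        buf.write(frame)
    buf.seek(0)
    return buf


def scan(buf):
    """Yield (offset, key, value) for every frame. This full pass is the open-time cost."""
    buf.seek(0)
    while True:
        offset = buf.tell()
        header = buf.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("<I", header)
        # This sketch decodes the whole frame even when only the key is needed.
        key, value = json.loads(buf.read(length))
        yield offset, key, value


class EagerBox:
    """Models a normal box: keys and values stay in memory after opening."""

    def __init__(self, buf):
        self.data = {key: value for _, key, value in scan(buf)}

    def get(self, key):
        return self.data[key]


class LazyBox:
    """Models a lazy box: only a key -> offset map stays in memory."""

    def __init__(self, buf):
        self.buf = buf
        self.offsets = {key: offset for offset, key, _ in scan(buf)}

    def get(self, key):
        # Re-read and decode the single frame on demand.
        self.buf.seek(self.offsets[key])
        (length,) = struct.unpack("<I", self.buf.read(4))
        return json.loads(self.buf.read(length))[1]
```

In this model the lazy variant trades a disk read per `get` for lower memory use, while the key map still grows with the number of entries.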

> will a Hive box open faster than an SQLite query

It will depend on the SQLite query. If you query all the entries, Hive will probably be faster. If you have a condition and an index, SQLite might win.

> what's the reasonable limit for the number of entries

This obviously depends on your device but I recommend keeping it below 1000 (5000 max) for best performance. In some cases 10,000 entries might work as well.

That being said, SQLite might have problems too with such a huge number of entries. May I ask why you need to store them on the device and can't rely on a backend?

@VadimOsovsky
Author

VadimOsovsky commented Jan 6, 2020

@leisim Sure, the use case is that I have a mail app that needs to download all user emails to work offline. In SQL I can keep all of them in one table and query them by the selected account and folder (like inbox). I could create one box per folder, but then I would end up with up to 50 boxes just for emails.
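
The single-table layout described above might look roughly like the following sketch. It uses Python's standard `sqlite3` module instead of a Flutter plugin so that it is runnable as-is; the table name `emails`, its columns, the index name and the helper `emails_in` are all hypothetical, and the sample rows are synthetic.

```python
import sqlite3

# In-memory database for the sketch; a real app would open a file instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emails (
        id      INTEGER PRIMARY KEY,
        account TEXT NOT NULL,
        folder  TEXT NOT NULL,
        subject TEXT NOT NULL
    )
    """
)
# Composite index so that an (account, folder) lookup does not scan the whole table.
conn.execute("CREATE INDEX idx_emails_account_folder ON emails (account, folder)")

# Synthetic data: two accounts, three folders.
accounts = ["a@example.com", "b@example.com"]
folders = ["inbox", "sent", "archive"]
rows = [(accounts[i % 2], folders[i % 3], f"subject {i}") for i in range(1000)]

# One transaction for all inserts.
with conn:
    conn.executemany(
        "INSERT INTO emails (account, folder, subject) VALUES (?, ?, ?)", rows
    )


def emails_in(conn, account, folder, limit=50):
    """Return the newest `limit` (id, subject) rows for one account and folder."""
    cur = conn.execute(
        "SELECT id, subject FROM emails "
        "WHERE account = ? AND folder = ? "
        "ORDER BY id DESC LIMIT ?",
        (account, folder, limit),
    )
    return cur.fetchall()
```

Only the rows of the selected folder are read, which is the "condition and an index" case mentioned earlier in the thread.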

@simc
Member

simc commented Jan 6, 2020

I think you should stick to SQLite then for your use case.

@dave-trudes

I have a similar requirement where almost 30k entities have to be stored. In addition, the entities must be sorted and filtered according to an integer property and must be searchable by a search string.

I have chosen an approach where I use two boxes in an isolate: a LazyBox that contains the entities, and a Box that contains the "index" as key and the entity ID as value.
Each key contains the integer property and the searchable text (e.g. 050_searchabletext).
A query filters those keys, and the associated entities are then read in parallel from the LazyBox.
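
A minimal sketch of this key-as-index idea, in Python and with plain dictionaries standing in for the two boxes. The helper names `make_index_key`, `build_index` and `query`, and the entity fields `rank` and `name`, are hypothetical. The sketch assumes the integer property fits in three digits (0 to 999) so that zero padding keeps string order equal to numeric order, and that no two entities share the same property and text.

```python
def make_index_key(rank, text):
    """Build a key such as '050_searchabletext' from the integer property and the text."""
    return f"{rank:03d}_{text.lower()}"


def build_index(entities):
    """Map index key -> entity ID; `entities` plays the role of the LazyBox."""
    return {
        make_index_key(entity["rank"], entity["name"]): entity_id
        for entity_id, entity in entities.items()
    }


def query(index, entities, min_rank, max_rank, needle, limit=100):
    """Filter the index keys, then load only the matching entities."""
    needle = needle.lower()
    matched_ids = []
    # Sorting the keys sorts by the zero-padded integer prefix first.
    for key in sorted(index):
        rank_part, _, text = key.partition("_")
        rank = int(rank_part)
        if rank > max_rank:
            break  # keys are ordered by rank, so nothing later can match
        if rank >= min_rank and needle in text:
            matched_ids.append(index[key])
            if len(matched_ids) == limit:
                break
    # Only now touch the (potentially large) entity store.
    return [entities[entity_id] for entity_id in matched_ids]
```

The point of the layout is that filtering and sorting touch only the small index keys; the full entities are read for at most `limit` matches.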

Some performance measurements (iPhone 5!):

  • Write: All entities (~4 sec) & index values (~1.3 sec) -> ~ 4.3 sec
  • Opening the two boxes: ~1.3 sec
  • Query limited to the first 100 matches + transport to main isolate: ~110 ms

Of course I also tried SQLite using moor. Since it took a long time to write the entities in the isolate, I did not pursue this approach any further.

@DevonJerothe

@VadimOsovsky Not sure if you have found a solution, but as this is still open I thought I would chime in. We had similar issues in our app, where we needed to handle lists containing 100K+ items. We ended up using Hive for smaller app related entries such as user settings etc, then SQLite (moor) for the larger stuff.

@simc
Member

simc commented Jan 16, 2020

@dave-trudes Very interesting approach. I experimented with isolates during development of Hive but for smaller amounts of data, the overhead of transferring data between isolates has been too big.

I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.

> We ended up using Hive for smaller app related entries such as user settings etc, then SQLite (moor) for the larger stuff.

That is what I would recommend generally.

@dave-trudes

dave-trudes commented Jan 17, 2020

@leisim Our data structure is very nested and contains many other entities - so it is inevitable for me to parse it in an isolate (parsing takes almost 5 sec on iPhone5).

As already mentioned, I also tried sqlite via moor - but in the end you generate a huge number of SQL insert statements that take many times longer to execute than writing to a LazyBox. (Imagine 30k entities, each with at least 10 sub-entities -> 300k SQL statements...)
That is where Hive, in the current Flutter database world, really shines for me 👏.

The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.

> ...the overhead of transferring data between isolates has been too big.

I forgot to mention in the previous post that I also invoke the request in the isolate - this way I save the transport of the request response.
Sending back of max 100 entities to the main isolate is negligible in my case.

> I hope the Dart team thinks about some kind of shared memory in the future. It would greatly benefit Hive.

I'm afraid that we won't get support for that in the near future. On the one hand, it is easier to implement the isolate approach on all platforms, and on the other hand, web workers also do not support shared memory.

@bolasim

bolasim commented Jan 17, 2020

> The only drawback is that a LazyBox loads all keys when it is opened. But maybe sharding or something similar could be an option 🤔.

I would love sharding.
Maybe allow for a sharded SortedLazyBox (say, one that only allows auto-increment keys or something similar) that only loads into memory a shard map of (start_index, end_index) -> shard.
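
One possible reading of that shard map, sketched in Python. `ShardMap`, `append`, `shard_for` and the `shard_N` naming are hypothetical and nothing like this exists in Hive; the sketch assumes keys are handed out in strictly increasing order, so the shard ranges never overlap and a binary search over the range starts is enough to find the shard for a key.

```python
import bisect


class ShardMap:
    """Keeps only (start_index, end_index) -> shard name in memory."""

    def __init__(self, shard_size):
        self.shard_size = shard_size
        self.starts = []   # first key of each shard, ascending
        self.ends = []     # last key of each shard, inclusive
        self.names = []    # shard identifier, e.g. a file name
        self.next_key = 0

    def append(self):
        """Allocate the next auto-increment key and return (key, shard_name)."""
        key = self.next_key
        self.next_key += 1
        if not self.starts or key - self.starts[-1] >= self.shard_size:
            # The current shard is full (or there is none yet): start a new one.
            self.starts.append(key)
            self.ends.append(key)
            self.names.append(f"shard_{len(self.names)}")
        else:
            self.ends[-1] = key
        return key, self.names[-1]

    def shard_for(self, key):
        """Find the shard whose range contains `key` in O(log number_of_shards)."""
        i = bisect.bisect_right(self.starts, key) - 1
        if i < 0 or key > self.ends[i]:
            raise KeyError(key)
        return self.names[i]
```

With a fixed shard size the lookup could simply be `key // shard_size`; keeping explicit ranges only matters if shards are allowed to differ in size.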

@VadimOsovsky
Author

@leisim It would really be nice if you could also write basic queries. Is Hive's internal structure at all similar to MongoDB? Loading all 10k entries into RAM to evaluate a where clause takes too much time, I think.

@simc
Member

simc commented Jan 18, 2020

@dave-trudes

> But maybe sharding or something similar could be an option

I thought about that too but I did not find a performant solution on how to find the correct storage position of a specific key / index. Do you have an idea or could you elaborate what you had in mind?

> Sending back of max 100 entities to the main isolate is negligible in my case.

Unfortunately, sending objects is not supported by dart2js, so we would need to encode to binary, send over SendPort and decode on the main isolate. I'm not sure whether this would be worth it.

> I'm afraid that we won't get support for that in the near future.

Unfortunately I think you are right. They are working on other things currently.

@re-bola

> Maybe allow for a sharded SortedLazyBox

Do you have an idea how to implement it?

@VadimOsovsky

> it would really be nice if you could also write basic queries

Yeah, it would be. I've been experimenting for quite some time now. I have not found a solution for something like indices to improve query performance, so the queries would work just like List.filter(). Executing them for the first time would be expensive (but reasonably fast for <5000 entries). Listening to queries would be very efficient.
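
The behaviour described here (an expensive first run, then cheap updates while listening) can be illustrated with a small Python model. This is an assumption about how such a query could work, not Hive's implementation; `Box`, `listen` and `LiveQuery` are hypothetical stand-ins.

```python
class Box:
    """Tiny in-memory stand-in for a box that notifies listeners about changes."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def listen(self, callback):
        self._listeners.append(callback)

    def items(self):
        return self._data.items()

    def put(self, key, value):
        self._data[key] = value
        for callback in self._listeners:
            callback(key, value, False)

    def delete(self, key):
        if key in self._data:
            value = self._data.pop(key)
            for callback in self._listeners:
                callback(key, value, True)


class LiveQuery:
    """Runs the predicate over all entries once, then only over changed entries."""

    def __init__(self, box, predicate):
        self.predicate = predicate
        # First execution: a full scan, like filtering a list. Cost is O(n).
        self.results = {k: v for k, v in box.items() if predicate(v)}
        # Afterwards each change costs one predicate call. Cost is O(1) per event.
        box.listen(self._on_change)

    def _on_change(self, key, value, deleted):
        if not deleted and self.predicate(value):
            self.results[key] = value
        else:
            self.results.pop(key, None)
```

Without an index the first scan still has to visit every entry, which matches the cost noted above; only the incremental updates are cheap.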

If someone has an idea or a link to a resource of an alternative, that would be very helpful.

@bolasim

bolasim commented Jan 18, 2020

@leisim
Yes, I do. I'm happy to draft up a PR (maybe in about 1.5 weeks because I don't have time for it right now, and that's when I have to implement something similar for my app anyways).

Want to open a feature-request and assign it to me?

@simc
Member

simc commented Jan 18, 2020

> Yes, I do. I'm happy to draft up a PR

That would be amazing!

> Want to open a feature-request and assign it to me?

I'd rather keep the discussion in a single place (this issue)

@galdazbiz

Has this changed in the last 2 years, or is it still the same?

6 participants