field family segment design : hot-search-field with more frequent segment merge. #31464

Closed
xzhthu2018 opened this issue Jun 20, 2018 · 5 comments
Labels
discuss :Search/Search Search-related issues that do not fall into other categories

Comments

@xzhthu2018

Hi,
In my Elasticsearch clusters, both write load and search load are heavy. Documents in the cluster have a great many fields, while only some of them are searched frequently (we call these hot-search-fields). We hope searches on these fields can perform better, so that response time does not increase as the number of segments rises.

We also found that search performs much better after merging down to fewer segments, because fewer segments are scanned and because of Lucene's cache design (it only caches the DocIdSet for the largest segments).

Now Lucene's Segment design is based on row model (or document model). I wonder whether the segment could be redesigned around a field model (or field-family model), so that hot-search-fields get more CPU resources and are merged more frequently, keeping their segment count very small. If so, Elasticsearch / Lucene could achieve much better performance for queries on hot-search-fields, especially when the cluster is handling a large volume of bulk requests.

This design would need to go deep into Lucene's segment internals, possibly including live files, refresh, merge, segment metadata, and the index buffer.

@ywelsch ywelsch added discuss :Search/Search Search-related issues that do not fall into other categories labels Jun 20, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@jpountz
Contributor

jpountz commented Jun 20, 2018

Now Lucene's Segment design is based on row model (or document model).

Actually this is not true: only stored fields have a row model. Other data structures like the inverted index and doc values, which are the most used ones when it comes to running queries/aggregations, are per-field. So your suggestion is actually the way that things already work today, with the exception of stored fields, but they usually don't matter for performance.
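
To make the distinction concrete, here is a minimal toy sketch of a segment whose search structures are keyed by field while stored fields are kept per document. It is only an illustration: `ToySegment` and its attributes are hypothetical names, not Lucene classes, and real segments use compressed on-disk formats rather than Python dicts.

```python
from collections import defaultdict


class ToySegment:
    """Toy model: per-field search structures, row-oriented stored fields."""

    def __init__(self):
        # field -> term -> list of doc ids (one inverted index per field)
        self.inverted = defaultdict(lambda: defaultdict(list))
        # field -> doc id -> value (one doc-values column per field)
        self.doc_values = defaultdict(dict)
        # doc id -> whole document (stored fields, one row per document)
        self.stored = []

    def add(self, doc):
        doc_id = len(self.stored)
        self.stored.append(dict(doc))
        for field, value in doc.items():
            self.inverted[field][value].append(doc_id)
            self.doc_values[field][doc_id] = value
        return doc_id

    def search(self, field, term):
        # Only the structure for `field` is read; other fields are untouched.
        return list(self.inverted[field].get(term, []))


seg = ToySegment()
seg.add({"id": "1", "hot_search_field": "a", "cold_search_field": "x"})
seg.add({"id": "2", "hot_search_field": "a", "cold_search_field": "y"})
hits = seg.search("hot_search_field", "a")
```

In this model a query on `hot_search_field` never reads anything belonging to `cold_search_field`; only fetching `seg.stored[doc_id]` pulls in the whole row.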

@jpountz jpountz closed this as completed Jun 20, 2018
@xzhthu2018
Author

@jpountz I think "row model" here means that a segment contains all fields (with their inverted index, doc values, and stored fields), so when ES does a segment merge it has to merge and rebuild all the fields together, rather than merging the hot-search fields first. If so, the merging of the hot-search fields is slowed down.

@xzhthu2018
Author

@jpountz Thanks for your explanation.
For example:
In segment#1, with the following fields:
id:1 (with doc values, inverted index)
hot_search_field:1 (with doc values, inverted index)
cold_search_field:1 (with doc values, inverted index)

In segment#2, with the following fields:
id:2 (with doc values, inverted index)
hot_search_field:2 (with doc values, inverted index)
cold_search_field:2 (with doc values, inverted index)

When a merge runs, segment#1 and segment#2 are merged together, so hot_search_field and cold_search_field are merged together as well. But the cold field doesn't actually need to be merged first. If we spent more CPU on merging hot_search_field, searches on hot_search_field could perform better.
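
The two-segment example above can be sketched as a toy merge in Python. This is an assumption-laden illustration, not Lucene's merge code: a segment is modelled as a plain dict with a document count and one postings map per field, and `merge_segments` is a hypothetical helper.

```python
def merge_segments(seg_a, seg_b):
    """Merge two toy segments of the form
    {"num_docs": int, "fields": {field: {term: [doc ids]}}}.
    Doc ids from seg_b are shifted past the docs of seg_a.
    """
    offset = seg_a["num_docs"]
    merged_fields = {}
    # Every field found in either segment is rewritten, hot or cold alike.
    for field in sorted(set(seg_a["fields"]) | set(seg_b["fields"])):
        postings = {}
        for term, ids in seg_a["fields"].get(field, {}).items():
            postings.setdefault(term, []).extend(ids)
        for term, ids in seg_b["fields"].get(field, {}).items():
            postings.setdefault(term, []).extend(i + offset for i in ids)
        merged_fields[field] = postings
    # The result is returned as one unit covering all fields.
    return {"num_docs": offset + seg_b["num_docs"], "fields": merged_fields}


segment_1 = {"num_docs": 1, "fields": {
    "id": {"1": [0]},
    "hot_search_field": {"1": [0]},
    "cold_search_field": {"1": [0]},
}}
segment_2 = {"num_docs": 1, "fields": {
    "id": {"2": [0]},
    "hot_search_field": {"2": [0]},
    "cold_search_field": {"2": [0]},
}}
merged = merge_segments(segment_1, segment_2)
```

In this toy version the loop visits `cold_search_field` just as it visits `hot_search_field`, which is the cost being described here.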

@jpountz
Contributor

jpountz commented Jun 22, 2018

@xzhthu2018 In any case, we cannot publish a merge until all fields have been merged, so your idea wouldn't work. The closest thing to your needs that I can think of would be to have one additional index that only has the hot search fields, and to search it whenever none of the cold fields are needed. I'm not sure how practical it would be, however.
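
A rough client-side sketch of that two-index approach might look like the following. Everything named here is an assumption for illustration: the index names `docs_hot` and `docs_full`, the `HOT_FIELDS` set, and both helper functions are hypothetical, and the sketch assumes the application writes each document to both indices itself.

```python
# Assumed set of frequently searched fields (hypothetical).
HOT_FIELDS = frozenset({"id", "hot_search_field"})


def hot_projection(doc):
    """Return the copy of `doc` that would be written to the hot-only index."""
    return {field: value for field, value in doc.items() if field in HOT_FIELDS}


def pick_index(fields_needed, hot_index="docs_hot", full_index="docs_full"):
    """Choose the small hot-only index when the query needs no cold field."""
    if set(fields_needed) <= HOT_FIELDS:
        return hot_index
    return full_index


doc = {"id": "1", "hot_search_field": "a", "cold_search_field": "x"}
hot_doc = hot_projection(doc)
target_hot = pick_index(["hot_search_field"])
target_full = pick_index(["hot_search_field", "cold_search_field"])
```

The trade-off is the one hinted at above: every write is duplicated, and the application has to keep the two indices consistent.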


4 participants