Summary
Release name: v1.0-nightly, get on the train now ✋
Let's make Databend more of a Lakehouse!
v1.0 (planned for release on March 5th)
| Task | Status | Comments |
|---|---|---|
| (Query) Support Decimal data type #2931 | DONE | high-priority (release in v1.0) |
| (Query) Query external stage file (parquet) #9847 | DONE | high-priority (release in v1.0) |
| (Query) Array functions #7931 | DONE | high-priority (release in v1.0) |
| (Query) Query Result Cache #10010 | DONE | high-priority (release in v1.0) |
| (Planner) CBO #9597 | DONE | high-priority (release in v1.0) |
| (Processor) Aggregation spilling #10273 | DONE | high-priority (release in v1.0) |
| (Storage) Alter table #9441 | DONE | high-priority (release in v1.0) |
| (Storage) Block data cache #9772 | DONE | high-priority (release in v1.0) |
Archive releases
Reference
- What are Databend release channels?
- Nightly v1.0 is part of our Roadmap 2023
- Community website: https://databend.rs
Activity
xudong963 commented on Jan 15, 2023
Is there an expected release date for v1.0?
BohuTANG commented on Jan 15, 2023
The preliminary plan is to release in March, mainly focusing on `alter table`, `update`, and `group by spill`.
tangguoqiang172528725 commented on Feb 13, 2023
Hope you can simplify the way to insert data; it would help attract more users.
BohuTANG commented on Feb 24, 2023
Add Query Result Cache #10010
haydenflinner commented on Feb 24, 2023
@BohuTANG Are there any plans for higher-performance client reads, like maybe streaming Arrow/Parquet/some other high-perf format? I'm not familiar with other read protocols, like ClickHouse's for example; I've just been using the mysql connector. But it would be neat to be able to have databend in the middle while paying little overhead vs reading the raw parquet files from S3.
BohuTANG commented on Feb 25, 2023
@haydenflinner
Databend supports an `ignore_result` suffix, which tells the server not to send the result back to the client over the MySQL wire protocol. For example, with `ignore_result` (the result is not sent to the client):
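The original example was not preserved in this thread; here is a minimal reconstruction, assuming Databend's `numbers` table function:

```sql
-- Without ignore_result: the full result set is streamed to the client.
SELECT * FROM numbers(10000000);

-- With the ignore_result suffix: the query still executes on the server,
-- but the result set is not sent back over the MySQL wire protocol.
SELECT * FROM numbers(10000000) ignore_result;
```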
haydenflinner commented on Feb 25, 2023
@BohuTANG That is neat and confirms my suspicion that the MySQL protocol is a bottleneck in some use cases. Parquet read speeds are in the GB/s, but even when telling the mysql client not to handle the result, we get only MB/s. This confirms the results in the paper I linked; see "Postgres++" vs "Postgres" in the final table of results.
If one wanted to use databend as a simple intermediary between dataframes and S3 (more lake-house style), databend would still provide a lot of value in interactive query handling, file size and metadata mgmt, a far simpler interface, etc. But it presents a bottleneck when it comes to raw read speed. If I wanted to do, for example, `df = pd.read_sql("select * from hits limit 1000000")`, that would be, I think, 10x slower than `df = pd.read_parquet("local-download-of-hits.parquet")`. But I suspect that's primarily due to MySQL protocol overhead; the rest of databend is so fast I wouldn't expect it to get in the way much. I can file a ticket for this, don't let me derail the 1.0 thread, sorry 😄
haydenflinner commented on Feb 25, 2023
I believe the modern open-source protocol most similar to what that paper describes is "Apache Arrow Flight".
sundy-li commented on Feb 25, 2023
Yes, we have a plan to do this in #9832.
If the query result is small, the MySQL client works fine, since OLTP result sets are commonly small. Otherwise, we should use other formats or protocols to handle large outputs (the MySQL client is really bad in this case).
You can use the unload command to export data in Parquet/CSV format into storage: https://databend.rs/doc/unload-data/
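A minimal sketch of such an unload, assuming a pre-created stage named `my_stage` and a table named `hits` (both hypothetical; see the linked docs for the exact syntax):

```sql
-- Assumed setup: a stage named my_stage and a table named hits.
-- COPY INTO <location> FROM <table> unloads the table's data
-- to the stage as Parquet files.
COPY INTO @my_stage
FROM hits
FILE_FORMAT = (TYPE = PARQUET);
```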
This paper did not cover `clickhouse-client`. But AFAIK, `clickhouse-client` is the best client/protocol I have ever seen.