-
Notifications
You must be signed in to change notification settings - Fork 7.4k
Add new DataType Map(key,value) #15806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@CurtizJ |
You can try to merge with fresh master.
|
Have you considered using separate columns for each key? In that case it would be easy to implement combinators and such stuff, since then it would be no need to collect values by keys first. |
In the first implement version for Map Datatype, I use tow array columns, one for key and the other for value. I think it's the simplest way to implement this. |
@CurtizJ I have finished coding. Can you review it when free? Thank you. |
Yes, I will try to review your PR by the end of the week. BTW, please, add tests with other table engines, at least with CREATE TABLE table_map(a Map(String, String)) ENGINE = MergeTree ORDER BY tuple();
insert into table_map values ({'name':'zhangsan', 'gender':'male'}), ({'name':'lisi', 'gender':'female'}); stacktrace
|
@CurtizJ Ok, I will add tests with MergeTree Engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need to implement function map
(similar to array
and tuple
) to allow creation of maps not only for literals.
To allow something like this:
insert into table_map(m) select {'key1' : number, 'key2' : number * 2} from numbers(1000)
insert into table_map(m) select map('key1', number, 'key2', number * 2) from numbers(1000)
src/Functions/array/arrayElement.cpp
Outdated
} | ||
else | ||
{ | ||
// Default value for unmatched keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use default value of type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how to distinguish the real value and the default value?
for example, Integer type's default value is 0
src/Functions/array/arrayElement.cpp
Outdated
const DataTypePtr & key_type = (typeid_cast<const DataTypeArray *>(kv_types[0].get()))->getNestedType(); | ||
const DataTypePtr & value_type = (typeid_cast<const DataTypeArray *>(kv_types[1].get()))->getNestedType(); | ||
|
||
Field index = (*columns[arguments[1]].column)[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems, that only const indices are supported. Need to add check for constness of argument. But it's better to implement support of non-const columns as index, like it's done in arrayElement
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will improve this.
src/Functions/array/arrayElement.cpp
Outdated
} | ||
|
||
template <typename DataType> | ||
bool FunctionArrayElement::executeMappedValueNumber(const ColumnArray * column, std::vector<int> matched_idxs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a const reference for vector.
const ColumnArray::Offsets & offsets = column->getOffsets(); | ||
size_t rows = offsets.size(); | ||
|
||
for (size_t i = 0; i < rows; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you know number of elements, it's better to do reserve for vector before loop for better performance.
src/DataTypes/DataTypeMap.cpp
Outdated
std::string DataTypeMap::doGetName() const | ||
{ | ||
WriteBufferFromOwnString s; | ||
s << "Map(" << (typeid_cast<const DataTypeArray *>(keys.get()))->getNestedType()->getName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to typeid_cast
here. You have already saved key_type
and value_type
.
ColumnMap::ColumnMap(MutableColumns && mutable_columns) | ||
{ | ||
columns.reserve(mutable_columns.size()); | ||
for (auto & column : mutable_columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can init ColumnMap::columns
to have less than 2 columns, that would cause a crash at run-time in say ColumnMap::cloneEmpty()
or any other method that assumes that there are exactly 2 columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider checking that each of mutable_columns
is actually a ColumnArray
.
SELECT CAST(([1, 2, 3], ['1', '2', 'foo']), 'Map(UInt8, String)') AS map
src/DataTypes/DataTypeMap.cpp
Outdated
|
||
void DataTypeMap::serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const | ||
{ | ||
writeChar('[', ostr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be more natural to serialize Map
as a JSON object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/Core/Field.cpp
Outdated
UInt8 type; | ||
DB::readBinary(type, buf); | ||
|
||
switch (type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using Field::dispatch
+ overloaded functions, to avoid enumerating all Field types in many places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, see writeFieldText
or toString(Field)
in Field.cpp
.
Allow casting Tuple as Map.
@akuzm @CurtizJ @Enmk
But this is repeated with query parameter. From the test I found the syntax of DataType Map is confused with below test cases:
3)01056_prepared_statements_null_and_escaping.sh
4)01015_insert_values_parametrized.sh
7)00954_client_prepared_statements.sh
But
can also be parsed as Map . |
@CurtizJ ready for review. |
@alexey-milovidov @CurtizJ After this PR is reviewed, in the next stage:
|
Thanks. I will review and merge soon. |
This PR is moved to #17829 |
🥇 |
Hi, |
@wangxin05 maybe this feature will be implemented in #23932 |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new datatype Map for supporting storage k:v .
related to issue: #1841
1st version for Map only support String type of key and value.
Later I will implement Int and other type for key and value.
Detailed description / Documentation draft:
Add new datatype Map for supporting storage k:v .
related to issue: #1841
1st version for Map only support String type of key and value.
Later I will implement Int and other type for key and value.
Usage:
Addition:
PostgreSQL Json data type
postgresql does not support map data type but json, Operator for json in PostgreSQL:
And postgreSQL support other json functions like : xxx_to_json, json_each....
Hive Map data type
Usage: