
Feature: copy/select from stage by pos #11581

Open
@youngsofun


Summary

motivation:

  • Data exploration for NDJSON and CSV files
  • Matching NDJSON by name offers no way to treat a whole line as a single variant

example

select $1,$2,$10 from @~/1.csv.gz;
----
$1	$2	$10
100	small	<null>
200	lamb co.	<null>
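The positional projection in this example can be modeled in a few lines of Python. This is only an illustrative sketch, not Databend's implementation: `select_by_pos` is a hypothetical helper, and it assumes that a position past the end of a record yields a null, which is one plausible reading of the `<null>` values shown for `$10`.

```python
import csv
import io


def select_by_pos(csv_text, positions):
    """Project 1-based column positions from CSV text.

    Assumption: a position beyond the end of a record yields None,
    mirroring the <null> shown for $10 in the example output.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        rows.append(
            [record[p - 1] if 1 <= p <= len(record) else None for p in positions]
        )
    return rows


# Hypothetical two-column sample standing in for 1.csv.gz.
sample = "100,small\n200,lamb co.\n"
print(select_by_pos(sample, [1, 2, 10]))
```

Every value stays a string here, which matches the "decode (strings)" step that the select path uses for CSV.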

whole picture

|  | copy | select |
| --- | --- | --- |
| parquet ([name] default, or [pos]) | 1. load; 2. reorder cols if by name; 3. cast (implementation based on select) | 1. load |
| csv [pos] | 1. decode (dst schema) | 1. decode (strings), tolerant of bad CSV |
| ndjson [name] (default) | 1. decode (dst schema) | 1. decode (variants) |
| ndjson [pos] | 1. decode (1 variant); 2. cast | 1. decode (1 variant); only $1 is allowed |

Note: a transform may add a cast after projection.
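The NDJSON by-position path, where each line is decoded as one variant and only `$1` is allowed, could look roughly like the sketch below. `decode_ndjson_by_pos` is a hypothetical name, and Python's `json.loads` result stands in for a variant value. Skipping blank lines is an assumption of this sketch, not something the issue specifies.

```python
import json


def decode_ndjson_by_pos(ndjson_text, positions=(1,)):
    """Decode each NDJSON line as a single variant, addressable only as $1."""
    invalid = [p for p in positions if p != 1]
    if invalid:
        # By position, a line is one variant, so only $1 makes sense.
        raise ValueError(f"NDJSON by position only allows $1, got {invalid}")
    rows = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are skipped
        variant = json.loads(line)
        rows.append([variant for _ in positions])
    return rows


rows = decode_ndjson_by_pos('{"id": 1, "tags": ["a"]}\n{"id": 2}\n')
print(rows)
```

On the copy path, a cast to the destination column type would follow this decode step.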

MATCH_COLUMN_BY_NAME

MATCH_COLUMN_BY_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE

NONE means columns are matched by position.
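To make the three modes concrete, here is a minimal Python sketch of how file columns might be mapped to table columns under each one. The `MatchColumnByName` enum and the `map_columns` helper are hypothetical names for illustration. Returning `None` for an unmatched column is an assumption, since the issue does not say how unmatched columns are handled.

```python
from enum import Enum


class MatchColumnByName(Enum):
    CASE_SENSITIVE = "CASE_SENSITIVE"
    CASE_INSENSITIVE = "CASE_INSENSITIVE"
    NONE = "NONE"  # match by position


def map_columns(file_columns, table_columns, mode):
    """For each table column, return the index of the file column feeding it.

    Assumption: an unmatched table column maps to None.
    """
    if mode is MatchColumnByName.NONE:
        # By position: the i-th table column reads the i-th file column.
        return [i if i < len(file_columns) else None
                for i in range(len(table_columns))]
    if mode is MatchColumnByName.CASE_INSENSITIVE:
        normalize = str.lower
    else:
        normalize = str  # identity for strings: names compared as-is
    index = {normalize(name): i for i, name in enumerate(file_columns)}
    return [index.get(normalize(name)) for name in table_columns]


file_cols = ["ID", "age"]
table_cols = ["age", "id"]
for mode in MatchColumnByName:
    print(mode.value, map_columns(file_cols, table_cols, mode))
```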

default value

  • COPY (without transform): depends on the file type:
    • Parquet/NDJSON/XML: CASE_SENSITIVE
    • TSV/CSV: NONE
  • SELECT: depends on how columns are referenced:
    • select $1, $2 ...: NONE
    • select id, age, ...: CASE_SENSITIVE
      • identifier case handling follows the settings:
        • unquoted_ident_case_sensitive: default 0
        • quoted_ident_case_sensitive: default 1
    • select *
      • Only Parquet is supported
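The default rules above can be summarized as a small lookup. The following Python sketch is illustrative only: `default_match_mode` and its `statement`, `file_format`, and `select_style` parameters are hypothetical, and the string labels `"positional"`, `"named"`, and `"star"` are invented here to distinguish the three select forms. The issue gives no mode for `select *`, so the sketch only validates the file type and returns `None` in that case.

```python
def default_match_mode(statement, file_format=None, select_style=None):
    """Return the default MATCH_COLUMN_BY_NAME value described in the list."""
    fmt = (file_format or "").lower()
    if statement == "copy":  # copy without transform: decided by file type
        if fmt in {"parquet", "ndjson", "xml"}:
            return "CASE_SENSITIVE"
        if fmt in {"tsv", "csv"}:
            return "NONE"
        raise ValueError(f"unknown file format: {file_format!r}")
    if statement == "select":  # select: decided by how columns are referenced
        if select_style == "positional":  # select $1, $2 ...
            return "NONE"
        if select_style == "named":  # select id, age, ...
            return "CASE_SENSITIVE"
        if select_style == "star":  # select *
            if fmt != "parquet":
                raise ValueError("select * is only supported for Parquet")
            return None  # assumption: the issue states no mode for select *
        raise ValueError(f"unknown select style: {select_style!r}")
    raise ValueError(f"unknown statement: {statement!r}")


print(default_match_mode("copy", file_format="CSV"))
print(default_match_mode("select", select_style="named"))
```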

Tasks

  • support $1, $2
  • add MATCH_COLUMN_BY_NAME
  • by POS
    • select CSV
    • Copy/select NDJSON
    • Copy/select Parquet (optional)
  • by NAME
    • Select JSON

Activity

changed the title from "Feature: select/load_with_tranform for CSV/JS" to "Feature: copy/select from stage by pos" on May 25, 2023

          Feature: copy/select from stage by pos · Issue #11581 · databendlabs/databend