DeltaLake

Config

@dataclass
class DeltaLakeWriterConfig:
    data_uri: str
    partition_by: Dict[str, list[str]] = field(default_factory=dict)
    storage_options: Optional[Dict[str, str]] = None
    writer_properties: Optional[deltalake.WriterProperties] = None
    anchor_table: Optional[str] = None

Dict[str, _] fields generally mean config per table name for example partition_by["my_table"] would give list of columns to partition the my_table table by.

Example

data_uri = "./data"

writer = cc.Writer(
    kind=cc.WriterKind.DELTA_LAKE,
    config=cc.DeltaLakeWriterConfig(
        data_uri=data_uri,
    ),
)

Anchor table

All tables are written in parallel but anchor table is written seperately so it can be used to implement crash-resistance.