{{/* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */}}
Key Default Type Description
changelog.precommit-compact.thread-num
(none) Integer Maximum number of threads to copy bytes from small changelog files. By default, it is the number of processors available to the Java virtual machine.
commit.custom-listeners
(none) String Commit listeners are called after a successful commit. This option lists custom commit listener identifiers, separated by commas.
end-input.watermark
(none) Long Optional endInput watermark used in batch mode or with a bounded stream.
lookup.async
false Boolean Whether to enable async lookup join.
lookup.async-thread-number
16 Integer The number of threads for async lookup.
lookup.bootstrap-parallelism
4 Integer The parallelism for bootstrapping within a single task in a lookup join.
lookup.cache
AUTO Enum The cache mode of lookup join.
Possible values:
  • "AUTO"
  • "FULL"
  • "MEMORY"
lookup.dynamic-partition.refresh-interval
1 h Duration The refresh interval of dynamic partitions for lookup: all partitions are scanned to obtain the corresponding partition.
lookup.refresh.async
false Boolean Whether to refresh the lookup table in an async thread.
lookup.refresh.async.pending-snapshot-count
5 Integer If the pending snapshot count exceeds this threshold, the lookup operator will refresh the table synchronously.
lookup.refresh.time-periods-blacklist
(none) String The blacklist contains several time periods during which refreshing the lookup table's cache is forbidden. The format is start1->end1,start2->end2,..., with times in the form yyyy-MM-dd HH:mm. Only used when the lookup table is in FULL cache mode.
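To illustrate the blacklist format, the sketch below (same hypothetical tables as above) refreshes the FULL cache asynchronously but forbids refreshes during two evening windows:

```sql
SELECT o.order_id, c.customer_name
FROM orders AS o
JOIN customers
    /*+ OPTIONS('lookup.cache' = 'FULL',
                'lookup.refresh.async' = 'true',
                'lookup.refresh.time-periods-blacklist' =
                    '2024-01-01 20:00->2024-01-01 22:00,2024-01-02 20:00->2024-01-02 22:00') */
    FOR SYSTEM_TIME AS OF o.proc_time AS c
    ON o.customer_id = c.customer_id;
```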
partition.idle-time-to-done
(none) Duration Set a time duration: when a partition has had no new data for this duration, its status is marked as done to indicate that the data is ready.
partition.mark-done-action.mode
process-time Enum How to trigger the partition mark-done action.
Possible values:
  • "process-time": Based on the time of the machine, mark the partition done once the processing time passes the period time plus the delay.
  • "watermark": Based on the watermark of the input, mark the partition done once the watermark passes the period time plus the delay.
partition.mark-done.recover-from-state
true Boolean Whether to trigger partition mark-done when recovering from state.
partition.time-interval
(none) Duration The time interval for partitions; for example, '1 d' for daily partitions and '1 h' for hourly partitions.
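Taken together, the partition.* options let a daily-partitioned table mark a partition as done once it has been quiet for a while. A sketch with a hypothetical schema:

```sql
CREATE TABLE dwd_orders (
    order_id     BIGINT,
    order_amount DECIMAL(10, 2),
    dt           STRING
) PARTITIONED BY (dt) WITH (
    'partition.time-interval'         = '1 d',    -- daily partitions
    'partition.idle-time-to-done'     = '15 min', -- quiet period before mark-done
    'partition.mark-done-action.mode' = 'process-time'
);
```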
postpone.default-bucket-num
1 Integer Bucket number for the partitions compacted for the first time in postpone bucket tables.
precommit-compact
false Boolean If true, a compact coordinator and worker operator will be added after the writer operator, in order to compact several changelog files (for primary key tables) or newly created data files (for unaware-bucket tables) from the same partition into larger ones, which can decrease the number of small files.
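A sketch of enabling pre-commit compaction on a write, together with the copy thread count from changelog.precommit-compact.thread-num (table names are hypothetical):

```sql
INSERT INTO target_table /*+ OPTIONS(
    'precommit-compact' = 'true',
    'changelog.precommit-compact.thread-num' = '8') */
SELECT * FROM source_table;
```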
read.shuffle-bucket-with-partition
true Boolean Whether to shuffle by partition and bucket when reading.
scan.bounded
(none) Boolean Bounded mode for Paimon consumer. By default, Paimon automatically selects bounded mode based on the mode of the Flink job.
scan.dedicated-split-generation
false Boolean If true, the split generation process is performed at runtime on a Flink task, instead of on the JobManager during the initialization phase.
scan.infer-parallelism
true Boolean If false, the parallelism of the source is set by the global parallelism. Otherwise, the source parallelism is inferred from the number of splits (batch mode) or the number of buckets (streaming mode).
scan.infer-parallelism.max
1024 Integer If scan.infer-parallelism is true, this option limits the inferred parallelism of the source.
scan.max-snapshot.count
-1 Integer The maximum snapshot count to scan per checkpoint. Unlimited when negative.
scan.parallelism
(none) Integer Defines a custom parallelism for the scan source. By default, if this option is not defined, the planner derives the parallelism for each statement individually by also considering the global configuration. If scan.infer-parallelism is enabled, the planner derives the parallelism from the inferred value.
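For instance, the inferred source parallelism can be capped, or inference can be bypassed entirely with an explicit value (hypothetical table name):

```sql
-- Cap the parallelism inferred from splits/buckets.
SELECT * FROM orders /*+ OPTIONS('scan.infer-parallelism' = 'true',
                                 'scan.infer-parallelism.max' = '256') */;

-- Or pin the source parallelism explicitly.
SELECT * FROM orders /*+ OPTIONS('scan.parallelism' = '8') */;
```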
scan.partitions
(none) String Specify the partitions to scan. Partitions should be given in the form key1=value1,key2=value2. Partition keys not specified will be filled with the value of partition.default-name. Multiple partitions should be separated by semicolons (;). This option supports both normal source tables and lookup join tables. For lookup joins, two special values max_pt() and max_two_pt() are also supported, selecting the partition (or two partitions) with the largest partition value.
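A sketch of the partition syntax on a hypothetical table partitioned by dt and hh:

```sql
-- Read two hourly partitions; partitions are separated by ';'.
SELECT * FROM orders
    /*+ OPTIONS('scan.partitions' = 'dt=20250101,hh=00;dt=20250101,hh=01') */;
-- For lookup join tables, 'scan.partitions' = 'max_pt()' would instead
-- select the partition with the largest partition value.
```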
scan.remove-normalize
false Boolean Whether to force the removal of the normalize node in streaming read. Note: this is dangerous and is likely to cause data errors if the downstream is used to calculate aggregations and the input is not a complete changelog.
scan.split-enumerator.batch-size
10 Integer How many splits should be assigned to a subtask per batch in StaticFileStoreSplitEnumerator, to avoid exceeding the `akka.framesize` limit.
scan.split-enumerator.mode
fair Enum The mode used by StaticFileStoreSplitEnumerator to assign splits.
Possible values:
  • "fair": Distribute splits evenly when batch reading to prevent a few tasks from reading all of them.
  • "preemptive": Distribute splits preemptively according to the consumption speed of the tasks.
scan.watermark.alignment.group
(none) String A group of sources to align watermarks.
scan.watermark.alignment.max-drift
(none) Duration Maximal drift to align watermarks, before we pause consuming from the source/task/partition.
scan.watermark.alignment.update-interval
1 s Duration How often tasks should notify coordinator about the current watermark and how often the coordinator should announce the maximal aligned watermark.
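The three alignment options are typically used together; a sketch on a hypothetical table:

```sql
-- Align watermarks of all sources in group 'paimon-align', pausing a source
-- that runs more than 1 minute ahead of the group.
SELECT * FROM orders /*+ OPTIONS(
    'scan.watermark.alignment.group' = 'paimon-align',
    'scan.watermark.alignment.max-drift' = '1 min',
    'scan.watermark.alignment.update-interval' = '2 s') */;
```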
scan.watermark.emit.strategy
on-event Enum Emit strategy for watermark generation.
Possible values:
  • "on-periodic": Emit watermark periodically, interval is controlled by Flink 'pipeline.auto-watermark-interval'.
  • "on-event": Emit watermark per record.
scan.watermark.idle-timeout
(none) Duration If no records flow in a partition of a stream for that amount of time, then that partition is considered "idle" and will not hold back the progress of watermarks in downstream operators.
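For example, switching to periodic emission and tolerating idle partitions could look like this (hypothetical table name):

```sql
-- Emit watermarks periodically and exclude partitions idle for 30 s
-- from holding back the watermark.
SELECT * FROM orders /*+ OPTIONS(
    'scan.watermark.emit.strategy' = 'on-periodic',
    'scan.watermark.idle-timeout' = '30 s') */;
```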
sink.clustering.by-columns
(none) String Specifies the column name(s) used for comparison during range partitioning, in the format 'columnName1,columnName2'. If not set or set to an empty string, the range partitioning feature is not enabled. This option is effective only for unaware-bucket tables without primary keys in batch execution mode.
sink.clustering.sample-factor
100 Integer Specifies the sample factor. Let S represent the total number of samples, F represent the sample factor, and P represent the sink parallelism, then S=F×P. The minimum allowed sample factor is 20.
sink.clustering.sort-in-cluster
true Boolean Indicates whether to further sort the data belonging to each sink task after range partitioning.
sink.clustering.strategy
"auto" String Specifies the comparison algorithm used for range partitioning, including 'zorder', 'hilbert', and 'order', corresponding to the z-order curve algorithm, hilbert curve algorithm, and basic type comparison algorithm, respectively. When not configured, it will automatically determine the algorithm based on the number of columns in 'sink.clustering.by-columns'. 'order' is used for 1 column, 'zorder' for less than 5 columns, and 'hilbert' for 5 or more columns.
sink.committer-cpu
1.0 Double The CPU cores to allocate to the sink's global committer.
sink.committer-memory
(none) MemorySize The heap memory to allocate to the sink's global committer.
sink.committer-operator-chaining
true Boolean Allow the sink committer and writer operators to be chained together.
sink.cross-partition.managed-memory
256 mb MemorySize Weight of managed memory for RocksDB in cross-partition updates. Flink computes the memory size according to the weight; the actual memory used depends on the running environment.
sink.managed.writer-buffer-memory
256 mb MemorySize Weight of the writer buffer in managed memory. Flink computes the memory size for the writer according to the weight; the actual memory used depends on the running environment.
sink.operator-uid.suffix
(none) String Set the uid suffix for the writer, dynamic bucket assigner, and committer operators. The uid format is ${UID_PREFIX}_${TABLE_NAME}_${USER_UID_SUFFIX}. If the uid suffix is not set, Flink automatically generates the operator uids, which may be incompatible when the topology changes.
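A sketch of pinning the sink operator uids so that state stays addressable when the job topology changes (hypothetical table names and suffix):

```sql
INSERT INTO target_table /*+ OPTIONS('sink.operator-uid.suffix' = 'v1') */
SELECT * FROM source_table;
```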
sink.parallelism
(none) Integer Defines a custom parallelism for the sink. By default, if this option is not defined, the planner will derive the parallelism for each statement individually by also considering the global configuration.
sink.savepoint.auto-tag
false Boolean If true, a tag will be automatically created for the snapshot created by a Flink savepoint.
sink.use-managed-memory-allocator
false Boolean If true, the Flink sink will use managed memory for the merge tree; otherwise, it will create an independent memory allocator.
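These two options are commonly paired: enable the managed allocator and size the writer buffer weight. A sketch with hypothetical table names:

```sql
INSERT INTO target_table /*+ OPTIONS(
    'sink.use-managed-memory-allocator' = 'true',
    'sink.managed.writer-buffer-memory' = '512 mb') */
SELECT * FROM source_table;
```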
sink.writer-cpu
1.0 Double The CPU cores to allocate to the sink writer.
sink.writer-memory
(none) MemorySize The heap memory to allocate to the sink writer.
source.checkpoint-align.enabled
false Boolean Whether to align Flink checkpoints with snapshots of the Paimon table. If true, a checkpoint will only be made when a snapshot is consumed.
source.checkpoint-align.timeout
30 s Duration If a new snapshot has not been generated when the checkpoint starts to trigger, the enumerator will block the checkpoint and wait for the new snapshot. This sets the maximum waiting time to avoid infinite waiting; on timeout, the checkpoint fails. Note that it should be set smaller than the checkpoint timeout.
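A sketch of aligning checkpoints with snapshot consumption on a hypothetical streaming read (the timeout should stay below the checkpoint timeout):

```sql
SELECT * FROM orders /*+ OPTIONS(
    'source.checkpoint-align.enabled' = 'true',
    'source.checkpoint-align.timeout' = '20 s') */;
```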
source.operator-uid.suffix
(none) String Set the uid suffix for the source operators. After setting, the uid format is ${UID_PREFIX}_${TABLE_NAME}_${USER_UID_SUFFIX}. If the uid suffix is not set, Flink automatically generates the operator uids, which may be incompatible when the topology changes.
unaware-bucket.compaction.parallelism
(none) Integer Defines a custom parallelism for the unaware-bucket table compaction job. By default, if this option is not defined, the planner will derive the parallelism for each statement individually by also considering the global configuration.
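A sketch of pinning this compaction parallelism on a write to a hypothetical unaware-bucket table:

```sql
INSERT INTO target_table
    /*+ OPTIONS('unaware-bucket.compaction.parallelism' = '4') */
SELECT * FROM source_table;
```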