KTGMC
KTGMC is a CUDA port of QTGMC.
Requires the following three files and AviSynthNeo:
- KTGMC.avsi
- KTGMC.dll
- KNNEDI3.dll
Runs on NVIDIA GPUs with compute capability 3.5 or higher.
You can look up your GPU's compute capability on NVIDIA's CUDA GPU list (https://developer.nvidia.com/cuda-gpus).
Wrap the processing with OnCPU and OnCUDA so that it runs on the GPU:
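SetMemoryMax(2048, type=DEV_TYPE_CUDA)  # raise the CUDA memory limit to 2GB
srcfile="..."
LWLibavVideoSource(srcfile)  # source decoding runs on the CPU
OnCPU(2).KTGMC(SourceMatch=3, Lossless=2, tr0=1, tr1=1, tr2=1).OnCUDA(2)  # KTGMC runs on CUDA between OnCPU and OnCUDA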
See AviSynthNeo for details on OnCPU, OnCUDA, and SetMemoryMax.
By default, the CUDA memory limit is 768MB, which is often insufficient.
Use SetMemoryMax to raise the CUDA memory limit to around 2GB.
Some GPUs do not have that much memory. In that case, choose a faster Preset, which simplifies the processing.
With Preset="Fast" or faster, the default 768MB limit usually does not reduce performance.
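For a GPU that cannot spare around 2GB, a minimal sketch of the alternative is to skip SetMemoryMax and pick a faster preset instead. The source file name is a hypothetical placeholder, and how well 768MB works still depends on the source, so treat this as a starting point rather than a guaranteed configuration.

```avisynth
# Sketch: low-VRAM setup. SetMemoryMax is not called, so the default
# CUDA memory limit (768MB) applies.
LWLibavVideoSource("input.ts")  # hypothetical source file
# A faster preset simplifies processing, so it fits the smaller limit better.
OnCPU(2).KTGMC(Preset="Fast").OnCUDA(2)
```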
The KTGMC function is the CUDA version of QTGMC. Its arguments are essentially the same as QTGMC's (see below for what is supported). Both its input and output are CUDA frames.
Additional arguments:
- int useFlag = 0
  - Available values:
    - 0: Normal processing
    - 1: Interpolate using the previous field only
    - 2: Interpolate using the next field only
  - Intended for use with DecombUCF. Interpolation normally uses both the previous and the next field, so if either one is corrupted, the output can be contaminated. With useFlag=1 or 2, the corrupted field can be excluded when generating interpolated frames.
- int dev = 0
  - Index of the GPU to use (starting at 0)
- int analyzeBatch = 4
  - Number of frames processed per KTGMC_MAnalyze call. Depending on the image size and block size, a single frame may not provide enough parallelism; batching multiple frames improves performance.
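The additional arguments are passed like any other KTGMC argument. The sketch below only illustrates the syntax: in practice useFlag is meant to be chosen per frame together with DecombUCF, which is not shown here. The file name and the analyzeBatch value of 8 are illustrative assumptions, not recommendations from this README.

```avisynth
SetMemoryMax(2048, type=DEV_TYPE_CUDA)
LWLibavVideoSource("input.ts")  # hypothetical source file
# useFlag=1: build interpolated frames from the previous field only
#   (2 = next field only, 0 = normal processing using both fields).
# dev=0: run on the first GPU (the default).
# analyzeBatch=8: hand 8 frames to each KTGMC_MAnalyze call instead of the
#   default 4 (illustrative value; the best setting depends on the source).
OnCPU(2).KTGMC(SourceMatch=3, Lossless=2, useFlag=1, dev=0, analyzeBatch=8).OnCUDA(2)
```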
Currently only YV12 (8‑bit) is supported.
Both width and height must be multiples of 4.
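A minimal sketch for meeting both input requirements, assuming an 8-bit interlaced source: it converts to YV12 (with interlaced=true so the chroma of the two fields is not mixed), checks the result, pads the right and bottom edges up to a multiple of 4, and crops the padding off again on the CPU side. The file name is hypothetical, and padding rather than cropping is only one possible choice.

```avisynth
SetMemoryMax(2048, type=DEV_TYPE_CUDA)
LWLibavVideoSource("input.ts")  # hypothetical 8-bit interlaced source

# KTGMC accepts only YV12; interlaced=true converts each field's chroma separately.
ConvertToYV12(interlaced=true)
Assert(IsYV12(), "KTGMC requires YV12 (8-bit) input")

# Pad the right/bottom edges so width and height become multiples of 4.
padW = (4 - Width() % 4) % 4
padH = (4 - Height() % 4) % 4
AddBorders(0, 0, padW, padH)

OnCPU(2).KTGMC(Preset="Slower").OnCUDA(2)

# Back on the CPU: remove the padding (a width/height of 0 keeps the full size).
Crop(0, 0, -padW, -padH)
```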
Many features are not yet implemented.
With all other parameters left at their defaults, Preset values from "Slower" to "Faster" are supported.
SourceMatch and Lossless are supported.
Using an unsupported feature raises a "Device unmatch" error.
Adjust the parameters so that only supported features are used.
Temporal radius (TR) values of up to 2 are supported. Noise reduction features are not supported.
For motion estimation, Overlap must be exactly half of Blocksize (i.e., Overlap=16 for Blocksize=32, Overlap=8 for Blocksize=16).
With Preset="Very Fast" or faster, Blocksize=32 and Overlap=8 are forced; since 8 is not half of 32, this causes an error.
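As a sketch of settings that stay inside these limits, the call below sets the temporal radii to 2 and uses BlockSize=16 with Overlap=8. BlockSize, Overlap and TR0/TR1/TR2 are QTGMC's argument names and are assumed here to be passed through by KTGMC unchanged; the file name is hypothetical.

```avisynth
SetMemoryMax(2048, type=DEV_TYPE_CUDA)
LWLibavVideoSource("input.ts")  # hypothetical source file
# Temporal radii of at most 2; Overlap is exactly half of BlockSize (16 / 2 = 8).
OnCPU(2).KTGMC(Preset="Slower", TR0=2, TR1=2, TR2=2, BlockSize=16, Overlap=8).OnCUDA(2)

# Expected to fail with an error: Overlap=4 is not half of BlockSize=16.
# OnCPU(2).KTGMC(Preset="Slower", BlockSize=16, Overlap=4).OnCUDA(2)
```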
EDI (edge-directed interpolation) supports only NNEDI3. With Preset="Faster" and SourceMatch of 1 or higher, KTGMC tries to use Yadif, which is not supported, so avoid this combination.
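One way to keep SourceMatch while avoiding the Yadif path is to stay on a preset slower than "Faster". This sketch assumes, based only on the note about "Faster", that the slower supported presets keep using NNEDI3 for source matching; the file name is hypothetical.

```avisynth
SetMemoryMax(2048, type=DEV_TYPE_CUDA)
LWLibavVideoSource("input.ts")  # hypothetical source file

# Avoid: Preset="Faster" with SourceMatch >= 1 tries to use Yadif (unsupported).
# OnCPU(2).KTGMC(Preset="Faster", SourceMatch=1).OnCUDA(2)

# Keeps SourceMatch with a slower preset (assumed to stay on NNEDI3).
OnCPU(2).KTGMC(Preset="Fast", SourceMatch=3, Lossless=2).OnCUDA(2)
```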
Internal processing is optimized for forward frame retrieval; scrubbing backwards in an editor may be slow.
The KNNEDI3 function is the CUDA version of NNEDI3 and takes the same arguments as NNEDI3. It also supports CPU processing; when called on the CPU, it behaves like NNEDI3.
Only field=-2 with dh=false has been tested; other settings will likely not work.
RGB and YUY2 are not supported. Other planar YUV formats may work, but only YV12 has been tested.
Of NNEDI3's two internal arithmetic modes (int16 and float), only int16 is supported, so 16-bit input will not work. Bit depths up to 15 bits may work.
If fapprox does not have the 2 bit set (fapprox & 2 == 0), processing switches to float arithmetic and fails.
Only pscrn values of 2 or higher are supported. The opt and threads arguments have no effect in CUDA mode.
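A minimal sketch of calling KNNEDI3 within the tested envelope: YV12 input, field=-2, dh=false, pscrn=2, and an fapprox value that includes the 2 bit (15 = 1 + 2 + 4 + 8 sets it). The CUDA call is wrapped in OnCPU/OnCUDA the same way as KTGMC, while the plain call runs on the CPU. The file name is hypothetical.

```avisynth
SetMemoryMax(2048, type=DEV_TYPE_CUDA)
src = LWLibavVideoSource("input.ts")  # hypothetical YV12 interlaced source

# CUDA: double-rate interpolation with the only tested field/dh combination.
# pscrn must be >= 2; fapprox=15 includes the 2 bit, keeping int16 arithmetic.
gpu = src.OnCPU(2).KNNEDI3(field=-2, dh=false, pscrn=2, fapprox=15).OnCUDA(2)

# CPU: the same call without OnCPU/OnCUDA behaves like plain NNEDI3.
cpu = src.KNNEDI3(field=-2, dh=false, pscrn=2, fapprox=15)

gpu
```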
The following parameters accept only these values:
- tr: 1, 2
- pel: 1, 2
- blksize: 8, 16, 32