Try out the new Jake: AI Coding Assistant for LabVIEW!
Get answers to questions about LabVIEW and discuss your code.

CuLab - GPU Accelerated by Ngene - Toolkit for LabVIEW Download

CuLab - GPU Accelerated Toolkit

Version	4.1.2.80
Released	May 23, 2025
Publisher	Ngene
License	Ngene Custom
LabVIEW Version	LabVIEW x64>=20.0
Operating System	Windows x64
Project links	Homepage Documentation Repository Discussion

Description

CuLab is a GPU acceleration toolkit for LabVIEW, designed to simplify complex computations on Nvidia GPUs. It provides a broad API to accelerate a wide range of functions, including mathematical operations, linear algebra, signal generation, signal processing (FFT/IFFT, correlation, convolution, resampling), and array manipulation directly on the GPU. CuLab supports tensors (arrays) across all numeric types and dimensions (0D to 4D), making it highly adaptable to various data processing tasks. With its user-friendly design, CuLab enables LabVIEW users to seamlessly accelerate their applications on Nvidia GPUs.

Release Notes

4.1.2.80 (May 23, 2025)

V4.1.2
General Description
This update resolves several bugs while maintaining full backward compatibility with all v4.x.x versions of the toolkit.

Bug Fixes & Enhancements
1. Resolved a memory leak caused by BLAS API usage.
2. Improved error handling in the CU_Tensor_Pull_to_DVR API.
3. Enhanced error handling in the GPU_Info tool.
4. Fixed a missing help file link in the Get_Exec_Time.vim utility.
5. Corrected typos in the help file.
6. Removed Get_Exec_Time.vim from the examples and replaced it with the version included in the toolkit.

4.1.1.77 (Apr 09, 2025)

V4.1.1
General Description
This update introduces new functionalities, performance optimizations, and bug fixes while maintaining full backward compatibility.

New Features
1. Numeric Subpalette:
• CU_Reduce.vi: Enables batch mode reduction.
• Supported Data Types: I32, U32, I64, U64, SGL, DBL, CDB, CSG.
• Dimensionality: 2D.
• Operations: Sum, Product, Min, Max.
• Preprocessing: None, Abs, Sqr, Sqrt.
2. Array Subpalette:
• CU_Array_Permute.vi: Permutes tensor dimensions.
• Supported Data Types: All.
• Dimensionalities: 3D, 4D.
3. Statistics Subpalette:
• CU_Mean_Batch.vi & CU_RMS_Batch.vi: Perform batch-mode mean and RMS calculations. Available since v4.0.1.
• Supported Data Types: SGL, DBL, CSG, CDB.
• Dimensionality: 2D.
4. CU_Array_Reshape_to_TxD: Now supports an in-place option and allows one target dimension to be inferred automatically.
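To make the batch-reduction semantics concrete, here is a minimal CPU-side sketch in Python. It assumes that, in batch mode, CU_Reduce.vi reduces each row of a 2D tensor independently after applying the selected preprocessing to every element. The function `reduce_batch` and its string arguments are hypothetical and are not part of CuLab's LabVIEW API. Only real-valued inputs are modeled.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical reference model of batch-mode reduction; NOT the CuLab API.
# Element-wise preprocessing applied before the reduction.
PREPROCESS: Dict[str, Callable[[float], float]] = {
    "None": lambda x: x,
    "Abs": abs,
    "Sqr": lambda x: x * x,
    "Sqrt": math.sqrt,  # raises ValueError for negative inputs
}

# Reduction applied to each preprocessed row.
REDUCE: Dict[str, Callable[[Sequence[float]], float]] = {
    "Sum": sum,
    "Product": math.prod,
    "Min": min,
    "Max": max,
}


def reduce_batch(rows: List[List[float]], op: str, pre: str = "None") -> List[float]:
    """Reduce every row of a 2D array to a single value (one result per batch row)."""
    f = PREPROCESS[pre]
    r = REDUCE[op]
    return [r([f(x) for x in row]) for row in rows]
```

For example, `reduce_batch([[1, -2, 3], [4, 5, -6]], "Sum", "Abs")` returns one sum of absolute values per row. A small reference like this is one way to spot-check GPU results on tiny inputs.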

Optimizations
1. CU_Transpose_2D_Array.vi: Significantly improved execution speed.
2. General Performance: Optimized multiple functions by using asynchronous copy operations.

Bug Fixes & Enhancements
1. Fixed a bug in CU_Initialize_Array.vi when initializing large tensors with complex data types.
2. CU_Convolution.vi now returns an error when called in in-place mode.
3. Improved error messages to display the full call chain.
4. Resolved an issue where some VIs were broken after installation.
5. Enhanced dimension validation with clearer error messages for mismatched dimensions.
6. Removed unnecessary DLLs from the package installer.
7. Improved error handling in GPU Info and CU_FIR_Filter functions.
8. Fixed unnecessary host destination wire connections in example VIs calling CU_Tensor_Pull.vi.
9. Various minor fixes and stability improvements.

Cosmetic Changes
1. Updated icons for the following API VIs (now indicating default in-place operations):
• CU_Replace_Array_Element.vi
• CU_Replace_Array_Element_by_Index_Batch.vi
• CU_Replace_Array_Subset_1D_1D.vi
• CU_Replace_Array_Subset_2D_2D.vi
• CU_Replace_Array_Subset_2D_1D.vi

4.0.1.73 (Jan 09, 2025)

V4.0.1
General Description
This version of the toolkit brings many new features and improvements.
It largely maintains backward compatibility with the previous version but breaks it for some API functions.

New Features
1. The toolkit now supports adaptive tensor sizing: the output/destination tensor is automatically resized depending on the input dimensions and configuration (e.g., in continuous signal processing VIs).
2. Added a new function to the “Signal Operation” subpalette:
a. “CU_Decimate_Single_Shot_Adaptive.vi”. This VI is designed to perform decimation based on output size, in contrast to “CU_Decimate_Single_Shot.vi”, which performs decimation based on a decimation factor, and adaptively chooses the output size.
- Supported input types: SGL, DBL.
- Batch mode supported: False.
- Supported operations: Avg, Min, Max, MinMax
- This VI does not have a LabVIEW counterpart.
3. Added a new function to the “Filters” subpalette of the “Signal Processing” palette:
a. “CU_FIR_Filter.vi” – Performs continuous FIR filtering over a sequence of input signal blocks.
- Supported input types: SGL, DBL, CSG, CDB.
- Batch mode supported: True.
4. Updated “CU_Sine_Pattern.vi”:
a. Added support for complex types.
b. Added support for continuous signal generation between consecutive calls.
c. Added batch mode support.
5. Updated “CU_Rational_Resample.vi”:
a. Removed “averaging” input.
b. Added support for continuous processing.
c. Added support for custom FIR filter coefficients through separate input.
d. Removed the “Dest_in” input, as the destination size can change between consecutive calls.
6. Updated “CU_Digital_Down_Conversion.vi”:
a. Added support for continuous reference/carrier signal generation.
b. Added support for custom FIR filter coefficients through separate input.
c. Removed the “Dest_in” input, as the destination size can change between consecutive calls.
7. Added the new API “CU_Tensor_Pull_to_DVR.vi” to the “Tensor” palette, which aims to improve the efficiency of GPU-to-CPU data copying.
8. Removed support for a preallocated CPU array destination in “CU_Tensor_Pull.vi”. Destination allocation (or reallocation) is performed internally.
9. Updated “CU_Tensor_Push.vi”. It no longer requires a preallocated tensor as the destination input.
10. Updated the GPU info tool:
a. Changed display font to monospace type for aligned visualization of the information.
b. Added info about installed driver version.
c. Added info about detected GPUs.
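The difference between decimating by a factor and decimating to a target output size can be sketched as follows. This Python model assumes the input is split into `out_len` contiguous bins of nearly equal length and each bin is reduced with Avg, Min, or Max. The exact binning used by CU_Decimate_Single_Shot_Adaptive.vi is not documented here, `decimate_to_size` is a hypothetical name, and the MinMax mode is omitted because its output layout is not described.

```python
from typing import List, Sequence


def decimate_to_size(x: Sequence[float], out_len: int, op: str = "Avg") -> List[float]:
    """Hypothetical reference: reduce x to exactly out_len values, one per bin."""
    n = len(x)
    if not 0 < out_len <= n:
        raise ValueError("out_len must be in the range 1..len(x)")
    out: List[float] = []
    for i in range(out_len):
        # Bin i covers [i*n//out_len, (i+1)*n//out_len); bin sizes differ by at most 1,
        # and no bin is empty because out_len <= n.
        lo, hi = i * n // out_len, (i + 1) * n // out_len
        seg = x[lo:hi]
        if op == "Avg":
            out.append(sum(seg) / len(seg))
        elif op == "Min":
            out.append(min(seg))
        elif op == "Max":
            out.append(max(seg))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return out
```

Here the caller fixes the output length and the effective decimation factor (about `len(x) / out_len`) follows from it, which is the inverse of the factor-based approach.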

Cosmetic Changes
1. Moved “CU_Add_Broadcast.vi” and “CU_Multiply_Broadcast.vi” into the “Broadcast” subpalette.
2. “CU_Sine_Pattern.vi” connector pane changed to match the LabVIEW “Sine Pattern” VI.
3. “CU_Max_Min.vi” connector pane changed to match the LabVIEW “Max & Min” VI.
4. “CU_Decimate_Single_Shot.vi” connector pane changed to match the LabVIEW “Decimate Single Shot” VI.

Bug Fix
1. Fixed a bug where “CU_Square_Root.vi” returned inf for “0+j0” complex input.
2. Fixed a dimension validation bug in “CU_Add_Broadcast.vi” and “CU_Multiply_Broadcast.vi”.
3. Fixed a bug in “CU_Convolution.vi” that caused the CUDA runtime to crash when run with 1D_SGL and 1D_DBL inputs in frequency-domain mode.
4. Other minor fixes.

3.0.4.67 (May 27, 2024)

V3.0.4
General Description
This is a major update that brings many new features and improvements.

New Features
1. Added “Computer Vision” palette with the following new functions:
a. CU_CV_GrayMorpholgy.vi - Performs grayscale morphological transformations.
- Supported operations: Erode, Dilate
- Supported input types: U8, U16, SGL.
- Batch mode supported: False.
b. CU_CV_Resample.vi - Performs resampling operation.
- Supported input types: All CV types.
- Batch mode supported: True.
c. CU_CV_Extract.vi - Extracts a portion of the input image.
- Supported input types: All CV types.
- Batch mode supported: True.
Images are represented as T2D tensors. Grayscale images are represented with the U8, I8, U16, I16, or SGL data types, while color images (e.g., ARGB) are represented with the U32 data type. Some functions also support a batch mode of operation, in which a batch of images is provided as a T3D tensor.
2. Added new functions to the “Signal Operation” subpalette:
a. CU_Digital_Down_Conversion.vi - Performs digital down conversion.
- Supported input types: SGL, DBL, CSG, CDB.
- Batch mode supported: True.
b. CU_Convolution_Batch.vi – Performs batch convolution.
- Supports 1D MCH-MCH, MCH-1CH modes.
- The batched instances of the CU_Convolution polymorphic VI have been removed.
c. Added support for 2D convolution to CU_Convolution.vi.
- Supported input types: SGL, DBL.
3. Changed the “FFT” subpalette name to “Transforms” and added the following functions:
a. CU_Hilbert_Transform.vi - Computes the fast Hilbert transform of the input Tensor.
- Accepted input Tensor types: SGL, DBL
- Supported dimensionalities: T1D.
b. CU_Analytic_Signal.vi - Computes the complex Analytic Signal of the real-valued input Tensor.
- Accepted input Tensor types: SGL, DBL
- Return Tensor types: CSG, CDB
- Supported dimensionalities: T1D.
4. Added “Boolean Operation” palette with the following functions.
a. CU_Boolean_2in.vi – compound function for different binary (two-input) logical operations.
- Supported Boolean Operations:
1. AND
2. OR
3. XOR
4. NAND
5. NOR
6. XNOR
7. Select X
8. Select Y
b. CU_Boolean_Not.vi
c. CU_And_Array_Elements.vi
d. CU_Or_Array_Elements.vi
5. Added “Comparison Operation” palette with the following functions.
a. CU_Compare_1_Input.vi – compound function for different unary (single-input) comparison operations.
- Supported Comparison Operations:
1. Equal To 0?
2. Not Equal To 0?
3. Greater Than 0?
4. Greater Or Equal To 0?
5. Less Than 0?
6. Less Or Equal To 0?
- Supports all Tensor types, except complex types.
b. CU_Compare_2_Inputs.vi – compound function for different binary (two-input) comparison operations.
- Supported Comparison Operations:
1. Equal?
2. Not Equal?
3. Greater?
4. Greater Or Equal?
5. Less?
6. Less Or Equal?
- Supports all Tensor types.
- Supports comparison with a constant.
c. CU_In_Range_and_Coerce.vi
- Supports all Tensor types, except complex types.
- Supports all tensor dimensionalities except T0D.
d. CU_Max_Min.vi
- Supports all Tensor types, except complex types.
- Supports tensor dimensionalities: T1D, T2D.
6. Added “Lookup” subpalette in “Array” palette with the following functions.
a. CU_Array_Lookup_by_Index.vi - Returns a Tensor containing the elements of the input Tensor at the indices specified by the Index Tensor.
- Supports all tensor dimensionalities except T0D.
- Supports all Tensor types.
b. CU_Array_Lookup_by_Bool.vi
- Description: Returns a Tensor containing the elements of the input Tensor whose corresponding value in the Boolean input Tensor is 1 (TRUE).
- Supports all tensor dimensionalities except T0D.
- Supports all Tensor types, except complex types.
7. Added a new function to the “Array” palette.
a. CU_Replace_Array_Element_by_Index_Batch.vi - Replaces elements in the input Tensor with elements from the Sub-Tensor at the indices specified in the Index Tensor.
- Supports tensor dimensionalities: T1D, T2D.
- Accepts all input Tensor types.
8. Added new functions in “Numeric” palette.
a. CU_Add_Broadcast.vi - Performs broadcast addition of T2D with T1D
- Supported input types: SGL, DBL, CSG, CDB.
b. CU_Multiply_Broadcast.vi - Performs broadcast multiplication of T2D with T1D
- Supported input types: SGL, DBL, CSG, CDB.
9. Added an option for swapping inputs in tensor-constant operations for CU_Subtract.vi and CU_Divide.vi.
10. Added the following function to the “Conversion” subpalette:
a. CU_To_U64.vi
11. Added a “Utilities” palette with the following function:
a. Get_Exec_Time.vi – returns the execution time.
12. Added the following functions to the “Device Management” subpalette:
a. CU_Get_CUDA_Version.vi – returns the CUDA version.
b. CU_Reset_GPU.vi - Destroys all allocations and resets all states.
13. Added a GPU info tool to the Help menu.
14. Run-time licensing is required starting with this version of the toolkit.
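CU_Analytic_Signal.vi and CU_Hilbert_Transform.vi are listed above without an algorithm. A common FFT-based construction is sketched below in Python: take the DFT, keep the DC bin (and the Nyquist bin for even lengths), double the positive-frequency bins, zero the negative ones, and inverse-transform; the Hilbert transform is the imaginary part of the result. This assumes CuLab follows that standard definition. The helper names are hypothetical, and a naive O(N²) DFT stands in for a GPU FFT to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (inverse includes the 1/N scaling)."""
    n = len(x)
    sign = 1 if inverse else -1
    out: List[complex] = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """Complex analytic signal of a real input via the frequency-domain method."""
    n = len(x)
    spectrum = dft([complex(v) for v in x])
    # Weights: 1 for DC (and Nyquist if n is even), 2 for positive frequencies, 0 otherwise.
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    return dft([spectrum[k] * h[k] for k in range(n)], inverse=True)


def hilbert_transform(x: Sequence[float]) -> List[float]:
    """Hilbert transform as the imaginary part of the analytic signal."""
    return [z.imag for z in analytic_signal(x)]
```

For a cosine that completes an integer number of cycles in the record, the analytic signal is the matching complex exponential and the Hilbert transform is the corresponding sine.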

Optimizations
1. Greatly improved the efficiency of data movement between CPU and GPU, which significantly improves overall toolkit performance (30-40% on common benchmarks).
2. Optimized the execution of numeric conversion functions.
3. Optimized the execution of CU_Array_Subset.vi
4. Significantly improved the performance of CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi.
5. Optimized the memory, context, and other resource management functionalities.
6. Other optimizations.

Extended Functionalities
1. The FIR filter specification has been incorporated into CU_Rational_Resample.vi for both single-channel and multichannel (batch) inputs.
2. The following functions have been updated to accept a constant as a second input.
a. CU_Logarithm_Base_X.vi
b. CU_Power_Of_X.vi
3. All numeric conversion functions now support all tensor types.
4. The error dialog box now returns the full path of the call chain.
5. CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi now check that the Input Tensor and CPU Data Array dimensions match before pushing data to the GPU.
6. Automated the process of adding dependency DLLs when building applications.
7. The help file was updated to reflect the updated functionalities.

Bug Fix
1. Fixed input tensor types for T1D:DBL instance of CU_Inverse_Tangent_2_input.vi.
2. The functionality of the T4D instances of CU_Array_Subset.vi has been corrected.
3. Fixed array max dimension (65535) issue in CU_Power_Spectrum.vi.
4. CU_Decimate_Single_Shot.vi connector pane changed to match the LabVIEW Decimate Single Shot VI.
5. Renamed the following functions to conform to the common naming conventions:
a. From “CU_Square Root.vi” to “CU_Square_Root.vi”
b. From “CU_Add Array Elements.vi” to “CU_Add_Array_Elements.vi”
6. Fixed a bug that produced incorrect results in CU_Square.vi for complex inputs.
7. Fixed a CPU memory leak in CU_Tensor_Destroy.vi.
8. Other minor fixes.

2.1.1.50 (Oct 26, 2023)

V2.1.1

General Description
This version of CuLab toolkit brings new functionalities and improvements to existing ones.

New Features
1. Added Signal Operation subpalette with the following functions.
1. CU_Decimate_Single_Shot.vi
• Supports 1Ch-1Ch, MCh-1Ch, MCh-MCh modes.
• Accepts (SGL, DBL, CSG, CDB) Tensor Types.
2. CU_Rational_Resample.vi
• Supports single channel mode.
• Accepts (SGL, DBL, CSG, CDB) Tensor Types.
3. CU_Convolution.vi
• Supports 1Ch-1Ch, MCh-1Ch, MCh-MCh modes.
• Accepts (SGL, DBL, CSG, CDB) Tensor Types.
4. Cross_Correlation.vi
• Supports single channel mode.
• Accepts (SGL, DBL, CSG, CDB) Tensor Types.
2. Added a function to the Complex subpalette.
1. CU_Interleaved_to_Complex.vi.
• Description: Converts interleaved sampled IQ data into complex representation. Designed to minimize data copy overhead during conversion.
• Supports all tensor dimensionalities except T0D.
• Accepted input datatypes (I8, U8, I16, U16, I32, U32, I64, U64, SGL, DBL)
• Supported output datatypes (CSG, CDB)
3. Added the following functions to the Device Management subpalette.
1. CU_Get_GPU_List.vi
• Returns the list of Nvidia GPUs available on the PC.
2. CU_Get_GPU_Properties.vi
• Returns the properties for the selected GPU ID.
3. CU_Set_GPU.vi
• Sets the selected GPU to be active.
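CU_Interleaved_to_Complex.vi converts interleaved IQ samples into complex values. Assuming the usual layout [I0, Q0, I1, Q1, ...] for a 1D input, the mapping looks like the Python sketch below. `interleaved_to_complex` is a hypothetical name, and any scaling the VI may apply to integer inputs is not modeled.

```python
from typing import List, Sequence


def interleaved_to_complex(iq: Sequence[float]) -> List[complex]:
    """Pair even-index (I) and odd-index (Q) samples into I + jQ values."""
    if len(iq) % 2:
        raise ValueError("interleaved IQ data must contain an even number of samples")
    # iq[0::2] are the in-phase samples, iq[1::2] the quadrature samples.
    return [complex(i, q) for i, q in zip(iq[0::2], iq[1::2])]
```

The output has half as many elements as the input, each holding one I/Q pair.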

Extended Functionalities
1. CU_Sine.vi, CU_Cosine.vi and CU_exponential.vi accept tensors with complex data representations (CSG, CDB) as input.
2. The help file was updated to reflect the updated functionalities.

Bug Fix
1. Fixed array max dimension issue in CU_Transpose_2D_Array.vi.
2. Fixed “Dest in” memory allocation issue in CU_Power_Spectrum.vi.
3. Fixed an issue where CU_Square_Root.vi returned incorrect results when the imaginary part was 0 and when run in in-place mode.
4. Other minor fixes.

2.0.1.39 (Feb 08, 2023)

v2.0.1

General Description
This version of CuLab toolkit brings new functionalities and improves the performance of existing ones.

Backward Compatibility
This is a major update which breaks backward compatibility with v1.0.1 version of the toolkit.

Features
1. Added Trigonometric functions:
a) Sine
b) Cosine
c) Tangent
d) Cotangent
e) Inverse Sine
f) Inverse Cosine
g) Inverse Tangent
h) Inverse Tangent 2 Input (atan2)
i) Inverse Cotangent
j) Sine & Cosine
k) Sinc

2. Added Exponential functions
a) Exponential
b) Exponential Arg -1
c) Logarithm Base 10
d) Logarithm Base 2
e) Logarithm Base X
f) Natural Logarithm
g) Natural Logarithm Arg +1
h) Power Of 10
i) Power Of 2
j) Power Of X
k) Y-th Root of X

3. Added Hyperbolic functions
a) Hyperbolic Sine
b) Hyperbolic Cosine
c) Hyperbolic Tangent
d) Hyperbolic Cotangent
e) Inverse Hyperbolic Sine
f) Inverse Hyperbolic Cosine
g) Inverse Hyperbolic Tangent
h) Inverse Hyperbolic Cotangent

4. Added missing complex APIs
a) Complex to Polar
b) Polar to Complex
c) Polar to Re/Im
d) Re/Im to Polar

5. Added a Quotient & Remainder function
6. Added support for Tensor-Constant operations for binary numeric operations
7. Tensor-Constant operations allow a CPU-based constant to be used as the second operand
8. Added new signal generation and analysis functions
a) Ramp Pattern
b) Sine Pattern
c) Power Spectrum

9. Added support for previously missing numeric types in numeric operations
10. Redesigned colors for tensor wires to make them distinguishable across numeric types and dimensionality
11. Added batched versions of FFT and IFFT
12. Added spectrum shifting functionality for single-channel and batched FFT
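Spectrum shifting usually means reordering FFT output so that the zero-frequency bin sits in the middle of the array, as NumPy's `fftshift` does. Assuming CuLab uses that convention, a minimal Python sketch for single-channel and batched (row-wise) spectra follows; both function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fft_shift(spectrum: Sequence[T]) -> List[T]:
    """Rotate the spectrum right by n//2 so the DC bin moves to index n//2."""
    n = len(spectrum)
    k = n // 2
    return list(spectrum[n - k:]) + list(spectrum[:n - k])


def fft_shift_batch(spectra: Sequence[Sequence[T]]) -> List[List[T]]:
    """Apply fft_shift independently to each row (one spectrum per channel)."""
    return [fft_shift(row) for row in spectra]
```

For even lengths the shifted array runs from the Nyquist bin up through the highest positive frequency; for odd lengths it runs symmetrically from the most negative to the most positive frequency.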

Optimizations
1. Greatly improved the performance of Real-to-Complex (R2C and D2Z) FFTs
2. Optimized execution times for Complex functions
3. Other optimizations

Bug Fix
1. Fixed automatic instance selection issue in Array Max Min polymorphic function
2. Fixed memory leakage issue
3. Fixed an error in BLAS benchmarking example
4. Corrected typos
5. Other bug fixes

1.0.1.19 (Sep 07, 2022)

v1.0.1

ngene was a contributor to this release

Forum Posts

Can waveform generation be included as simple trig and linear operations like ramp and sine pattern Many RF DSP maths require simple signals to perform operations. Making those signals takes horsepow… by norm!, 2 years, 9 months ago
Can complex number library be fleshed out with polar transforms? Complex to polar transforms are done a TON in RF DSP. I'd love to see the impact on some core algor… by norm!, 2 years, 9 months ago
