CuLab – GPU Accelerated Toolkit for LabVIEW, by Ngene
Version: 3.0.4.67
Released: May 27, 2024
Publisher: Ngene
License: Ngene Custom
LabVIEW Version: LabVIEW x64 >= 20.0
Operating System: Windows x64
Project links: Homepage | Documentation | Repository | Discussion
Description
CuLab is an intuitive toolkit specifically designed for LabVIEW users, simplifying the acceleration of complex computations on Nvidia GPUs.
CuLab offers an extensive API tailored to accelerate essential functions, including mathematical operations (simple math, trigonometric, and exponential functions), linear algebra, signal generation and processing (FFT/IFFT, correlation, convolution, resampling), and seamless array manipulation directly on GPUs.
One of CuLab's key features is its support for tensors, representing data structures on GPU, spanning all numeric data types and dimensionalities from 0D (scalars) to 4D arrays. This flexibility ensures compatibility with a wide range of data processing needs.
Designed with simplicity in mind, CuLab serves as a user-friendly bridge, enabling the acceleration of any data processing code developed in LabVIEW on Nvidia GPUs.
Release Notes
V3.0.4
General Description
This is a major update that brings many new functionalities and improvements.
New Features
1. Added a “Computer Vision” palette with the following new functions:
a. CU_CV_GrayMorpholgy.vi – Performs grayscale morphological transformations.
 Supported operations: Erode, Dilate.
 Supported input types: U8, U16, SGL.
 Batch mode supported: False.
b. CU_CV_Resample.vi – Performs a resampling operation.
 Supported input types: All CV types.
 Batch mode supported: True.
c. CU_CV_Extract.vi – Extracts a portion of the input image.
 Supported input types: All CV types.
 Batch mode supported: True.
Images are represented as T2D tensors. Grayscale images are represented with the U8, I8, U16, I16, and SGL datatypes, while color images (e.g. ARGB) are represented with the U32 datatype. Some functions also support a batch mode of operation, in which case the batch of images is provided as a T3D tensor.
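For reference, grayscale erosion and dilation replace each pixel with the minimum or maximum over a structuring-element neighborhood. Below is a minimal CPU-side NumPy sketch of that idea; the function name, the 3x3 square structuring element, and the edge handling are illustrative assumptions, not the CuLab API:

```python
import numpy as np

def gray_morphology(img, op="Erode", k=3):
    """Grayscale erode/dilate with a k-by-k square structuring element.

    Illustrative CPU analogue only; CuLab performs the equivalent on the GPU.
    """
    reduce_fn = np.min if op == "Erode" else np.max
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")   # replicate border pixels
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = reduce_fn(padded[i:i + k, j:j + k])
    return out

img = np.array([[0, 0, 0, 0],
                [0, 9, 9, 0],
                [0, 9, 9, 0],
                [0, 0, 0, 0]], dtype=np.uint8)
eroded  = gray_morphology(img, "Erode")    # the 2x2 bright block vanishes
dilated = gray_morphology(img, "Dilate")   # the bright block grows by one pixel
```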
2. Added new functions in the “Signal Operation” subpalette:
a. CU_Digital_Down_Conversion.vi – Performs digital down conversion.
 Supported input types: SGL, DBL, CSG, CDB.
 Batch mode supported: True.
b. CU_Convolution_Batch.vi – Performs a batch convolution operation.
 Supports 1D MCH–MCH and MCH–1CH modes.
 The batched instances of the CU_Convolution polymorphic VI have been removed.
c. Added support for 2D convolution to CU_Convolution.vi.
 Supported input types: SGL, DBL.
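Digital down conversion, as a technique, mixes the input with a complex exponential at the carrier frequency, low-pass filters the result, and decimates. A rough NumPy sketch of that signal chain follows; the carrier, the windowed-sinc filter, and the decimation factor are illustrative assumptions, not CU_Digital_Down_Conversion.vi internals:

```python
import numpy as np

def ddc(x, fs, f_carrier, decim, num_taps=64):
    """Digital down conversion: mix to baseband, low-pass filter, decimate."""
    n = np.arange(len(x))
    baseband = x * np.exp(-2j * np.pi * f_carrier * n / fs)   # complex mix
    # Windowed-sinc low-pass with cutoff at the decimated Nyquist rate
    cutoff = 0.5 / decim
    t = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff * t) * np.hamming(num_taps)
    h /= h.sum()                                              # unity DC gain
    filtered = np.convolve(baseband, h, mode="same")
    return filtered[::decim]

fs = 1000.0
n = np.arange(2000)
x = np.cos(2 * np.pi * 100.0 * n / fs)    # real tone at a 100 Hz carrier
y = ddc(x, fs, 100.0, decim=10)           # complex baseband at 100 S/s
```

Mixing cos(wt) down by its own carrier leaves a DC component of 0.5 plus a double-frequency term that the low-pass filter removes.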
3. Changed the “FFT” subpalette name to “Transforms” and added the following functions:
a. CU_Hilbert_Transform.vi – Computes the fast Hilbert transform of the input Tensor.
 Accepted input Tensor types: SGL, DBL.
 Supported dimensionalities: T1D.
b. CU_Analytic_Signal.vi – Computes the complex Analytic Signal of the real-valued input Tensor.
 Accepted input Tensor types: SGL, DBL.
 Return Tensor types: CSG, CDB.
 Supported dimensionalities: T1D.
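The analytic signal is conventionally built in the frequency domain: zero the negative frequencies and double the positive ones, so the imaginary part becomes the Hilbert transform of the input. A NumPy sketch of that standard construction (not a description of the CuLab implementation):

```python
import numpy as np

def analytic_signal(x):
    """Complex analytic signal of a real 1-D array via the FFT method."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                    # keep DC as-is
    if n % 2 == 0:
        weights[n // 2] = 1.0           # keep the Nyquist bin
        weights[1:n // 2] = 2.0         # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

t = np.linspace(0, 1, 256, endpoint=False)
z = analytic_signal(np.cos(2 * np.pi * 8 * t))
# z.real recovers the cosine; z.imag is its Hilbert transform (a sine)
```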
4. Added a “Boolean Operation” palette with the following functions:
a. CU_Boolean_2in.vi – Compound function for different binary (two-input) logical operations.
 Supported Boolean Operations:
1. AND
2. OR
3. XOR
4. NAND
5. NOR
6. XNOR
7. Select X
8. Select Y
b. CU_Boolean_Not.vi
c. CU_And_Array_Elements.vi
d. CU_Or_Array_Elements.vi
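A two-input compound like CU_Boolean_2in.vi can be pictured as an operation selector dispatched over elementwise logical functions. The operation names below follow the list above; everything else is an illustrative CPU-side sketch:

```python
import numpy as np

# Selector-to-operation table for the eight two-input Boolean operations
BOOL_OPS = {
    "AND":      np.logical_and,
    "OR":       np.logical_or,
    "XOR":      np.logical_xor,
    "NAND":     lambda x, y: ~np.logical_and(x, y),
    "NOR":      lambda x, y: ~np.logical_or(x, y),
    "XNOR":     lambda x, y: ~np.logical_xor(x, y),
    "Select X": lambda x, y: x,
    "Select Y": lambda x, y: y,
}

def boolean_2in(x, y, op="AND"):
    """Apply the selected Boolean operation elementwise to x and y."""
    return BOOL_OPS[op](np.asarray(x, bool), np.asarray(y, bool))

x = np.array([True, True, False, False])
y = np.array([True, False, True, False])
```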
5. Added a “Comparison Operation” palette with the following functions:
a. CU_Compare_1_Input.vi – Compound function for different unary (single-input) comparison operations.
 Supported Comparison Operations:
1. Equal To 0?
2. Not Equal To 0?
3. Greater Than 0?
4. Greater Or Equal To 0?
5. Less Than 0?
6. Less Or Equal To 0?
 Supports all Tensor types, except complex types.
b. CU_Compare_2_Inputs.vi – Compound function for different binary (two-input) comparison operations.
 Supported Comparison Operations:
1. Equal?
2. Not Equal?
3. Greater?
4. Greater Or Equal?
5. Less?
6. Less Or Equal?
 Supports all Tensor types.
 Accepts Comparison with a Constant.
c. CU_In_Range_and_Coerce.vi
 Supports all Tensor types, except complex types.
 Supports all tensor dimensionalities except T0D.
d. CU_Max_Min.vi
 Supports all Tensor types, except complex types.
 Supports tensor dimensionalities: T1D, T2D.
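In-range-and-coerce clamps each value into a [lo, hi] interval and reports which elements were already inside it, while a max/min function returns the elementwise maximum and minimum of two inputs. A NumPy sketch of those semantics; the function names and the inclusive boundary handling here are assumptions, not the CuLab defaults:

```python
import numpy as np

def in_range_and_coerce(x, lo, hi):
    """Return (coerced array, boolean in-range mask), inclusive boundaries."""
    x = np.asarray(x)
    in_range = (x >= lo) & (x <= hi)
    return np.clip(x, lo, hi), in_range

def max_min(x, y):
    """Elementwise maximum and minimum of two arrays."""
    return np.maximum(x, y), np.minimum(x, y)

coerced, mask = in_range_and_coerce([-5, 0, 3, 99], lo=0, hi=10)
# coerced -> [0, 0, 3, 10]; mask -> [False, True, True, False]
```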
6. Added “Lookup” subpalette in “Array” palette with the following functions.
a. CU_Array_Lookup_by_Index.vi – Returns a Tensor containing the elements of the input Tensor specified by the Index Tensor.
 Supports all tensor dimensionalities except T0D.
 Supports all Tensor types.
b. CU_Array_Lookup_by_Bool.vi – Returns a Tensor containing the elements of the input Tensor that have the value 1 (TRUE) in the Boolean input Tensor.
 Supports all tensor dimensionalities except T0D.
 Supports all Tensor types, except complex types.
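Both lookups correspond to the two classic gather idioms, integer-index gather and boolean masking, shown here as a CPU-side NumPy sketch (the function names are illustrative):

```python
import numpy as np

def lookup_by_index(x, idx):
    """Gather the elements of x at the positions listed in idx."""
    return np.asarray(x)[np.asarray(idx)]

def lookup_by_bool(x, mask):
    """Keep only the elements of x whose mask entry is TRUE."""
    return np.asarray(x)[np.asarray(mask, bool)]

x = np.array([10, 20, 30, 40])
lookup_by_index(x, [3, 0, 0])      # -> [40, 10, 10]
lookup_by_bool(x, [1, 0, 1, 0])    # -> [10, 30]
```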
7. Added a new function in the “Array” palette:
a. CU_Replace_Array_Elemenets_by_Index_Batch.vi – Replaces elements in the input Tensor with elements from the SubTensor at the indices specified in the IndexTensor.
 Supports tensor dimensionalities: T1D, T2D.
 Accepts all input Tensor types.
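The replace-by-index operation is a scatter: SubTensor values are written into a copy of the input at the listed indices. A minimal NumPy sketch of that behavior (names are illustrative):

```python
import numpy as np

def replace_by_index(x, sub, idx):
    """Return a copy of x with x[idx[k]] replaced by sub[k] for each k."""
    out = np.asarray(x).copy()
    out[np.asarray(idx)] = sub       # scatter sub into the copy
    return out

replace_by_index([0, 0, 0, 0], sub=[7, 9], idx=[1, 3])   # -> [0, 7, 0, 9]
```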
8. Added new functions in the “Numeric” palette:
a. CU_Add_Broadcast.vi – Performs broadcast addition of a T2D with a T1D.
 Supported input types: SGL, DBL, CSG, CDB.
b. CU_Multiply_Broadcast.vi – Performs broadcast multiplication of a T2D with a T1D.
 Supported input types: SGL, DBL, CSG, CDB.
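Broadcasting a T1D against a T2D applies the 1-D tensor across one axis of the 2-D tensor, exactly as in NumPy broadcasting. The row-wise orientation in this sketch is an assumption; the release notes do not state which axis CuLab broadcasts over:

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # T2D: [[0,1,2],[3,4,5]]
b = np.array([10, 20, 30], dtype=np.float32)       # T1D

added      = a + b    # b is added to every row of a
multiplied = a * b    # b scales every row of a
# added      -> [[10, 21, 32], [13, 24, 35]]
# multiplied -> [[ 0, 20, 60], [30, 80, 150]]
```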
9. Added an option for swapping the inputs in tensor-constant operations for CU_Subtract.vi and CU_Divide.vi.
10. Added the following function in the “Conversion” subpalette:
a. CU_To_U64.vi
11. Added a “Utilities” palette with the following function:
a. Get_Exec_Time.vi – Returns the execution time.
12. Added the following functions in the “Device Management” subpalette:
a. CU_Get_CUDA_Version.vi – Returns the CUDA version.
b. CU_Reset_GPU.vi – Destroys all allocations and resets all states.
13. Added a GPU info tool in the Help menu.
14. A Runtime licensing requirement has been added to this version of the toolkit.
Optimizations
1. Greatly optimized the efficiency of data movement between the CPU and GPU, which leads to a significant (30–40% on common benchmarks) improvement in overall toolkit performance.
2. Optimized the execution of numeric conversion functions.
3. Optimized the execution of CU_Array_Subset.vi
4. Significantly improved the performance of CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi.
5. Optimized the memory, context, and other resource management functionalities.
6. Other optimizations.
Extended Functionalities
1. The FIR filter specification has been incorporated into CU_Rational_Resample.vi for both single-channel and multi-channel (batch) inputs.
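Rational resampling by a factor up/down conventionally zero-stuffs by up, applies an FIR low-pass, and keeps every down-th sample; the incorporated FIR specification defines that middle stage. A NumPy sketch of the direct (non-polyphase) form, with an illustrative windowed-sinc FIR standing in for the user-specified filter:

```python
import numpy as np

def rational_resample(x, up, down, num_taps=63):
    """Resample x by the rational factor up/down (direct, non-polyphase form)."""
    # Upsample: insert up-1 zeros between input samples
    stuffed = np.zeros(len(x) * up)
    stuffed[::up] = x
    # Windowed-sinc low-pass at the tighter of the two Nyquist rates
    cutoff = 0.5 / max(up, down)
    t = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff * t) * np.hamming(num_taps)
    h *= up / h.sum()                  # gain of `up` compensates zero-stuffing
    filtered = np.convolve(stuffed, h, mode="same")
    # Downsample: keep every down-th sample
    return filtered[::down]

x = np.sin(2 * np.pi * 5 * np.arange(200) / 200.0)   # 5-cycle tone
y = rational_resample(x, up=3, down=2)               # 200 samples -> 300
```

A production implementation would use a polyphase structure to avoid multiplying by the stuffed zeros, but the input/output relationship is the same.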
2. The following functions have been updated to accept a constant as a second input.
a. CU_Logarithm_Base_X.vi
b. CU_Power_Of_X.vi
3. All numeric conversion functions now support all tensor types.
4. The Error Dialog Box now returns the full call-chain path.
5. CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi now check that the Input Tensor and CPU Data Array dimensions match before pushing the data to the GPU.
6. Automated the process of adding dependency DLLs when building applications.
7. The help file was updated to reflect the updated functionalities.
Bug Fixes
1. Fixed the input tensor types for the T1D:DBL instance of CU_Inverse_Tangent_2_input.vi.
2. The functionality of the T4D instances of CU_Array_Subset.vi has been corrected.
3. Fixed the maximum array dimension (65535) issue in CU_Power_Spectrum.vi.
4. The CU_Decimate_Single_Shot.vi connector pane was changed to match the LabVIEW Decimate (single shot) VI.
5. Renamed the following functions to conform with the common naming conventions.
a. From “CU_Square Root.vi” to “CU_Square_Root.vi”
b. From “CU_Add Array Elements.vi” to “CU_Add_Array_Elements.vi”
6. Fixed incorrect results returned by CU_Square.vi for complex inputs.
7. Fixed a CPU memory leak in CU_Tensor_Destroy.vi.
8. Other minor fixes.