GPU acceleration for AI-powered tools

Update – 16 December 2023

There is now an experimental PixInsight repository to enable GPU acceleration on Windows computers in one step. Please see this post on the PixInsight forum for more details. This doesn’t apply to the Photoshop versions of any RC Astro tools (yet).

Background

On MacOS, use of a GPU (or “neural engine” on Apple silicon based Macs) to accelerate neural network computations is automatic and handled by the “CoreML” library provided by Apple. This library makes the decision whether or not a particular hardware configuration is useable to accelerate BlurXTerminator’s neural network. Most Macs of recent vintage benefit from acceleration with no additional configuration needed.

On Windows and Linux machines, the neural network computations are performed using the TensorFlow library provided by Google. A CPU-only version of this is installed by default with PixInsight on hardware that will support it. If your machine has a compatible NVIDIA GPU, it may be possible to dramatically accelerate BlurXTerminator and other neural network based tools.

Unfortunately, NVIDIA does not make it easy for small developers to license all of the additional software libraries needed to enable this. Accomplishing the acceleration is thus unfortunately a complex task that involves various downloads and installations, setting environment variables, etc. It can get even more confusing considering that the versions of all of the downloaded components must be compatible with one another.

For those who have the technical skills and feel up to this task, this page is a brief guide for Windows machines. For Linux machines, see this post from the PixInsight forum. If you are running Linux under Windows WSL, see this post. More detailed information for both Windows and Linux machines is available in this NVIDIA guide.

If you have any doubt about being able to perform these steps successfully, find a tech-savvy friend to help. This procedure will likely upgrade the graphics driver for your GPU — if this might interfere with other applications that depend on a particular graphics driver version, consider backing up your system so you can revert if neeeded.

NOTE: Due to the very large number of variations of CPU, GPU, and Operating System installations, RC Astro cannot provide individual support for GPU acceleration. Proceed at your own discretion, and seek help from tech-savvy friends or online forums for any help needed.

Note that this guide is PixInsight-centric, but most of the steps for accelerating RC Astro tools in Photoshop or Affinity Photo are the same. See the note at the end of the guide to apply the acceleration to these versions.

Compatibility

You will need an Intel/AMD x64 system running Windows 10 or later, and an NVIDIA GPU with CUDA compute capabilities of version 6.1 or higher. GPUs with less than 2GB of on-board RAM may not be sufficient. Check this NVIDIA page to verify your GPU’s capabilities.

You will also need administrator privileges for your user account to make many of the changes. Windows will likely prompt for permission to perform a number of the actions.

Update NVIDIA driver

Make sure you have the latest driver for your GPU directly from NVIDIA. Their driver download page is here.

Download and install the NVIDIA CUDA toolkit

“CUDA” stands for Compute Unified Device Architecture, NVIDIA’s name for a set of software libraries that allow for general-purpose computing to be performed on many of their graphics processors (GPUs). The CUDA toolkit installer can be downloaded from this NVIDIA page. Select Windows, x86_64, your Windows version, “exe (local),” and finally click the download button. Run the installer and select Express installation to install all components.

This will also update the graphics driver for your GPU, ensuring that it is compatible with the version of the CUDA toolkit in the download.

The installer should also set a number of environment variables that will be needed later. The files comprising the toolkit should be installed in a location such as C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8. The last bit is the toolkit version number, the latest of which is 11.8 at the time of this writing.

Download and install cuDNN files

The “cuDNN” moniker refers to yet more software libraries that provide for accelerating “deep neural network” computations on CUDA-enabled devices. Downloading this component requires an NVIDIA developer account, which can be created free of charge.

The cuDNN libraries can be downloaded from this NVIDIA page. Once you go through the process of creating an account and verifying your email address, you should be presented with a list of installers for various operating systems. Click the link labeled “Local Installer for Windows (Zip).”

Even though this is labeled as an installer, it is not. It is a collection of files in a compressed archive, only some of which are needed, and which must be copied manually to the required location. In the ZIP archive, locate the bin folder. Copy the contents of this folder to the bin folder in the CUDA toolkit installation from above, e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin.

None of the files from the lib or include folders are needed unless you will actually be developing your own GPU-accelerated neural network software.

Download and install the ZLIB compression library

There is a data compression software library that the above libraries depend on for some operations. It can be downloaded from this page. Find the link on that page for “zlib for Windows 9x/NT (DLL and static version)” and download the binaries. Decompress the downloaded archive and locate the dll_x64 folder within. Copy the file named zlibwapi.dll to the CUDA toolkit’s bin directory as above.

Download and install the GPU-enabled TensorFlow library

The TensorFlow project maintains different versions of a software library called tensorflow.dll. It is this library that BlurXTerminator and other neural network based tools use to perform computations. The GPU-enabled version of the tensorflow.dll library in turn depends on the CUDA and cuDNN libraries installed above.

The version of tensorflow.dll that is installed with PixInsight supports CPU operations only. A version that supports GPU acceleration can be downloaded from this TensorFlow link. This currently links to version 2.10 of the TensorFlow library. Other versions may be available from the TensorFlow project page.

Look for the entry labeled “Windows GPU only” and download that ZIP archive.
Decompress it and look in the lib folder within to locate the tensorflow.dll file.
Locate PixInsight’s bin folder on your hard drive, usually C:\Program Files\PixInsight\bin
Rename the tensorflow.dll file found there to something like tensorflow_cpu.dll. This is the CPU-only version of the TensorFlow library distributed with PixInsight. Renaming rather than replacing it allows you to easily revert to it if something goes wrong.
Move the new tensorflow.dll file that was downloaded into PixInsight’s bin folder.

Verify/set environment variables

Installation of the CUDA toolkit above should have set some environment variables that are needed so that the tensorflow.dlllibrary can find all of the GPU acceleration goodies. Check these environment variables as follows:

Launch the Windows environment variable editor in Control Panel
Under “System variables”, there should be a CUDA_PATH variable set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8, with perhaps a different version number depending on what you installed above.
There should also be a CUDA_PATH_V11_8 variable (or similar depending on your CUDA toolkit version) pointing to the same location.
The system Path variable should include the CUDA toolkit bin and libnvvp folders: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp.

One additional environment variable should be set to tell the TensorFlow library to only allocate as much GPU memory is needed, as opposed to its default behavior which is to allocate all GPU memory:

Create a new System environment variable named TF_FORCE_GPU_ALLOW_GROWTH, and set it to TRUE

After closing the environment variable editor, check that the changes have taken effect. Launch a DOS Command Prompt and run the set command. This will list all the currently set environment variables and their values. If you don’t see the changes that were made, it may be necessary to restart your machine for them to take effect.

Enjoy fast neural network processing

That should complete the setup for accelerating neural network based computations — launch PixInsight and enjoy. When you first execute BlurXTerminator on an image, it may take longer initially to get going — this delay is the software libraries above mapping the neural network operations to highly parallel GPU hardware operations. After this initial delay, processing should run much, much faster than before.

Any other applications or plug-ins on your machine that use TensorFlow to run neural nets should also be able to be accelerated by replacing the instances of the tensorflow.dll file that they load. RC Astro Photoshop plugins that use neural networks (e.g., StarXTerminator, NoiseXTerminator) can be accelerated by replacing the tensorflow.dll file at C:\Program Files\Common Files\RC-Astro\<Star/NoiseXTerminator> with the GPU version downloaded above.