Enter the query into the form above. You can look for a specific version of a package by using the @ symbol, like this: gcc@10.
API method:
GET /api/packages?search=hello&page=1&limit=20
where search is your query, page is the page number, and limit is the number of items per page. Pagination information (such as the number of pages) is returned in the response headers.
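As a minimal sketch, the request above could be assembled and sent from Python using only the standard library. The base URL https://example.org is a placeholder assumption (the host name is not given here), and build_search_url and fetch_page are hypothetical helper names. The names of the pagination headers are not stated, so the sketch returns all response headers for inspection rather than guessing at them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host: an assumption, replace with the real site address.
BASE_URL = "https://example.org"


def build_search_url(query, page=1, limit=20, base_url=BASE_URL):
    """Build the GET /api/packages URL for a query such as 'gcc@10'."""
    params = urlencode({"search": query, "page": page, "limit": limit})
    return f"{base_url}/api/packages?{params}"


def fetch_page(query, page=1, limit=20, base_url=BASE_URL):
    """Send the request and return the raw body plus the response headers."""
    with urlopen(build_search_url(query, page, limit, base_url)) as response:
        # Pagination details arrive in the headers; their exact names are
        # not documented here, so all headers are returned as a dict.
        return response.read(), dict(response.headers)
```

Note that urlencode percent-encodes the @ sign, so a query of gcc@10 is sent as search=gcc%4010.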
If you'd like to join our channel webring, send a patch to ~whereiseveryone/toys@lists.sr.ht adding your channel as an entry in channels.scm.
This package provides a GPU-accelerated library of primitives for deep neural networks, with highly tuned implementations for standard routines such as forward and backward convolution, attention, matmul, pooling, and normalization.
This package provides the CUDA compiler and the CUDA run-time support libraries for NVIDIA GPUs, all of which are proprietary.
This package provides the CUDA Direct Sparse Solver library.
This package provides a set of APIs which can be used at runtime to link together GPU device code. It supports Link Time Optimization.
This package decodes (demangles) low-level identifiers that have been mangled by CUDA C++ into user-readable names. For every input alphanumeric word, the output of cu++filt is either the demangled name, if the word decodes to a CUDA C++ name, or the original word itself.
This package provides the CUDA compiler and the CUDA run-time support libraries for NVIDIA GPUs, all of which are proprietary.
This package provides a system-wide performance analysis tool designed to visualize an application’s algorithms, identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to small systems-on-a-chip.
This binary extracts information from CUDA binary files (both standalone and those embedded in host binaries) and presents it in human-readable format. The output of cuobjdump includes CUDA assembly code for each kernel, CUDA ELF section headers, string tables, relocators and other CUDA-specific sections. It also extracts embedded PTX text from host binaries.
This package provides a GPU-accelerated library of primitives for deep neural networks, with highly tuned implementations for standard routines such as forward and backward convolution, attention, matmul, pooling, and normalization.
This package provides the NVIDIA tool for debugging CUDA applications. CUDA-GDB is an extension to GDB, the GNU Project debugger. The tool provides developers with a mechanism for debugging CUDA applications running on actual hardware. This enables developers to debug applications without the potential variations introduced by simulation and emulation environments.
This package provides a high-level Pythonic module for the NVIDIA CUDA toolkit.
This package provides a set of GPU-accelerated basic linear algebra subroutines for handling sparse matrices that perform significantly faster than CPU-only alternatives. Depending on the specific operation, the library targets matrices with sparsity ratios in the range of 70% to 99.9%.
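To make the notion of a sparsity ratio concrete, here is a small CPU-only Python sketch. It is an illustration under stated assumptions, not this package's API: the function names are hypothetical, and compressed sparse row (CSR) is used simply as one common storage layout that skips zero entries.

```python
def sparsity_ratio(matrix):
    """Fraction of entries that are zero in a dense list-of-lists matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def to_csr(matrix):
    """Convert a dense matrix to CSR arrays: values, column indices, row pointers."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                col_indices.append(col)
        # row_ptr[r + 1] marks where row r ends in the values array.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, visiting only nonzeros."""
    result = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        result.append(sum(values[k] * x[col_indices[k]] for k in range(start, end)))
    return result
```

For a 4x4 matrix with three nonzero entries, sparsity_ratio returns 13/16 = 0.8125, and csr_matvec performs three multiplications instead of sixteen.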
This package provides a functional correctness checking suite included in the CUDA toolkit. This suite contains multiple tools that can perform different types of checks. The memcheck tool is capable of precisely detecting and attributing out-of-bounds and misaligned memory access errors in CUDA applications, and can also report hardware exceptions encountered by the GPU. The racecheck tool can report shared memory data access hazards that can cause data races. The initcheck tool can report cases where the GPU performs uninitialized accesses to global memory. The synccheck tool can report cases where the application attempts invalid uses of synchronization primitives.
This package provides the CUDA compiler and the CUDA run-time support libraries for NVIDIA GPUs, all of which are proprietary.
This package provides an interactive profiler for CUDA and NVIDIA OptiX that offers detailed performance metrics and API debugging via a user interface and command-line tool. Users can run guided analysis and compare results with a customizable and data-driven user interface, as well as post-process and analyze results in their own workflows.
This package provides the NVIDIA cuBLAS library. It includes several API extensions for providing drop-in industry standard BLAS APIs and GEMM APIs with support for fusions that are highly optimized for NVIDIA GPUs. The cuBLAS library also contains extensions for batched operations, execution across multiple GPUs, and mixed- and low-precision execution with additional tuning for the best performance.
This package provides a command-line tool to profile CUDA kernels. It enables the collection of a timeline of CUDA-related activities on both CPU and GPU, including kernel execution, memory transfers, memory set operations, and CUDA API calls, as well as events or metrics for CUDA kernels.
This package provides facilities that focus on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers. A pseudorandom sequence of numbers satisfies most of the statistical properties of a truly random sequence but is generated by a deterministic algorithm. A quasirandom sequence of n-dimensional points is generated by a deterministic algorithm designed to fill an n-dimensional space evenly.
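The difference between the two kinds of sequence can be sketched on the CPU in plain Python. This is only an illustration, not this package's API: the function names are hypothetical, Python's random module stands in for a pseudorandom generator, and the Halton sequence is chosen here as one well-known quasirandom construction.

```python
import random


def pseudorandom_points(count, dim, seed=0):
    """Seeded pseudorandom points: statistically random-looking but reproducible."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(count)]


def radical_inverse(index, base):
    """Van der Corput radical inverse: reflect the digits of index about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(count, bases=(2, 3)):
    """Quasirandom Halton points; one coprime base per dimension fills space evenly."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, count + 1)]
```

In base 2, the first three radical inverses are 0.5, 0.25 and 0.75: each new point lands in the largest remaining gap, which is the even-filling behaviour described above. Both generators are deterministic, so the same seed or index always yields the same point.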
This package provides the CUDA compiler and the CUDA run-time support libraries for NVIDIA GPUs, all of which are proprietary.
CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS and cuDNN.
CUTLASS decomposes these "moving parts" into reusable, modular software components abstracted by C++ template classes. Primitives for different levels of a conceptual parallelization hierarchy can be specialized and tuned via custom tiling sizes, data types, and other algorithmic policies. The resulting flexibility simplifies their use as building blocks within custom kernels and applications.
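The idea of decomposing a GEMM into tiles with a tunable tile size can be sketched in plain Python. This is a CPU-only illustration with hypothetical function names, not CUTLASS code and not its API; it shows a single level of tiling, whereas a real GPU kernel would apply the idea at several levels of the hierarchy.

```python
def naive_matmul(a, b):
    """Reference matrix product of list-of-lists matrices a (m x k) and b (k x n)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def tiled_matmul(a, b, tile=2):
    """Same product, computed tile by tile; `tile` plays the role of a tiling policy."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    # Outer loops walk over tiles of the output and of the inner dimension.
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            for k0 in range(0, inner, tile):
                # Inner loops accumulate one tile's partial products.
                for i in range(i0, min(i0 + tile, rows)):
                    for j in range(j0, min(j0 + tile, cols)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, inner)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Changing the tile argument alters only the order in which partial products are accumulated, not the result, which is why tile size can be treated as a tuning parameter.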