VexCL
Sparse matrix in hybrid ELL-CSR format.
#include <spmat.hpp>
Public Types | |
| typedef val_t | value_type |
Public Member Functions | |
| SpMat () | |
| Empty constructor. | |
| SpMat (const std::vector< cl::CommandQueue > &queue, size_t n, size_t m, const idx_t *row, const col_t *col, const val_t *val) | |
| Constructor. | |
| void | mul (const vex::vector< val_t > &x, vex::vector< val_t > &y, val_t alpha=1, bool append=false) const |
| Matrix-vector multiplication. | |
| size_t | rows () const |
| Number of rows. | |
| size_t | cols () const |
| Number of columns. | |
| size_t | nonzeros () const |
| Number of non-zero entries. | |
Static Public Member Functions | |
| static std::string | inline_preamble (const cl::Device &device, int component, int position, kernel_generator_state &) |
| static std::string | inline_expression (const cl::Device &device, int component, int position, kernel_generator_state &) |
| static std::string | inline_parameters (const cl::Device &device, int component, int position, kernel_generator_state &) |
| static void | inline_arguments (cl::Kernel &kernel, uint device, size_t, uint &position, const SpMat &A, const vector< val_t > &x, kernel_generator_state &) |
Sparse matrix in hybrid ELL-CSR format.
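| SpMat (const std::vector< cl::CommandQueue > &queue, size_t n, size_t m, const idx_t *row, const col_t *col, const val_t *val) | inline |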
Constructor.
Constructs the GPU representation of the matrix. The input matrix is in CSR format. The GPU matrix uses ELL format and is split equally across all compute devices. When there is more than one device, a secondary queue can be used to transfer ghost values across GPU boundaries in parallel with the computation kernel.
| queue | vector of queues. Each queue represents one compute device. |
| n | number of rows in the matrix. |
| m | number of cols in the matrix. |
| row | row index into col and val vectors. |
| col | column numbers of nonzero elements of the matrix. |
| val | values of nonzero elements of the matrix. |
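To make the `row`, `col`, and `val` parameters concrete, here is a minimal host-side sketch of CSR arrays for a small tridiagonal matrix. It uses only the standard library and is not VexCL code: the `CsrHost` struct and `make_tridiagonal` helper are hypothetical names, the concrete types `std::size_t` and `double` stand in for `idx_t`, `col_t`, and `val_t`, and the sketch assumes the conventional CSR layout in which `row` holds `n + 1` offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side container for CSR data (not part of VexCL).
struct CsrHost {
    std::size_t n = 0;             // number of rows
    std::size_t m = 0;             // number of columns
    std::vector<std::size_t> row;  // assumed n + 1 offsets into col and val
    std::vector<std::size_t> col;  // column number of each nonzero
    std::vector<double>      val;  // value of each nonzero
};

// Builds the n-by-n tridiagonal matrix with 2 on the diagonal and -1 on
// the first sub- and super-diagonals, e.g. for n = 3:
//   [ 2 -1  0 ]
//   [-1  2 -1 ]
//   [ 0 -1  2 ]
inline CsrHost make_tridiagonal(std::size_t n) {
    assert(n > 0);
    CsrHost A;
    A.n = n;
    A.m = n;
    A.row.push_back(0);  // the first row starts at offset 0
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) {
            A.col.push_back(i - 1);
            A.val.push_back(-1.0);
        }
        A.col.push_back(i);
        A.val.push_back(2.0);
        if (i + 1 < n) {
            A.col.push_back(i + 1);
            A.val.push_back(-1.0);
        }
        // Offset one past the last nonzero of row i.
        A.row.push_back(A.col.size());
    }
    return A;
}
```

With such a structure, `A.row.data()`, `A.col.data()`, and `A.val.data()` would be the natural candidates for the `row`, `col`, and `val` arguments, provided the matrix's index and value types match.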
References vex::is_cpu(), vex::qctx(), and vex::qdev().
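| void mul (const vex::vector< val_t > &x, vex::vector< val_t > &y, val_t alpha=1, bool append=false) const | inline |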
Matrix-vector multiplication.
Matrix-vector multiplication ($y = \alpha A x$ or $y = y + \alpha A x$) is performed in parallel on all registered compute devices. Ghost values of x are transferred across GPU boundaries as needed.
| x | input vector. |
| y | output vector. |
| alpha | coefficient in front of matrix-vector product. |
| append | if set, the matrix-vector product is added to y. Otherwise, y is overwritten with the matrix-vector product. |
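The roles of `alpha` and `append` can be illustrated with a sequential host-side reference. This is only a sketch of the semantics described by the parameters, written against plain CSR arrays with the standard library. It is not the device kernel and does not reflect how the work is split across devices. The function name `csr_mul_reference` and the types `std::size_t` and `double` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference (not VexCL code):
//   append == false:  y = alpha * A * x
//   append == true :  y = y + alpha * A * x
// A is given as CSR arrays with n rows and n + 1 row offsets (assumed).
inline void csr_mul_reference(
        std::size_t n,
        const std::vector<std::size_t> &row,
        const std::vector<std::size_t> &col,
        const std::vector<double>      &val,
        const std::vector<double>      &x,
        std::vector<double>            &y,
        double alpha = 1, bool append = false)
{
    assert(row.size() == n + 1);
    assert(y.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        // Dot product of row i with x.
        double sum = 0;
        for (std::size_t j = row[i]; j < row[i + 1]; ++j) {
            assert(col[j] < x.size());
            sum += val[j] * x[col[j]];
        }
        // Either overwrite y[i] or accumulate into it.
        y[i] = append ? y[i] + alpha * sum : alpha * sum;
    }
}
```

A reference like this can be useful for checking device results on small matrices, since the two modes differ only in whether the previous contents of `y` are kept.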
References vex::alignup(), vex::build_sources(), vex::bytes(), vex::kernel_workgroup_size(), vex::qctx(), and vex::qdev().
Referenced by vex::device_spmv_perf().