GPU Buffers

GPU buffers allow you to provide data to a GpuProgram similar to a texture, but without the size limitations of textures, and with the ability to store complex data types. They are used for everything from vertex and index data to uniform parameters and arbitrary storage. The framework provides two levels of GPU buffer access:

GpuBuffer - Main-thread type that maintains a CPU-side cache. Writes are synced to the render thread automatically.
render::GpuBuffer - Render-thread type that represents the actual GPU-side buffer. Provides direct mapped memory access, flushing, and invalidation.

For most use cases you work with the main-thread GpuBuffer, which handles synchronization transparently. The render-thread variant is used inside renderer implementations and when working directly with command buffers.

Creation

To create a GpuBuffer you fill out a GpuBufferCreateInformation structure and call GpuDevice::CreateGpuBuffer. The create information provides factory methods for the different buffer types:

Factory	Purpose
`GpuBufferCreateInformation::CreateVertex`	Vertex buffers
`GpuBufferCreateInformation::CreateIndex`	Index buffers
`GpuBufferCreateInformation::CreateUniform`	Uniform/constant buffers
`GpuBufferCreateInformation::CreateSimpleStorage`	Storage buffers with primitive element format
`GpuBufferCreateInformation::CreateStructuredStorage`	Storage buffers with arbitrary element size
`GpuBufferCreateInformation::CreateStagingWrite`	CPU-writable staging buffers (copy source)
`GpuBufferCreateInformation::CreateStagingRead`	CPU-readable staging buffers (copy destination)

Storage buffers

Simple storage buffers contain primitive elements (of GpuBufferFormat format), such as floats or ints, each with up to 4 components. In HLSL these buffers are represented using Buffer or RWBuffer types. In GLSL they are represented using samplerBuffer or imageBuffer types.

// Creates a simple storage buffer with 32 elements, each a 4-component float
GpuBufferCreateInformation createInformation = GpuBufferCreateInformation::CreateSimpleStorage(BF_32X4F, 32);

SPtr<render::GpuBuffer> buffer = gpuDevice->CreateGpuBuffer(createInformation);

Structured storage buffers contain elements of arbitrary size and are usually used for storing structures of more complex data. In HLSL these buffers are represented using StructuredBuffer or RWStructuredBuffer types. In GLSL they are represented using the buffer block, also known as shared storage buffer object.

struct MyData
{
	float a;
	int b;
};

// Creates a structured storage buffer with 32 elements, each with enough size to store the MyData struct
GpuBufferCreateInformation createInformation = GpuBufferCreateInformation::CreateStructuredStorage(sizeof(MyData), 32);

SPtr<render::GpuBuffer> buffer = gpuDevice->CreateGpuBuffer(createInformation);

Memory placement flags

The GpuBufferFlag flags control where the buffer memory is stored:

GpuBufferFlag::StoreOnGPU - Buffer is placed in device memory. Fast GPU access, but CPU reads/writes require staging buffers. This is the default for most buffer types.
GpuBufferFlag::StoreOnCPUWithGPUAccess - Buffer is placed in CPU-visible memory accessible to the GPU. Faster CPU updates (no staging needed), but slower GPU access through the PCI Express bus. This is the default for uniform and staging buffers.

Suballocations

Buffers can contain multiple suballocations — logically separate regions within one physical buffer. This is more efficient than creating a separate GpuBuffer for each entry, because suballocated buffers can be bound using dynamic offsets on the command buffer. To create a buffer with suballocations, pass a suballocationCount when creating a uniform buffer:

// Creates a uniform buffer with space for 64 suballocations
GpuBufferCreateInformation createInformation = GpuBufferCreateInformation::CreateUniform(uniformSize, GpuBufferFlag::StoreOnCPUWithGPUAccess, 64);
SPtr<render::GpuBuffer> buffer = gpuDevice->CreateGpuBuffer(createInformation);

Each suballocation may be larger than the requested size due to GPU alignment requirements (typically 256 bytes for uniform buffers). Use render::GpuBuffer::GetSuballocationSize to query the actual aligned size.

A render::GpuBufferSuballocation is a lightweight handle referencing a specific suballocation within a buffer. It provides the buffer pointer, the byte offset, and the suballocation size.

Reading and writing

Main-thread GpuBuffer

The main-thread GpuBuffer maintains a CPU-side cache. All reads and writes operate on this cache, and changes are automatically synced to the render proxy:

GpuBuffer::Write - Copies data from CPU memory into the cache.
GpuBuffer::WriteTyped - Writes data with proper padding/alignment for GPU types (e.g. pads each row of a 3x3 matrix to 16 bytes).
GpuBuffer::ZeroOut - Clears a region of the cache.
GpuBuffer::Read - Reads from the cache. Only reflects CPU-written data, not GPU writes.
GpuBuffer::Map - Maps a region for direct pointer access. Returns a GpuBufferMappedScope RAII wrapper that marks the data dirty on destruction, triggering a sync.

SPtr<GpuBuffer> buffer = ...;

// Write data directly
MyData data[32];
// ... populate data
buffer->Write(0, sizeof(data), data);

// Or map for direct access
{
	GpuBufferMappedScope mappedScope = buffer->Map(0, sizeof(data), GpuMapOption::Write);
	memcpy(mappedScope.GetMappedMemory(), data, sizeof(data));
} // Automatically marks dirty on scope exit, triggering render proxy sync

Render-thread GpuBuffer

The render-thread render::GpuBuffer provides direct access to GPU memory. Unlike the main-thread variant, you must manage flushing and invalidation manually:

render::GpuBuffer::Write - Writes data directly to mapped GPU memory. Requires the buffer to be CPU-accessible (StoreOnCPUWithGPUAccess).
render::GpuBuffer::WriteTyped - Writes with padding/alignment for GPU types.
render::GpuBuffer::ZeroOut - Clears a region of the buffer.
render::GpuBuffer::Read - Reads directly from mapped GPU memory. If the GPU wrote to the buffer, you must issue execution and memory barriers, then call render::GpuBuffer::Invalidate before reading.
render::GpuBuffer::GetMappedMemory - Returns the raw persistently-mapped memory pointer, or nullptr if not mappable.
render::GpuBuffer::Flush - Makes CPU writes visible to the GPU. Only needed for non-coherent memory.
render::GpuBuffer::Invalidate - Makes GPU writes visible to the CPU. Only needed for non-coherent memory.
render::GpuBuffer::Map - Maps a region and returns a render::GpuBufferMappedScope that automatically invalidates on read mappings and flushes on write mappings when the scope exits.

SPtr<render::GpuBuffer> buffer = ...;

// Map, write, and auto-flush
{
	render::GpuBufferMappedScope mappedScope = buffer->Map(0, dataSize, GpuMapOption::Write);
	memcpy(mappedScope.GetMappedMemory(), data, dataSize);
} // Automatically flushes on scope exit

// You can also map a suballocation directly
render::GpuBufferSuballocation suballocation = ...;
{
	render::GpuBufferMappedScope mappedScope = suballocation.Map(GpuMapOption::Write);
	memcpy(mappedScope.GetMappedMemory(), data, suballocation.GetSize());
}

GpuBufferUtility

render::GpuBufferUtility provides high-level render-thread operations that handle staging buffers internally. This is the preferred way to write to GPU-only buffers from the render thread, as it transparently creates staging buffers and copy commands when needed:

render::GpuBufferUtility::Write - Writes data into a buffer. If the buffer is not CPU-writable or is currently used by the GPU, it internally creates a staging buffer and issues a copy command via the provided command buffer (or an internal transfer buffer if none is provided).
render::GpuBufferUtility::Read - Reads data from a buffer, staging if needed. Blocks if the buffer is in GPU use.
render::GpuBufferUtility::ReadAsync - Non-blocking read via a command buffer. Returns a TAsyncOp that is signaled when the data is ready.
render::GpuBufferUtility::CreateStaging - Creates a staging buffer matching the size of a given buffer.

SPtr<render::GpuBuffer> gpuOnlyBuffer = ...; // Created with StoreOnGPU

// GpuBufferUtility handles staging internally
GpuBufferUtility::Write(gpuOnlyBuffer, 0, dataSize, sourceData);

// Read with blocking
Vector<u8> readBack(dataSize);
GpuBufferUtility::Read(gpuOnlyBuffer, 0, dataSize, readBack.data());

// Non-blocking read via command buffer
TAsyncOp<SPtr<MemoryDataStream>> asyncRead = GpuBufferUtility::ReadAsync(gpuOnlyBuffer, 0, dataSize, commandBuffer);
// ... later, when the command buffer completes, the async op is signaled

The render::GpuBufferWriteFlag flags control behavior when writing to a buffer in GPU use:

render::GpuBufferWriteFlag::Normal - Default. Expects the buffer is not in GPU use.
render::GpuBufferWriteFlag::Discard - Internally reallocates buffer memory so previous GPU operations are not disturbed. Anything not written is undefined.
render::GpuBufferWriteFlag::NoOverwrite - Allows writing while the GPU is using the buffer. The caller is responsible for not writing to regions the GPU is operating on.

Binding

Once created, a buffer can be bound to a GPU program through GpuParameterSet by calling GpuParameterSet::SetStorageBuffer.

SPtr<render::GpuParameterSet> parameterSet = ...;
parameterSet->SetStorageBuffer("myBuffer", buffer);

Load-store buffers

Same as with textures, buffers can also be used for GPU program load-store operations. You simply need to set the GpuBufferFlag::AllowUnorderedAccessOnTheGPU flag in the create information before creating the buffer.

GpuBufferCreateInformation createInformation = GpuBufferCreateInformation::CreateStructuredStorage(sizeof(MyData), 32);
createInformation.Flags |= GpuBufferFlag::AllowUnorderedAccessOnTheGPU;

SPtr<render::GpuBuffer> buffer = commandBuffer->GetGpuDevice().CreateGpuBuffer(createInformation);

After that the buffer can be bound as normal, as shown above. This is different from load-store textures which have a separate set of methods for binding in GpuParameterSet.

Buffer pools

When you need many small buffer allocations of the same type, creating a separate GpuBuffer for each is wasteful. Buffer pools suballocate from larger backing buffers, reducing GPU resource overhead. Two pool types are provided:

TransientGpuBufferPool

render::TransientGpuBufferPool is a per-frame allocator where suballocations are automatically recycled after all in-flight frames complete (typically 3 frames). No manual release is needed — just call render::TransientGpuBufferPool::AdvanceFrame once per frame after submitting all command buffers.

TransientGpuBufferPool stagingPool;
stagingPool.Initialize(gpuDevice, GpuBufferCreateInformation::CreateUniform(bufferSize), 256);

// Each frame:
render::GpuBufferSuballocation suballocation = stagingPool.Allocate();
{
	render::GpuBufferMappedScope mappedScope = suballocation.Map(GpuMapOption::Write);
	memcpy(mappedScope.GetMappedMemory(), data, suballocation.GetSize());
}

// At the end of the frame, after submitting command buffers:
stagingPool.AdvanceFrame(); // Recycles allocations from 3 frames ago

GpuBufferPool

render::GpuBufferPool is a persistent allocator where allocations must be explicitly released. It supports two lifetime modes:

Manual: call render::GpuBufferPool::Allocate to get a render::GpuBufferSuballocation and render::GpuBufferPool::Release when done.
Tracked: call render::GpuBufferPool::AllocateTracked to get a render::TrackedGpuBufferSuballocation that automatically releases when destroyed (RAII).

GpuBufferPool uniformPool;
uniformPool.Initialize(gpuDevice, GpuBufferCreateInformation::CreateUniform(bufferSize), 1024);

// Manual lifetime
render::GpuBufferSuballocation suballocation = uniformPool.Allocate();
// ... use suballocation ...
uniformPool.Release(suballocation);

// Or tracked lifetime (auto-releases on destruction)
UPtr<TrackedGpuBufferSuballocation> tracked = uniformPool.AllocateTracked();
// ... use tracked (it inherits from GpuBufferSuballocation) ...
// Released automatically when tracked goes out of scope

Both pool types automatically grow by allocating new backing GpuBuffer objects when all existing suballocations are in use. Call render::GpuBufferPool::Destroy or render::TransientGpuBufferPool::Destroy to release all GPU resources when shutting down.