vllm.config.kv_events
KVEventsConfig
Configuration for KV event publishing.
Source code in vllm/config/kv_events.py
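The fields below, taken together, can be pictured as a small record of settings. The sketch uses a plain dataclass stand-in, `KVEventsConfigSketch` (a hypothetical name, not imported from vLLM), that mirrors the documented field names and defaults so it runs without vLLM installed. The real class may perform validation that this sketch omits.

```python
from dataclasses import dataclass, replace


# Hypothetical local stand-in mirroring the documented fields and defaults of
# KVEventsConfig; it is not the vLLM class itself.
@dataclass
class KVEventsConfigSketch:
    enable_kv_cache_events: bool = False
    publisher: str = "null"
    endpoint: str = "tcp://*:5557"
    buffer_steps: int = 10000
    hwm: int = 100000
    max_queue_size: int = 100000


# Defaults: events disabled and the "null" publisher selected.
default_cfg = KVEventsConfigSketch()

# A typical override: turn events on and publish them over zmq.
zmq_cfg = replace(default_cfg, enable_kv_cache_events=True, publisher="zmq")
print(zmq_cfg.endpoint)  # tcp://*:5557
```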
buffer_steps class-attribute instance-attribute
buffer_steps: int = 10000
The number of steps to cache for the replay endpoint. Only events from the last N steps are retained for replay.
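A "last N steps" window can be modeled as a bounded deque keyed by step, where appending a new step silently evicts the oldest one. This is a minimal sketch under the assumption that events are grouped per step; `ReplayBufferSketch` and its methods are hypothetical and not vLLM's internal storage.

```python
from collections import deque


class ReplayBufferSketch:
    """Hypothetical: keeps events for only the most recent `buffer_steps` steps."""

    def __init__(self, buffer_steps: int) -> None:
        # Each entry is (step, events); deque(maxlen=...) drops the oldest entry.
        self._steps: deque = deque(maxlen=buffer_steps)

    def record(self, step: int, events: list) -> None:
        self._steps.append((step, events))

    def replay_from(self, start_step: int) -> list:
        # Return every buffered event whose step is >= start_step, oldest first.
        return [e for step, evs in self._steps if step >= start_step for e in evs]


buf = ReplayBufferSketch(buffer_steps=3)
for step in range(5):
    buf.record(step, [f"event-{step}"])
print(buf.replay_from(0))  # ['event-2', 'event-3', 'event-4']
```

Steps 0 and 1 have already been evicted, so a consumer asking to replay from step 0 only receives what is still inside the window.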
enable_kv_cache_events class-attribute instance-attribute
enable_kv_cache_events: bool = False
If True, emit KV cache events that record when cache blocks are stored and removed. These events can be published externally over zmq by configuring the publisher and endpoint settings.
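The flag acts as a gate: block storage and removal are only turned into events when it is enabled. The sketch below illustrates that gating with hypothetical event types (`StoredEvent`, `RemovedEvent`) and a hypothetical `CacheTrackerSketch`; vLLM's actual event classes and their fields may differ.

```python
from dataclasses import dataclass, field


@dataclass
class StoredEvent:  # hypothetical event type
    block_hashes: list


@dataclass
class RemovedEvent:  # hypothetical event type
    block_hashes: list


@dataclass
class CacheTrackerSketch:
    enable_kv_cache_events: bool = False
    pending: list = field(default_factory=list)

    def on_store(self, hashes: list) -> None:
        # Only record an event when event tracking is enabled.
        if self.enable_kv_cache_events:
            self.pending.append(StoredEvent(list(hashes)))

    def on_evict(self, hashes: list) -> None:
        if self.enable_kv_cache_events:
            self.pending.append(RemovedEvent(list(hashes)))

    def take_events(self) -> list:
        # Hand the accumulated events to a publisher and start a fresh batch.
        events, self.pending = self.pending, []
        return events


off = CacheTrackerSketch()
off.on_store(["h1"])
print(off.take_events())  # []

on = CacheTrackerSketch(enable_kv_cache_events=True)
on.on_store(["h1", "h2"])
on.on_evict(["h1"])
events = on.take_events()
print([type(e).__name__ for e in events])  # ['StoredEvent', 'RemovedEvent']
```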
endpoint class-attribute instance-attribute
endpoint: str = 'tcp://*:5557'
The zmq endpoint on which KV events are published.
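zmq endpoints take the form `transport://address`; in a TCP bind address, `*` means "all interfaces", which suggests the default publisher binds rather than connects. A subscriber cannot connect to `*`, so it needs a concrete host. The helper below, `connect_address`, is a hypothetical illustration of that translation and is not part of vLLM.

```python
def connect_address(bind_endpoint: str, host: str = "localhost") -> str:
    """Hypothetical helper: turn a bind endpoint such as 'tcp://*:5557' into an
    address a subscriber could connect to on `host`."""
    transport, sep, rest = bind_endpoint.partition("://")
    if not sep or transport != "tcp":
        raise ValueError(f"expected a tcp:// endpoint, got {bind_endpoint!r}")
    bind_host, _, port = rest.rpartition(":")
    # '*' (or 0.0.0.0) means "every interface" when binding; clients need a real host.
    target = host if bind_host in ("*", "0.0.0.0") else bind_host
    return f"tcp://{target}:{port}"


print(connect_address("tcp://*:5557"))  # tcp://localhost:5557
```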
hwm class-attribute instance-attribute
hwm: int = 100000
The zmq high water mark for the event publisher. Once N events are queued, further events are dropped if the consumer is not keeping up.
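The drop-on-overflow behavior can be simulated without a socket: messages pile up while the consumer lags, and once `hwm` are pending, new ones are discarded until the consumer drains some. `HighWaterMarkSketch` is a hypothetical, in-memory model of that policy, not a zmq or vLLM class.

```python
from collections import deque


class HighWaterMarkSketch:
    """Hypothetical model of a send queue that drops new messages once
    `hwm` messages are pending, as a publisher does with a slow consumer."""

    def __init__(self, hwm: int) -> None:
        self.hwm = hwm
        self.pending: deque = deque()
        self.dropped = 0

    def send(self, msg) -> bool:
        if len(self.pending) >= self.hwm:
            # The consumer is behind: discard the new message.
            self.dropped += 1
            return False
        self.pending.append(msg)
        return True

    def consume(self, n: int) -> list:
        # The consumer drains up to n messages, freeing room below the mark.
        return [self.pending.popleft() for _ in range(min(n, len(self.pending)))]


sock = HighWaterMarkSketch(hwm=3)
for i in range(5):
    sock.send(i)
print(sock.dropped)     # 2
print(sock.consume(2))  # [0, 1]
print(sock.send(5))     # True
```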
max_queue_size class-attribute instance-attribute
max_queue_size: int = 100000
The maximum number of events held in the queue while they wait to be published.
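A bounded in-process queue between the code producing events and the publisher is one way to picture this limit. The sketch uses the standard library's `queue.Queue(maxsize=...)`; `enqueue_event` is hypothetical, and dropping events when the queue is full (rather than blocking) is an assumption, not confirmed vLLM behavior.

```python
import queue


def enqueue_event(q: queue.Queue, event) -> bool:
    """Hypothetical producer side: enqueue without blocking and report drops."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        # Assumption: a full queue discards the event instead of blocking.
        return False


events = queue.Queue(maxsize=2)  # plays the role of max_queue_size
results = [enqueue_event(events, f"e{i}") for i in range(3)]
print(results)  # [True, True, False]

# The publisher side drains events in FIFO order.
print(events.get_nowait())  # e0
```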
publisher class-attribute instance-attribute
publisher: str = 'null'
The publisher used for KV events. Supported values are "null" and "zmq".
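Selecting a publisher by name is a small factory pattern. In this sketch, every class and function name is hypothetical: the "null" branch assumes a publisher that discards events, and the "zmq" branch records batches in memory instead of opening a socket so the example runs without pyzmq.

```python
from typing import Protocol


class EventPublisher(Protocol):
    def publish(self, events: list) -> None: ...


class NullPublisherSketch:
    """Discards every event (assumed meaning of publisher="null")."""

    def publish(self, events: list) -> None:
        pass


class RecordingPublisherSketch:
    """Stand-in for a zmq publisher: records batches instead of sending them."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint
        self.sent: list = []

    def publish(self, events: list) -> None:
        self.sent.append(list(events))


def make_publisher(name: str, endpoint: str = "tcp://*:5557") -> EventPublisher:
    # Map the configured publisher name to an implementation; reject unknown names.
    if name == "null":
        return NullPublisherSketch()
    if name == "zmq":
        return RecordingPublisherSketch(endpoint)
    raise ValueError(f'unknown publisher {name!r}; expected "null" or "zmq"')


pub = make_publisher("zmq")
pub.publish(["stored:h1"])
print(pub.sent)  # [['stored:h1']]
```

Failing fast on an unrecognized name surfaces configuration typos at startup instead of silently disabling event publishing.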