LlamaCPP Extension
Overview
Nitro is an inference server built on top of llama.cpp. It provides an OpenAI-compatible API, a request queue, and scaling.
note
Nitro is the default AI engine downloaded with Jan. There is no additional setup needed.
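Because the API is OpenAI-compatible, you can sanity-check a running instance with a standard chat completion request. The sketch below assumes Nitro's default local port (3928) and that Jan has already loaded a model; adjust the URL and payload to your setup.

```python
# Minimal sketch: query Nitro's OpenAI-compatible chat endpoint.
# Assumes Nitro is running locally on its default port (3928) and
# that a model has already been loaded through Jan.
import requests

response = requests.post(
    "http://localhost:3928/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Hello, who are you?"}
        ]
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```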
In this guide, we'll walk you through the process of customizing your engine settings by configuring the `nitro.json` file.
- Navigate to **App Settings** > **Advanced** > **Open App Directory**, then open the `~/jan/engines` folder:
  - MacOS: `cd ~/jan/engines`
  - Windows: `C:/Users/<your_user_name>/jan/engines`
  - Linux: `cd ~/jan/engines`
- Modify the `nitro.json` file based on your needs. The default settings are shown below.
`~/jan/engines/nitro.json`
```json
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
```
The table below describes the parameters in the `nitro.json` file.
| Parameter | Type | Description |
|---|---|---|
| `ctx_len` | Integer | The context length for model operations. Typically set to 2048, which provides ample context for most models. (Minimum: 1, Maximum: 4096) |
| `ngl` | Integer | The number of model layers to offload to the GPU. Defaults to 100, which offloads all layers. |
| `cpu_threads` | Integer | The number of CPU threads used for inference. The maximum is determined by your hardware and OS. |
| `cont_batching` | Boolean | Enables continuous batching, which enhances throughput for LLM inference. |
| `embedding` | Boolean | Enables embeddings, used for tasks like document-enhanced chat in RAG-based applications. |
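As a sketch of how you might tune these values, the configuration below raises the context length and CPU thread count. The numbers are illustrative, not recommendations; match them to your hardware and model.

`~/jan/engines/nitro.json`
```json
{
  "ctx_len": 4096,
  "ngl": 100,
  "cpu_threads": 4,
  "cont_batching": false,
  "embedding": false
}
```

You may need to restart Jan after editing the file for the engine to pick up the new settings.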
tip
- By default, `ngl` is set to 100, which offloads all model layers to the GPU. If you wish to offload only about 50% of the layers, you can set `ngl` to 15, because most Mistral or Llama models have around 30 layers.
- To utilize the embedding feature, include the JSON parameter `"embedding": true`. This enables Nitro to process inferences with embedding capabilities (see the example request after this list). Please refer to Embedding in the Nitro documentation for a more detailed explanation.
- To utilize the continuous batching feature for boosting throughput and minimizing latency in large language model (LLM) inference, include `"cont_batching": true`. For details, please refer to Continuous Batching in the Nitro documentation.
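With `"embedding": true` set, a sketch of an embedding request against the OpenAI-style `/v1/embeddings` route is shown below. The port, route, and `model` value are assumptions based on Nitro's OpenAI-compatible defaults; adjust them to your setup.

```python
# Minimal sketch: request an embedding from Nitro after enabling
# "embedding": true in nitro.json. Port 3928 and the /v1/embeddings
# route follow Nitro's OpenAI-compatible defaults; the model name
# here is a placeholder for whatever model Jan has loaded.
import requests

response = requests.post(
    "http://localhost:3928/v1/embeddings",
    json={
        "input": "Hello, Jan!",
        "model": "my-loaded-model",  # placeholder model name
        "encoding_format": "float",
    },
    timeout=60,
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(f"Embedding length: {len(embedding)}")
```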
Assistance and Support
If you have questions, please join our Discord community for support, updates, and discussions.