Can I fine-tune a 70B LLM on a workstation in India?

Yes, with QLoRA (Quantized Low-Rank Adaptation — a technique that reduces the precision of model weights to 4-bit during fine-tuning, drastically reducing VRAM requirements) and gradient checkpointing, a 70B parameter LLM (like Llama 3.1 70B or Falcon 180B in quantized form) can be fine-tuned on 2× RTX 5090 (64 GB total VRAM) in approximately 7–14 days depending on dataset size. Full-precision (BF16) 70B fine-tuning requires 140+ GB VRAM and is only practical with 4× A100 80GB or cloud H100 instances. For most Indian ML engineers, QLoRA on 2× RTX 5090 is the practical on-premise path.

Does RTX 5090 support NVLink for multi-GPU training in India?

The RTX 5090 (consumer Blackwell GPU) does not support NVLink (NVIDIA's high-bandwidth GPU-to-GPU interconnect). Consumer GeForce GPUs use PCIe 5.0 ×16 for inter-GPU communication, which provides approximately 128 GB/s bandwidth versus NVLink's 900 GB/s on H100 SXM. For model-parallel training (splitting large models across GPUs), this PCIe bandwidth limitation makes multi-GPU efficiency lower than NVLink systems. In practice, for data-parallel fine-tuning (each GPU processes different batches of training data), PCIe 5.0 is adequate. For model-parallel inference of large models, RTX 6000 Ada (Quadro — supports NVLink bridge) is the consumer-workstation NVLink option.

How much VRAM do I need for on-premise ML in India?

VRAM requirements scale with model size and training technique: 7B model (BF16 training): 24 GB minimum; 7B model (QLoRA 4-bit): 8 GB; 13B model (QLoRA 4-bit): 12 GB; 70B model (QLoRA 4-bit): 48–64 GB; 70B model (BF16 full fine-tune): 140+ GB. For inference (not training) of a 70B Q4 quantized model: 40 GB VRAM minimum. Two RTX 5090s at 64 GB combined cover 70B QLoRA fine-tuning and 70B Q4 inference comfortably. For larger models or full-precision work, NVIDIA H100 NVL (cloud or leased) is the practical Indian option.

What CPU pairs best with dual RTX 5090 for ML workloads in India?

The AMD Threadripper 7980X (TRX50 platform) provides 96 PCIe 5.0 lanes, enabling two RTX 5090s to run at full PCIe 5.0 ×16 bandwidth simultaneously. Consumer AM5 or LGA1851 platforms provide 24–40 PCIe lanes total — two GPUs would share bandwidth, reducing data-parallel training throughput. The Threadripper TRX50 platform also supports 256 GB ECC DDR5, important for data preprocessing pipelines that load large training datasets (100 GB+ tokenised corpora) into memory before each epoch. Cost in India: Threadripper 7980X approximately ₹2,20,000–2,60,000 + ASUS Pro WS TRX50-SAGE WIFI at ₹1,20,000–1,40,000.

Best Workstation for ML Engineers India 2026

On-premise vs. cloud for 70B fine-tuning in India

Indian ML engineers increasingly face a pragmatic decision: rent H100 instances on AWS or Azure at ₹500–1,500 per GPU-hour, or build an on-premise workstation. For a team fine-tuning models regularly (weekly or monthly), on-premise breaks even in 3–6 months versus cloud. For researchers running occasional experiments, cloud is more cost-effective. This guide covers the on-premise path for teams who have crossed that break-even threshold.

GPU: dual RTX 5090 — VRAM capacity planning

Why VRAM is the primary constraint

LLM fine-tuning is entirely VRAM-bound. The model weights, optimizer state (for AdamW — the standard optimizer — this is 2× the model size in mixed precision), and activation gradients must all fit in VRAM simultaneously. A 70B parameter model in BF16 (16-bit floating point, the standard training precision) requires approximately 140 GB VRAM for fine-tuning. Two RTX 5090s provide 64 GB total VRAM — insufficient for full-precision 70B fine-tuning. However, QLoRA (Quantized Low-Rank Adaptation) loads the base model in 4-bit precision (~35 GB for 70B) while training only small LoRA adapter weights in 16-bit precision, bringing the total VRAM requirement to approximately 50–60 GB for 70B QLoRA fine-tuning. Two RTX 5090s handle this with 4–14 GB headroom depending on batch size.

RTX 5090 in India — pricing and sourcing

The NVIDIA RTX 5090 (Blackwell GB202, 32 GB GDDR7, 575W TDP) is priced approximately ₹2,00,000–2,30,000 per card in India through authorised ASUS, MSI, and Gigabyte AIB partners. Two cards total ₹4,00,000–4,60,000. The PSU must supply at least 200W headroom above dual-GPU TDP — a 1600W PSU (Seasonic TX-1600 or Corsair AX1600i at ₹30,000–45,000) is the correct choice for a dual-RTX-5090 system.

NVLink alternatives for consumer GPU multi-GPU training

Since RTX 5090 lacks NVLink, multi-GPU training uses PCIe 5.0 ×16 per GPU (via the TRX50 platform's ample PCIe lanes). PCIe 5.0 ×16 provides approximately 128 GB/s bidirectional bandwidth. For data-parallel training (each GPU processes different mini-batches and gradients are averaged across GPUs via all-reduce) this is sufficient — gradient synchronization during a typical training step is a fraction of total training time. For model-parallel training (the model is split across GPUs, with each layer on a different GPU) — which is required for models larger than a single GPU's VRAM — PCIe bandwidth becomes a bottleneck. The workaround for on-premise model-parallel 70B training is QLoRA combined with gradient checkpointing (recomputing intermediate activations during backpropagation instead of storing them, trading compute time for VRAM).

CPU: Threadripper 7980X for PCIe lane count

The AMD Threadripper 7980X on the TRX50 platform provides 88 usable PCIe 5.0 lanes. Two RTX 5090s each running at PCIe 5.0 ×16 occupy 32 lanes, leaving 56 lanes for NVMe arrays (training scratch), 10GbE networking, and future expansion. Consumer platforms (AM5 Intel Core Ultra 9) provide 24–44 PCIe lanes total — two GPUs would share bandwidth, with one or both dropping to ×8, reducing effective training throughput by 15–25% in bandwidth-sensitive workloads. The Threadripper 7980X costs approximately ₹2,20,000–2,60,000 in India.

RAM: 256 GB DDR5 for data preprocessing pipelines

Training pipeline preprocessing (tokenisation — converting raw text to integer token IDs, data augmentation, batch assembly) runs on the CPU while the GPU trains. For datasets above 50 GB (a typical instruction fine-tuning dataset for a 70B model runs 20–100 GB), 256 GB DDR5 enables loading the full dataset into memory once rather than re-reading from disk each epoch. This reduces epoch time significantly on datasets with slow NVMe I/O saturation. 256 GB (4×64 GB DDR5-4800) costs approximately ₹1,30,000–1,60,000 on the TRX50 platform through authorised distributors.

Thermal management for 24/7 ML training

Dual RTX 5090s at full training load produce approximately 1,150W of heat inside the case. At Indian summer ambient temperatures (38–42°C), this creates severe thermal stress. A 480mm or 420mm AIO for the CPU (to free case airflow for the GPUs), positive case pressure (front intake fans exceeding rear exhaust), and a full-tower case with mesh front panel are mandatory. GPU temperatures must stay below 83°C for sustained compute throughput — above this, NVIDIA GPUs throttle clock speeds to protect themselves, extending training time. If the training room does not have air conditioning, a dedicated portable AC unit pointed at the workstation intake is a practical solution during summer training runs. Our workstation repair service handles thermal diagnosis for ML systems with unexpectedly long training times — in many cases, thermal paste replacement and dust clearing restore training throughput without hardware replacement.

Best workstation for ML engineers: 70B fine-tuning on-premise India 2026

Key takeaways

On-premise vs. cloud for 70B fine-tuning in India

GPU: dual RTX 5090 — VRAM capacity planning

Why VRAM is the primary constraint

RTX 5090 in India — pricing and sourcing

NVLink alternatives for consumer GPU multi-GPU training

CPU: Threadripper 7980X for PCIe lane count

RAM: 256 GB DDR5 for data preprocessing pipelines

Thermal management for 24/7 ML training

LRW Engineer Team

ML engineer workstation India 2026 — FAQ

Other repairs customers book alongside this service

Workstation Repair

Thermal Service

RAM Upgrade

Data Recovery

Hyderabad customers, in their own words.

JUSTDIAL REVIEWS

ML workstation overheating or throttling? We’re at your door today.

Key takeaways

On-premise vs. cloud for 70B fine-tuning in India

GPU: dual RTX 5090 — VRAM capacity planning

Why VRAM is the primary constraint

RTX 5090 in India — pricing and sourcing

NVLink alternatives for consumer GPU multi-GPU training

CPU: Threadripper 7980X for PCIe lane count

RAM: 256 GB DDR5 for data preprocessing pipelines

Thermal management for 24/7 ML training

LRW Engineer Team

ML engineer workstation India 2026 — FAQ

Other repairs customers book alongside this service

Workstation Repair

Thermal Service

RAM Upgrade

Data Recovery

Need it fixed? Doorstep service in Hyderabad

Hyderabad customers, in their own words.

JUSTDIAL REVIEWS

ML workstation overheating or throttling? We’re at your door today.