
FriendliAI Launches InferenceSense™ to Monetize Idle GPU Capacity

No GPU fleet runs at full capacity around the clock. InferenceSense™ automatically fills idle cycles with paid AI inference workloads—and shares the revenue with you.

SAN FRANCISCO--(BUSINESS WIRE)--FriendliAI, The Frontier AI Inference Cloud, today launched Friendli InferenceSense™, the industry’s first inference monetization platform purpose-built for GPU cloud operators.

InferenceSense tackles a persistent and expensive reality: GPU clusters cost billions to build and operate, yet many sit idle or underutilized for large portions of every day.

The Problem with GPU Utilization

GPU infrastructure demands massive capital outlay—a single H100 rents for ~$2.00/hour; an 8-GPU node, $16–20/hour—yet no fleet achieves 100% utilization. Training jobs are inherently bursty: they complete, and the hardware goes dark until the next run. Even fully committed neoclouds experience idle windows between customer workloads.

Every idle GPU-hour is lost margin.

What InferenceSense™ Does

Friendli InferenceSense detects idle GPU capacity in your infrastructure and fills it with monetizable AI inference workloads. When your own workloads need the GPUs back, InferenceSense preempts immediately—your jobs always come first.

Think of it as “AdSense for GPUs”: just as digital publishers use AdSense to automatically monetize available pixel space with high-yield demand, GPU operators can now use InferenceSense to monetize every available GPU cycle.

Integration is frictionless. Operators retain full control—choosing which nodes participate, setting time-of-day schedules, and defining exactly how much spare capacity InferenceSense may use.
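FriendliAI has not published a configuration schema for these controls, but the three knobs described above—node participation, time-of-day schedules, and a cap on borrowed capacity—could be modeled along these lines. All class, field, and function names below are hypothetical, purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class NodePolicy:
    """Hypothetical per-node participation policy for idle-GPU monetization."""
    node_id: str
    opt_in: bool = True                  # operator chooses which nodes participate
    allowed_hours: range = range(0, 24)  # time-of-day window (UTC hours)
    max_spare_gpus: int = 8              # cap on how many idle GPUs may be borrowed


def gpus_available_for_monetization(policy: NodePolicy,
                                    hour_utc: int,
                                    idle_gpus: int) -> int:
    """Return how many idle GPUs the service may use right now under the policy."""
    if not policy.opt_in or hour_utc not in policy.allowed_hours:
        return 0
    return min(idle_gpus, policy.max_spare_gpus)


# Example: a node that only participates overnight (00:00-06:00 UTC), up to 4 GPUs.
night_policy = NodePolicy(node_id="h100-node-17",
                          allowed_hours=range(0, 6),
                          max_spare_gpus=4)
print(gpus_available_for_monetization(night_policy, hour_utc=2, idle_gpus=6))   # -> 4
print(gpus_available_for_monetization(night_policy, hour_utc=14, idle_gpus=6))  # -> 0
```

The point of the cap-and-schedule design is that monetization never exceeds what the operator explicitly offers, even when more GPUs happen to be idle.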

Demand is built in. There is no need to source inference customers independently—FriendliAI brings a ready pool of global demand for widely used open-weight models including DeepSeek, Qwen, Kimi, GLM, and MiniMax, and dispatches workloads to partner hardware automatically. Token revenue generated on those GPUs is shared between the operator and FriendliAI, with no upfront fees and no minimum commitments.

Crucially, the operator’s own workloads always take priority. The moment a scheduler reclaims a GPU, InferenceSense gracefully vacates—monetized workloads are designed to be preempted, ensuring production jobs are never delayed.

Architecture

When InferenceSense detects available GPU capacity, it spins up secure, fully isolated containers that serve paid AI inference workloads. Under the hood, FriendliAI’s battle-tested inference engine maximizes token throughput per GPU-hour—squeezing peak economic value from every idle cycle.

The moment your scheduler reclaims a GPU, InferenceSense’s preemption controller gracefully terminates the monetized workload and returns the hardware within seconds—zero downtime, zero disruption, zero config changes.
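The release does not describe the preemption controller’s internals. Conceptually, though, the loop it describes—monetized work runs only during idle windows and vacates the instant the host scheduler reclaims the GPU—can be sketched as a small state machine. Class and method names here are illustrative, not FriendliAI’s actual API:

```python
class PreemptionController:
    """Sketch of a preempt-on-reclaim loop: monetized inference runs only while
    the host scheduler has left the GPU idle, and vacates before the GPU is
    handed back. All names are hypothetical."""

    def __init__(self):
        self.monetized_running = False
        self.events = []  # audit trail of state transitions

    def on_idle_detected(self):
        # Idle window begins: start an isolated container serving paid inference
        # (simulated here by a state flag).
        self.monetized_running = True
        self.events.append("monetized_started")

    def on_gpu_reclaimed(self):
        # Operator workload needs the GPU back: terminate the monetized
        # workload gracefully, then return the hardware.
        if self.monetized_running:
            self.monetized_running = False
            self.events.append("monetized_vacated")
        self.events.append("gpu_returned")


ctrl = PreemptionController()
ctrl.on_idle_detected()   # idle window -> monetized inference starts
ctrl.on_gpu_reclaimed()   # scheduler reclaims -> monetized work vacates first
print(ctrl.events)  # -> ['monetized_started', 'monetized_vacated', 'gpu_returned']
```

The key invariant is ordering: the monetized workload is always torn down before the GPU is reported as returned, which is what makes the operator’s jobs strictly higher priority.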

The Economics: From Idle to Income

The prevailing GPU cloud model charges by the hour. Between customer workloads, revenue drops to zero—but the cost of power, cooling, and depreciation never stops. InferenceSense converts that dead time into an incremental revenue stream.

The mechanics are straightforward: FriendliAI aggregates global, real-time demand for popular open-weight models—DeepSeek, Qwen, Kimi, GLM, and others—and routes paid inference workloads to partner GPUs. Partners earn a share of the token revenue generated during otherwise-empty hours. FriendliAI owns the demand pipeline, model optimization, and serving stack; the partner contributes idle capacity.

Because token generation scales with computational efficiency, monetized inference workloads can yield significantly more revenue per GPU-hour than a traditional hourly rental would.

There is no upfront cost and no minimum commitment. If a GPU is idle, it earns. The moment your workloads need it back, InferenceSense yields instantly. The bottom line: infrastructure that generates margin even when your own customers aren’t on it.
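As a back-of-envelope illustration of these economics: only the ~$2.00/hour H100 rental figure comes from this release; the idle fraction, per-hour inference yield, and revenue-share split below are purely hypothetical assumptions, not FriendliAI’s actual terms.

```python
# Back-of-envelope: incremental monthly revenue from monetizing idle GPU-hours.
# Only the ~$2.00/hr H100 rate is from the release; everything else is assumed.
gpus = 8                            # one 8-GPU node
hours_per_month = 730
idle_fraction = 0.30                # assumed: 30% of hours sit idle
revenue_per_gpu_hour = 2.00         # assumed: inference yield ~ the rental rate
operator_share = 0.70               # assumed revenue-share split

idle_gpu_hours = gpus * hours_per_month * idle_fraction
incremental_revenue = idle_gpu_hours * revenue_per_gpu_hour * operator_share
print(f"${incremental_revenue:,.2f}")  # -> $2,452.80
```

Under these illustrative assumptions, a single 8-GPU node recovers roughly $2,400/month that would otherwise be pure carrying cost; the actual figure depends entirely on the idle fraction and the negotiated split.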

Why We Built This

“The modern data center isn't just a massive compute cluster—it is an AI factory, a high-performance production environment built to manufacture intelligence at scale. Yet most GPU operators act like traditional landlords, watching revenue evaporate every time a workload finishes or a contract ends,” said Byung-Gon Chun, CEO of FriendliAI.

“The industry is building these massive factories, but most GPU clouds are still missing the inference assembly line that actually transforms raw compute into tokens—the true finished goods of this era.

“InferenceSense provides that missing assembly line. Every idle GPU-hour becomes a chance to serve real AI demand and capture token revenue. We own the demand pipeline, the optimization, and the serving—our partners simply plug in and earn. The AI factory build-out only makes sense when it actually makes cents.”

Who It’s For

InferenceSense is designed for any organization operating GPU-dense infrastructure—GPU neoclouds, ML platforms, and research institutions. Any operator whose GPUs are not fully utilized around the clock is a candidate.

Get Started

Friendli InferenceSense™ is now accepting applications from qualified GPU cloud operators. To explore how InferenceSense can unlock new revenue from your existing infrastructure, contact partners@friendli.ai to schedule an executive briefing during NVIDIA GTC.

About FriendliAI

FriendliAI is The Frontier AI Inference Cloud. Built by the researchers who invented the continuous batching technique that is now an industry standard, FriendliAI provides AI engineers with a highly optimized engine that constantly evolves to efficiently run state-of-the-art open-weight and custom models at production scale. By maximizing GPU utilization, FriendliAI delivers speeds up to 3x faster than vLLM, and 50% to 90% cost savings relative to closed model APIs. FriendliAI empowers engineers to deploy frontier AI with uncompromising speed, model ownership, and enterprise-grade reliability.

For more information, visit www.friendli.ai.

Contacts

Media Contact
Ryan Pollock
ryan.pollock@friendli.ai
