distributed systems

26 items

ARTICLE · ↑ trending · Hacker News (AI) · 13d ago

AI Infra Is Nothing Like the "Classic Cloud Infra"

AI infrastructure differs fundamentally from classic cloud infrastructure: it relies on specialized hardware such as GPUs, has its own data-management needs, and faces harder distributed-computing problems. As a result, it calls for a distinct approach to design, deployment, and operation rather than general-purpose cloud paradigms.

42
ARTICLE · DEV.to AI · 4/16/2026

Fail-Open Patterns: When Your AI Trading System Must Choose Graceful Degradation Over Perfection

This article argues that production AI trading systems need fail-open patterns: when a component fails, the system should degrade gracefully rather than shut down completely. It contrasts this approach with traditional fail-closed financial systems and argues that keeping degraded functionality running is essential for continuous operation.

31
RESEARCH · arXiv CS.LG · 5/4/2026

FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources

Federated learning (FL) enables privacy-preserving collaborative learning across decentralized data sources, but multi-task scenarios suffer from device heterogeneity and inefficient resource use. FedACT is a new device-scheduling approach that accounts for resource heterogeneity to manage multiple concurrent FL jobs efficiently, with the goal of minimizing their average job completion time.

28
ARTICLE · DEV.to AI · 26d ago

Agent Discovery in 2026: DNS-SD, ACP Registries, and Pilot Protocol's Overlay Directory

The article examines agent discovery as a core challenge in distributed systems and compares three main approaches for 2026: DNS-SD for local setups, ACP-style centralized registries for multi-agent frameworks, and Pilot Protocol's overlay directory, which takes a different route from both. It weighs each method's tradeoffs in security, latency, and scalability and concludes that no single solution is right for every case.

27
RESEARCH · arXiv CS.LG · 5/4/2026

Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference

This paper re-examines the viability of cloud-based inference for latency-sensitive cyber-physical systems, challenging the assumption that on-device processing is always superior. It demonstrates that high-throughput cloud platforms can match or surpass on-device performance for real-time control tasks by amortizing network and queueing delays.

27