
LinkedIn Re-Architects Edge-Building System to Support Diverse Inference Workflows


LinkedIn has detailed its re-architected edge-building system, an evolution designed to support diverse inference workflows for delivering fresher and more personalized recommendations to members worldwide. The new architecture addresses growing demands for real-time scalability, cost efficiency, and flexibility across its global platform.

The edge-building system powers LinkedIn’s graph by recommending "edges", or connections between members and content. These recommendations are generated through inference workflows, which run machine learning models to score and rank candidate suggestions. Over time, the system has evolved to balance freshness, latency, and resource efficiency across different inference modes.
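
At its core, each workflow scores candidate edges with a model and keeps the top-ranked ones. A minimal Python sketch of that step follows; the CandidateEdge structure, feature weights, and linear scorer are illustrative assumptions rather than LinkedIn’s actual code.

```python
# Hypothetical sketch of the core edge-scoring step; names and the
# scoring function are illustrative, not LinkedIn's production code.
from dataclasses import dataclass


@dataclass
class CandidateEdge:
    member_id: str
    target_id: str         # a member or piece of content to recommend
    features: list[float]


def score(edge: CandidateEdge) -> float:
    """Stand-in for a trained ML model; here, a fixed linear model."""
    weights = [0.6, 0.3, 0.1]
    return sum(w * f for w, f in zip(weights, edge.features))


def rank_candidates(candidates: list[CandidateEdge], k: int = 10) -> list[CandidateEdge]:
    """Score every candidate edge and keep the top-k recommendations."""
    return sorted(candidates, key=score, reverse=True)[:k]


if __name__ == "__main__":
    pool = [
        CandidateEdge("m1", "m42", [0.9, 0.2, 0.5]),
        CandidateEdge("m1", "m77", [0.1, 0.8, 0.3]),
    ]
    for edge in rank_candidates(pool, k=2):
        print(edge.target_id, round(score(edge), 3))
```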

The first generation relied on offline inference pipelines, which pre-computed recommendations in bulk. While effective at the platform’s earlier scale, this approach lacked the freshness needed to reflect dynamic member activity. To address this, LinkedIn introduced nearline inference, which runs models shortly after member actions are recorded, enabling more responsive recommendations while remaining cost-efficient.

Initial architecture using an offline inference model (Source: LinkedIn Engineering Blog)
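
A rough sketch of the nearline pattern described above, assuming a stream of recorded member actions: the queue, model stub, and recommendation store are hypothetical stand-ins for LinkedIn’s internal systems.

```python
# A minimal sketch of nearline inference, assuming an event stream of
# member actions; the queue, model call, and store are all hypothetical.
import queue

action_stream: queue.Queue = queue.Queue()
recommendation_store: dict[str, list[str]] = {}


def run_model(member_id: str) -> list[str]:
    """Stand-in for the ML model that re-scores candidates for a member."""
    return [f"suggested-for-{member_id}"]


def nearline_worker() -> None:
    """Consume recorded actions and refresh recommendations shortly after."""
    while True:
        member_id = action_stream.get()
        if member_id is None:   # sentinel to stop the worker
            break
        # Inference runs off the request path, so it stays cost-efficient
        # while keeping recommendations fresher than bulk offline runs.
        recommendation_store[member_id] = run_model(member_id)


action_stream.put("member-123")   # a member action is recorded
action_stream.put(None)
nearline_worker()
print(recommendation_store)
```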

The next stage of evolution focused on online inference, enabling real-time evaluation of candidate edges at request time. This shift provided the most up-to-date recommendations but introduced latency and resource scaling challenges. To manage this complexity, LinkedIn implemented remote inference capabilities, allowing models hosted in specialized serving systems to be invoked from multiple surfaces.
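
The sketch below illustrates the online path under assumed names: at request time it fans out parallel calls to a hypothetical remote model-serving system under a latency budget, which is exactly where the latency and scaling pressure appears.

```python
# A sketch of request-time (online) inference with a remote model call,
# assuming a hypothetical model-serving endpoint; the client is illustrative.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def remote_score(model_name: str, candidate_id: str) -> float:
    """Stand-in for an RPC/HTTP call to a specialized model-serving system."""
    return 0.42  # a real client would send features and return the model score


def score_at_request_time(candidates: list[str], budget_s: float = 0.1) -> dict[str, float]:
    """Fan out remote inference calls in parallel under a latency budget."""
    scores: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {c: pool.submit(remote_score, "edge-ranker-v2", c) for c in candidates}
        for candidate, fut in futures.items():
            try:
                # Enforce the per-request budget; drop candidates that miss it.
                scores[candidate] = fut.result(timeout=budget_s)
            except FutureTimeout:
                continue
    return scores


print(score_at_request_time(["m42", "m77"]))
```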

The different inference models offer varying trade-offs in freshness, scalability, and efficiency:

Comparison of different inference models (Source: LinkedIn Engineering Blog)

The current architecture supports a mix of offline, nearline, online, and remote inference. A Directed Acyclic Graph (DAG) orchestrates these workflows, enabling parallel execution and flexible routing. For example, People You May Know leverages online inference for immediate updates, while large-scale content feeds continue to rely on offline computation.
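
A toy illustration of the DAG idea, with simplified, assumed workflow nodes: independent inference branches execute in parallel, and a merge node combines their outputs.

```python
# A toy sketch of DAG-style orchestration over the inference workflows;
# node names and the merge step are assumptions, not LinkedIn's internals.
from concurrent.futures import ThreadPoolExecutor


def offline_results():  return ["feed-item-1"]
def nearline_results(): return ["fresh-item-2"]
def online_results():   return ["realtime-item-3"]


def merge(*branches: list[str]) -> list[str]:
    """Join node: combine the outputs of the parallel inference branches."""
    return [item for branch in branches for item in branch]


def run_dag() -> list[str]:
    # Independent workflow nodes have no edges between them, so the
    # orchestrator can execute them in parallel before the merge node.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f) for f in (offline_results, nearline_results, online_results)]
        return merge(*(f.result() for f in futures))


print(run_dag())
```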

To improve candidate generation, LinkedIn has adopted Embedding-Based Retrieval (EBR), which creates embeddings from member profiles and retrieves relevant candidates from a vector store. These candidates are then scored online and merged with outputs from other workflows, enhancing both diversity and relevance.

Current architecture supporting diverse inference models (Source: LinkedIn Engineering Blog)
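
To make the EBR flow described above concrete, the sketch below embeds a member profile and runs a nearest-neighbor lookup against an in-memory vector store; the encoder, stored vectors, and cosine-similarity search are illustrative stand-ins for production components.

```python
# A minimal sketch of Embedding-Based Retrieval (EBR): the embedding
# function and in-memory vector store are hypothetical stand-ins.
import math

vector_store: dict[str, list[float]] = {
    "post-1": [0.9, 0.1, 0.0],
    "post-2": [0.2, 0.8, 0.1],
    "post-3": [0.4, 0.4, 0.8],
}


def embed_profile(profile_text: str) -> list[float]:
    """Stand-in for a learned encoder over member-profile features."""
    return [0.8, 0.2, 0.1]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


def retrieve(profile_text: str, k: int = 2) -> list[str]:
    """Nearest-neighbor lookup in the vector store; results then go to online scoring."""
    q = embed_profile(profile_text)
    ranked = sorted(vector_store, key=lambda cid: cosine(q, vector_store[cid]), reverse=True)
    return ranked[:k]


print(retrieve("ML engineer interested in distributed systems"))
```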

Ensuring consistency across workflows at LinkedIn’s scale required significant investment in shared feature stores, model management frameworks, and distributed serving infrastructure.
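
The following sketch hints at why a shared feature store matters: when offline and online scoring read identical features from a single source of truth, the two paths cannot drift apart. Store contents and feature names are assumptions for illustration.

```python
# A sketch of why a shared feature store matters: both workflows read the
# same features, so online and offline scores for an edge stay consistent.
# The store and feature names are illustrative assumptions.
shared_feature_store: dict[str, dict[str, float]] = {
    "member-123": {"connection_count": 250.0, "recent_activity": 0.7},
}


def get_features(member_id: str) -> dict[str, float]:
    """Single source of truth consumed by every inference workflow."""
    return shared_feature_store[member_id]


def offline_score(member_id: str) -> float:
    f = get_features(member_id)
    return 0.001 * f["connection_count"] + 0.5 * f["recent_activity"]


def online_score(member_id: str) -> float:
    # Uses the identical features and model logic, so results match offline.
    f = get_features(member_id)
    return 0.001 * f["connection_count"] + 0.5 * f["recent_activity"]


assert offline_score("member-123") == online_score("member-123")
print(online_score("member-123"))
```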

As Yi-Wen Liu, an engineer at LinkedIn, emphasized:

By decoupling workflows and supporting multiple inference strategies, we can flexibly balance freshness, scalability, and cost while continuing to deliver meaningful recommendations to our members.

According to LinkedIn engineers, the evolved edge-building system enables more efficient experiments and improved engagement through A/B testing. It also opens strategic opportunities in AI productivity, cost optimization, adoption of large language models and transformers, embedding-based retrieval, and advanced modeling techniques such as graph neural networks and sequential models, together enabling more timely, personalized, and actionable recommendations.
