The promise of edge computing for AI workloads has evolved from experimental optimization to production-ready enterprise architecture. What began as our exploration of WASM efficiency gains has matured into a comprehensive platform strategy that leverages Cloudflare’s full spectrum of services: from Workers and AI inference to containers, durable execution, and Zero Trust security. This isn’t about forcing every workload through a WASM-shaped hole, but about orchestrating the right compute paradigm for each component while maintaining the efficiency gains that drew us to the edge in the first place.
A Pragmatic Approach
Our initial focus on pure WASM compilation through the Fidelity framework revealed both the tremendous potential and the practical limitations of edge-first development. Cloudflare has since introduced Containers, which run new types of workloads on its network with an experience that is simple, scalable, global, and deeply integrated with Workers. This container support fundamentally changes the architectural possibilities, allowing us to deploy traditional .NET F# workloads where they make sense while preserving WASM’s efficiency advantages where they matter most.
The architecture now encompasses three complementary execution models:
- WASM via Fidelity/Firefly: Core business logic compiled to efficient WebAssembly modules
- JavaScript via Fable: Service orchestration and platform API integration
- Containers for Fidelity & .NET F#: Complex workloads requiring specialized capabilities or a .NET feature that’s unique to customer requirements
This pragmatic approach acknowledges that modern systems are increasingly real-time and chatty: they hold open long-lived connections and perform many tasks in parallel, so different components genuinely require different compute strategies.
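To make the multi-target idea concrete, here is a minimal sketch of how a single F# module can serve all three execution models. `FABLE_COMPILER` is the real compile-time directive Fable defines; the `FIDELITY` define and the `Wasm.hostLog` import are hypothetical stand-ins for how the Fidelity/Firefly toolchain might isolate WASM-specific code.

```fsharp
// Shared domain logic, compiled to three targets.
// FABLE_COMPILER is defined by Fable; FIDELITY is a hypothetical
// define for the Fidelity/Firefly WASM toolchain.
module Shared.Scoring

type Document = { Id: string; Tokens: string[] }

// Pure logic compiles unchanged on every target.
let relevanceScore (query: string[]) (doc: Document) =
    let q = Set.ofArray query
    doc.Tokens
    |> Array.filter q.Contains
    |> Array.length
    |> fun hits -> float hits / float (max 1 doc.Tokens.Length)

// Platform-specific I/O is isolated behind compile-time directives.
let log (msg: string) =
#if FABLE_COMPILER
    Fable.Core.JS.console.log msg      // JavaScript Worker target
#elif FIDELITY
    Wasm.hostLog msg                   // hypothetical WASM host import
#else
    System.Console.WriteLine msg       // full .NET container target
#endif
```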
The Complete Cloudflare Mosaic
Our comprehensive architecture leverages Cloudflare’s extensive service portfolio to build secure, scalable AI systems:
```mermaid
graph TB
    subgraph "Zero Trust Security Layer"
        ZT[Zero Trust<br/>Identity Verification]
        CASB[Cloud Access Security<br/>Broker]
        DLP[Data Loss Prevention]
    end
    subgraph "Application Layer"
        subgraph "Workers Platform"
            WORKER[Workers<br/>WASM + JS]
            PAGES[Pages<br/>Static Assets]
            EMAIL[Email Workers]
        end
        subgraph "Container Platform"
            CONTAINER[.NET or Fidelity<br/>F# Containers<br/>Complex Workloads]
            SANDBOX[Code Interpreter<br/>Sandboxes]
        end
    end
    subgraph "AI & Compute Services"
        AI[Workers AI<br/>GPU Inference]
        VECTOR[Vectorize<br/>Vector Database]
        WORKFLOW[Workflows<br/>Durable Execution]
        QUEUE[Queues<br/>Event Processing]
    end
    subgraph "Data & Storage Layer"
        D1[D1<br/>SQLite Database]
        R2[R2<br/>Object Storage]
        KV[KV Store<br/>Session Data]
        DO[Durable Objects<br/>Stateful Coordination]
    end
    subgraph "Analytics & Monitoring"
        ANALYTICS[Analytics Engine<br/>Time-Series Data]
        LOGS[Logpush<br/>Audit Trail]
        TRACE[Trace<br/>Distributed Tracing]
    end

    %% Security flow
    ZT --> WORKER
    ZT --> CONTAINER
    CASB --> R2
    DLP --> D1

    %% Application connections
    WORKER --> AI
    WORKER --> VECTOR
    WORKER --> WORKFLOW
    CONTAINER --> AI

    %% Data flow
    WORKFLOW --> QUEUE
    QUEUE --> DO
    DO --> D1
    WORKER --> KV
    WORKER --> R2

    %% Analytics
    WORKER --> ANALYTICS
    CONTAINER --> ANALYTICS
    AI --> ANALYTICS

    style ZT fill:#e8f5e9,stroke:#2e7d32,stroke-width:3px
    style WORKER fill:#ff6d00,stroke:#e65100,stroke-width:2px,color:#ffffff
    style AI fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
```
Extending the Enterprise Security Perimeter
Zero Trust security is an IT security model that requires strict identity verification for every person and device trying to access resources on a private network, regardless of whether they are sitting within or outside of the network perimeter. Cloudflare’s Zero Trust services enable enterprises to extend their security boundary into the cloud while maintaining complete control over data access:
// F# Pulumi configuration for Zero Trust architecture
module SecurityInfrastructure =
open Pulumi.Cloudflare
let configureZeroTrust (config: EnterpriseConfig) =
// Access policies for AI services
let aiAccessPolicy =
AccessPolicy(
"ai-services-policy",
AccessPolicyArgs(
ApplicationId = config.AIApplicationId,
Precedence = 1,
Decision = "allow",
Include = [
AccessRuleArgs(
Groups = [config.DataScienceGroupId]
)
],
Require = [
AccessRuleArgs(
DevicePosture = ["compliant"]
)
],
SessionDuration = "24h"
)
)
// DLP rules for sensitive data
let dlpProfile =
DlpProfile(
"sensitive-data-protection",
DlpProfileArgs(
Type = "predefined",
Entries = [
DlpEntryArgs(Pattern = PatternArgs(Regex = @"\d{3}-\d{2}-\d{4}")) // SSN
DlpEntryArgs(Pattern = PatternArgs(Regex = @"\d{16}")) // Credit cards
],
AllowedMatchCount = 0
)
)
// Tunnel for secure connectivity
let secureTunnel =
Tunnel(
"enterprise-tunnel",
TunnelArgs(
AccountId = config.AccountId,
Name = "enterprise-ai-tunnel",
Secret = config.TunnelSecret,
ConfigSrc = "cloudflare"
)
)
{ AccessPolicy = aiAccessPolicy
DlpProfile = dlpProfile
Tunnel = secureTunnel }
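Wiring this module into a runnable Pulumi program follows the standard Pulumi.FSharp pattern. In the sketch below, `EnterpriseConfig.fromPulumiConfig` is a hypothetical helper for loading the account ID, group IDs, and tunnel secret from Pulumi config; the rest uses the real `Deployment.run` entry point.

```fsharp
module Program

open Pulumi.FSharp

let infra () =
    // Hypothetical helper that materializes EnterpriseConfig
    // (account id, group ids, tunnel secret) from Pulumi config.
    let config = EnterpriseConfig.fromPulumiConfig ()
    let security = SecurityInfrastructure.configureZeroTrust config

    // Export the tunnel name as a stack output.
    dict [ "tunnelName", security.Tunnel.Name :> obj ]

[<EntryPoint>]
let main _ = Deployment.run infra
```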
This Zero Trust architecture extends the rigorous controls typically applied to public-facing applications, which have far outpaced those on private networks, to cloud AI deployments, bringing enterprise-grade security to the edge.
Container Platform: The Missing Piece
Cloudflare Containers are now available in beta for all users on paid plans, making it possible to run new kinds of applications alongside Workers. This container support enables several crucial capabilities.
When Containers Make Sense
// Complex F# workload requiring full .NET BCL
module EnterpriseAIProcessor =
open Microsoft.ML
open FSharp.Data
open System.Data.SqlClient
open Microsoft.Data.Analysis // provides the DataFrame type used below
// This runs in a container with full .NET support
let processComplexMLPipeline (data: DataFrame) = async {
// Use ML.NET for complex preprocessing
let! preprocessed =
MLContext()
|> createPipeline data
|> executeWithFullBCL
// Leverage enterprise libraries
let! enriched =
enrichWithSqlServer preprocessed
|> integrateWithLegacySystems
// Complex numerical computations with MathNet
let! analyzed =
MathNet.Numerics.LinearAlgebra.Matrix<float>.Build.DenseOfArray enriched
|> performMatrixOperations
return analyzed
}
// Container configuration in wrangler.jsonc
let containerConfig = {|
containers = [|
{|
name = "ml-processor"
image = "registry.hub.docker.com/fidelity/ml-processor:latest"
port = 8080
cpu = 2
memory = "4GB"
gpu = "nvidia-t4" // GPU support for ML workloads
|}
|]
|}
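A Worker typically fronts such a container, keeping cheap requests at the edge and proxying heavy ones inward. The Fable-flavored sketch below is illustrative only: the `ContainerBinding` interface and `env.ML_PROCESSOR` stand in for whatever binding surface the containers beta ultimately exposes.

```fsharp
// Illustrative Fable Worker that fronts the ml-processor container.
// ContainerBinding and Env model assumed wrangler bindings; the
// containers beta's real API surface may differ.
module Worker.Router

open Fable.Core

type ContainerBinding =
    abstract fetch: request: obj -> JS.Promise<obj>

type Env =
    abstract ML_PROCESSOR: ContainerBinding

let handleRequest (url: string) (request: obj) (env: Env) = async {
    if url.Contains "/ml/" then
        // Heavy ML paths are proxied to the container binding.
        return! env.ML_PROCESSOR.fetch request |> Async.AwaitPromise
    else
        // Lightweight paths stay in the WASM/JS Worker itself.
        return box "handled at the edge"
}
```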
Durable Execution with Workflows
Cloudflare Workflows is now in open beta. It allows you to build reliable, repeatable, long-lived multi-step applications that automatically retry, persist state, and scale out. This enables sophisticated AI pipelines:
module AIWorkflows =
open System
open CloudflareWorkflows
type DocumentProcessor() =
inherit Workflow<DocumentInput, ProcessingResult>()
override this.run(input, step) = async {
// Extract text from document
let! extracted = step.do_("extract", fun () ->
R2.fetch input.documentUrl
|> extractText
)
// Generate embeddings with retry logic
let! embedded = step.do_("embed", fun () ->
WorkersAI.createEmbedding extracted
) |> step.withRetries 3
// Store in vector database
let! stored = step.do_("store", fun () ->
Vectorize.upsert {
id = input.documentId
vector = embedded
metadata = input.metadata
}
)
// Conditionally trigger analysis
if input.requiresAnalysis then
let! analysis = step.do_("analyze", fun () ->
// This could call a container for complex ML
Container.fetch "ml-analyzer" extracted
)
// Wait for human review if needed
if analysis.requiresReview then
let! approved = step.waitForEvent("approval", TimeSpan.FromHours(24))
if approved then
do! step.do_("finalize", fun () ->
D1.insert analysis.results
)
return ProcessingResult.Success stored.id
}
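The `waitForEvent` branch above needs a counterpart that actually delivers the approval. Cloudflare Workflows handles this through events sent to a running instance; the sketch below assumes Fable-style bindings (`WorkflowBinding`, `sendEvent`) that mirror the JavaScript Workflows API, so treat the shapes as illustrative.

```fsharp
// Hypothetical approval endpoint that delivers the "approval" event
// the DocumentProcessor instance is waiting on.
type WorkflowInstance =
    abstract sendEvent: event: obj -> Async<unit>

type WorkflowBinding =
    abstract get: id: string -> Async<WorkflowInstance>

type Env =
    abstract DOCUMENT_PROCESSOR: WorkflowBinding

let handleApproval (env: Env) (instanceId: string) (approved: bool) = async {
    let! instance = env.DOCUMENT_PROCESSOR.get instanceId
    // Mirrors the JS `instance.sendEvent({ type, payload })` call.
    do! instance.sendEvent (box {| ``type`` = "approval"; payload = approved |})
}
```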
Event-Driven Architecture
With R2 event notifications, you can configure changes to content in any R2 bucket to trigger sophisticated processing pipelines:
module EventDrivenAI =
// R2 event triggers workflow
let configureEventPipeline() =
R2EventNotification.create {
bucket = "user-uploads"
event_types = ["object-create", "object-update"]
destination = Queue "document-processor"
}
// Queue consumer processes events
let documentQueueConsumer =
QueueConsumer.create {
queue = "document-processor"
batch_size = 10
max_retries = 3
handler = fun messages -> async {
for msg in messages do
// Trigger workflow for each document
let! workflowId =
Workflow.create "DocumentProcessor" {
documentUrl = msg.body.url
documentId = msg.body.id
requiresAnalysis = msg.body.size > 1_000_000L
metadata = msg.body.metadata
}
// Track in Analytics Engine
do! Analytics.writeDataPoint {
dataset = "document_processing"
point = {
timestamp = DateTime.UtcNow
workflowId = workflowId
documentSize = msg.body.size
}
}
}
}
Infrastructure as Code with F# Pulumi
Managing this comprehensive architecture requires sophisticated infrastructure orchestration:
module CloudflareInfrastructure =
open Pulumi
open Pulumi.Cloudflare
let deployAIPlatform() =
// Configure Workers with AI bindings
let aiWorker =
WorkersScript(
"ai-orchestrator",
WorkersScriptArgs(
Content = File.ReadAllText("./dist/worker.js"),
Module = true,
Bindings = [
WorkersScriptPlainTextBindingArgs(
Name = "AI",
Text = "@cf/meta/llama-3.1-8b-instruct"
)
WorkersScriptServiceBindingArgs(
Name = "VECTORIZE",
Service = "vectorize-index"
)
WorkersScriptR2BucketBindingArgs(
Name = "STORAGE",
BucketName = "ai-documents"
)
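// `database` below is assumed to be a D1 database resource defined earlier in this program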
WorkersScriptD1DatabaseBindingArgs(
Name = "DATABASE",
DatabaseId = database.Id
)
]
)
)
// Deploy container for complex workloads
let mlContainer =
Container(
"ml-processor",
ContainerArgs(
Image = "fidelity/ml-processor:latest",
Secrets = [
Output.CreateSecret(config.GetSecret("ML_API_KEY"))
],
EnvironmentVariables = dict [
"PROCESSING_MODE", "production"
"GPU_ENABLED", "true"
],
ResourceRequirements = ContainerResourceRequirementsArgs(
Limits = ContainerResourceLimitsArgs(
Cpu = "4",
Memory = "8Gi",
Gpu = "1"
)
)
)
)
// Configure Zero Trust access
let accessApplication =
AccessApplication(
"ai-platform",
AccessApplicationArgs(
Domain = "ai.company.internal",
Type = "self_hosted",
SessionDuration = "24h",
AllowedIdps = ["azure_ad"],
AutoRedirectToIdentity = true
)
)
// Set up monitoring
let logpush =
LogpushJob(
"audit-logs",
LogpushJobArgs(
Dataset = "access_requests",
DestinationConf = "s3://audit-bucket/logs",
Ownership = LogpushOwnershipArgs(
DestinationConf = Output.CreateSecret(s3Config)
)
)
)
Output.Create({|
WorkerUrl = aiWorker.Id.Apply(fun id -> $"https://{id}.workers.dev")
ContainerEndpoint = mlContainer.Id
AccessPortal = accessApplication.Domain
|})
Performance and Cost Optimization
The multi-paradigm approach delivers measurable improvements across different workload types:
| Workload Type | Traditional Approach | Optimized Architecture | Improvement |
|---|---|---|---|
| Simple Inference | Container + .NET Runtime | WASM Worker | 95% latency reduction |
| Complex ML Pipeline | Kubernetes + GPUs | Container with GPU | 40% cost reduction |
| Vector Search | Self-hosted Qdrant | Vectorize | 80% operational overhead reduction |
| Batch Processing | Cron + Queue Service | Workflows + Queues | 60% reliability improvement |
| Session Management | Redis Cluster | KV Store | 90% latency reduction |
| Real-time Coordination | WebSocket Servers | Durable Objects | 70% infrastructure reduction |
Real-World Implementation Patterns
Pattern 1: Hybrid RAG System
module HybridRAG =
// Fast path for common queries
let cachedInference query = async {
match! KV.get (hashQuery query) with
| Some cached -> return cached
| None ->
let! embedding = WorkersAI.embed query
let! context = Vectorize.search embedding
let! response = WorkersAI.complete query context
do! KV.put (hashQuery query) response 3600 // cache TTL in seconds
return response
}
// Complex path for specialized queries
let containerInference query context = async {
let! container = Container.get "specialized-llm"
return! container.process {|
query = query
context = context
use_tools = true
max_iterations = 5
|}
}
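The `hashQuery` helper above is assumed rather than shown. On targets with the full .NET BCL, a minimal sketch is a normalized SHA-256 digest, which yields short, stable KV keys with no user-controlled characters; a Worker build would use Web Crypto instead.

```fsharp
// Minimal sketch of the assumed hashQuery helper (BCL targets):
// normalize the query, hash it, hex-encode the digest as a KV key.
let hashQuery (query: string) =
    let normalized = query.Trim().ToLowerInvariant()
    use sha = System.Security.Cryptography.SHA256.Create()
    normalized
    |> System.Text.Encoding.UTF8.GetBytes
    |> sha.ComputeHash
    |> Array.map (sprintf "%02x")
    |> String.concat ""
```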
Pattern 2: Secure Document Processing
module SecureDocumentPipeline =
let processWithCompliance document = workflow {
// DLP scanning before processing
let! dlpResult = step.do_ "dlp-scan" (fun () ->
DLP.scan document.content
)
if dlpResult.hasSensitiveData then
// Route through secure container
let! redacted = step.do_ "redact" (fun () ->
Container.call "pii-redactor" document
)
// Audit log the redaction
do! step.do_ "audit" (fun () ->
Analytics.log {|
event = "sensitive_data_redacted"
document_id = document.id
timestamp = DateTime.UtcNow
|}
)
return redacted
else
// Standard processing path
return! step.do_ "process" (fun () ->
Worker.process document
)
}
Pattern 3: Progressive Enhancement
module ProgressiveDeployment =
// Start with Workers, graduate to containers
let adaptiveCompute workload =
match workload.complexity with
| Simple ->
Worker.execute workload
| Medium ->
Worker.execute workload
|> withCache KV
|> withState DurableObjects
| Complex ->
Container.execute workload
|> withGPU true
|> withMemory "8GB"
| Dynamic ->
Workflow.orchestrate [
Worker.preprocess
Container.analyze
Worker.postprocess
]
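The `workload.complexity` match above presumes a classification step; the sketch below shows one hypothetical way to derive it from workload size and feature flags.

```fsharp
// Hypothetical classification feeding adaptiveCompute.
type Complexity = Simple | Medium | Complex | Dynamic

type Workload =
    { InputBytes: int64
      NeedsGpu: bool
      MultiStage: bool
      complexity: Complexity }

let classify (inputBytes: int64) (needsGpu: bool) (multiStage: bool) =
    let complexity =
        if multiStage then Dynamic      // orchestrate across paradigms
        elif needsGpu then Complex      // container with GPU
        elif inputBytes > 1_000_000L then Medium
        else Simple
    { InputBytes = inputBytes
      NeedsGpu = needsGpu
      MultiStage = multiStage
      complexity = complexity }
```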
The Economics of Edge AI
The comprehensive platform approach delivers compelling economics:
- Reduced Egress Costs: R2’s zero egress fees and generous free tiers eliminate a major cloud cost driver (see the back-of-envelope sketch after this list)
- Pay-per-use AI: No idle GPU costs with Workers AI
- Consolidated Services: Single vendor for compute, storage, security, and networking
- Operational Efficiency: Managed services reduce DevOps overhead by 70%
- Global Performance: 320+ locations without multi-region complexity
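As a back-of-envelope check on the egress point: at a typical hyperscaler rate of roughly $0.09/GB (an assumed figure; actual pricing varies by provider and tier), serving 10 TB per month costs about $900 in bandwidth alone, versus zero egress fees on R2.

```fsharp
// Illustrative egress arithmetic (rates are assumptions, not quotes).
let hyperscalerRatePerGB = 0.09      // ~typical hyperscaler egress, $/GB
let monthlyEgressGB = 10_000.0       // 10 TB/month of served data
let hyperscalerCost = hyperscalerRatePerGB * monthlyEgressGB   // = 900.0
let r2EgressCost = 0.0               // R2 charges no egress fees
printfn $"hyperscaler: ${hyperscalerCost}/mo vs R2 egress: ${r2EgressCost}/mo"
```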
Future Evolution: The Convergence Path
Looking ahead, several Cloudflare developments will further enhance this architecture:
WebGPU and Browser-Based Inference
Progressive enhancement toward client-side inference for maximum privacy and minimal network latency.
Analytics Engine: Toward a Distributed Data Platform
Analytics Engine, Cloudflare’s time-series and metrics database for unlimited-cardinality analytics at scale, continues to evolve toward a complete analytical platform.
Quantum-Safe Cryptography
Post-quantum security preparations ensuring long-term data protection.
Edge Databases with Global Consistency
D1’s evolution toward globally distributed SQL with strong consistency guarantees.
Conclusion: Beyond Optimization to Transformation
What began as an exploration of WASM efficiency has evolved into a comprehensive platform strategy that fundamentally reimagines how enterprise AI systems should be built. By embracing Cloudflare’s full service portfolio, from Workers to containers and from Zero Trust to durable execution, we’ve moved beyond mere optimization to architectural transformation.
The key insight isn’t that WASM is faster (though it is), or that edge computing reduces latency (though it does). It’s that by choosing the right compute paradigm for each component, orchestrating services through functional composition, and leveraging platform-native capabilities, we can build AI systems that are simultaneously more secure, more efficient, and more maintainable than traditional approaches.
The Fidelity framework remains central to this vision: not as a WASM-only solution, but as a compilation strategy that can target multiple runtimes. Whether generating WASM modules through MLIR, JavaScript through Fable, or deploying containerized .NET workloads, F#’s functional paradigm provides the conceptual coherence that makes this multi-paradigm architecture manageable.
For enterprises looking to deploy AI at scale, this architecture offers a progressive path: start with Workers for immediate wins, gradually adopt platform services for operational efficiency, leverage containers where complexity demands it, and wrap everything in Zero Trust security for enterprise-grade protection. The result is an AI platform that’s ready for production today while remaining flexible enough for tomorrow’s innovations.