Leaner, Smarter AI Cloud Systems

Leveraging F# and Fidelity To Build An Efficient AI Powerhouse With Cloudflare

The promise of edge computing for AI workloads has evolved from experimental optimization to production-ready enterprise architecture. What began as our exploration of WASM efficiency gains has matured into a comprehensive platform strategy that leverages Cloudflare's full spectrum of services, from Workers and AI inference to containers, durable execution, and Zero Trust security. This isn't about forcing every workload through a WASM-shaped hole, but about orchestrating the right compute paradigm for each component while maintaining the efficiency gains that drew us to the edge in the first place.

A Pragmatic Approach

Our initial focus on pure WASM compilation through the Fidelity framework revealed both the tremendous potential and the practical limitations of edge-first development. Cloudflare Containers, now available on the platform, let you run new types of workloads on Cloudflare's network with an experience that is simple, scalable, global, and deeply integrated with Workers. This container support fundamentally changes the architectural possibilities, allowing us to deploy traditional .NET F# workloads where they make sense while maintaining WASM's efficiency advantages where they matter most.

The architecture now encompasses three complementary execution models:

  1. WASM via Fidelity/Firefly: Core business logic compiled to efficient WebAssembly modules
  2. JavaScript via Fable: Service orchestration and platform API integration
  3. Containers for Fidelity & .NET F#: Complex workloads requiring specialized capabilities or .NET features specific to customer requirements

This pragmatic approach acknowledges that most systems are increasingly real-time and chatty, often holding open long-lived connections and performing tasks in parallel, and that different components call for different compute strategies.
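As a concrete illustration, here is a minimal sketch of how a single F# domain module can be shared across all three execution models. The FABLE_COMPILER define is set automatically by Fable; the platform shim and a corresponding Fidelity-side define are assumptions about how such a multi-target project could be organized:

// Shared domain logic: pure F# with no platform dependencies, so the same
// module can compile under Fable (JavaScript), Fidelity/Firefly (WASM),
// or dotnet (containers).
module SharedScoring =

    type Document = { Id: string; Tokens: string list }

    /// Pure function: identical behavior on every compilation target.
    let relevanceScore (query: string list) (doc: Document) =
        doc.Tokens
        |> List.filter (fun token -> List.contains token query)
        |> List.length

// Thin platform shim: only the edges differ per target.
module Platform =

#if FABLE_COMPILER
    // Fable path: emitted as JavaScript for Worker orchestration.
    let log (msg: string) = Fable.Core.JS.console.log msg
#else
    // .NET path for containers; by assumption, a FIDELITY define would
    // select a Fidelity-native implementation the same way.
    let log (msg: string) = System.Console.WriteLine msg
#endif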

The Complete Cloudflare Mosaic

Our comprehensive architecture leverages Cloudflare’s extensive service portfolio to build secure, scalable AI systems:

%%{init: {'theme': 'neutral'}}%%
graph LR
    subgraph "Zero Trust Security Perimeter"
        ZT[Cloudflare Access<br/>Identity Verification]
        CASB[Cloud Access Security<br/>Broker]
        DLP[Data Loss Prevention]
    end
    subgraph "Application Layer"
        subgraph "Workers Platform"
            WORKER[Workers<br/>WASM + JS]
            PAGES[Pages<br/>Static Assets]
            EMAIL[Email Workers]
        end
        subgraph "Container Platform"
            CONTAINER[.NET or Fidelity<br/>F# Containers<br/>Complex Workloads]
            SANDBOX[Code Interpreter<br/>Sandboxes]
        end
    end
    subgraph "AI & Compute Services"
        AI[Workers AI<br/>GPU Inference]
        VECTOR[Vectorize<br/>Vector Database]
        WORKFLOW[Workflows<br/>Durable Execution]
        QUEUE[Queues<br/>Event Processing]
    end
    subgraph "Data & Storage Layer"
        D1[D1<br/>SQLite Database]
        R2[R2<br/>Object Storage]
        KV[KV Store<br/>Session Data]
        DO[Durable Objects<br/>Stateful Coordination]
    end
    subgraph "Analytics & Monitoring"
        ANALYTICS[Analytics Engine<br/>Time-Series Data]
        LOGS[Logpush<br/>Audit Trail]
        TRACE[Trace<br/>Distributed Tracing]
    end

    %% Security flow
    ZT --> WORKER
    ZT --> CONTAINER
    CASB --> R2
    DLP --> D1

    %% Application connections
    WORKER --> AI
    WORKER --> VECTOR
    WORKER --> WORKFLOW
    CONTAINER --> AI

    %% Data flow
    WORKFLOW --> QUEUE
    QUEUE --> DO
    DO --> D1
    WORKER --> KV
    WORKER --> R2

    %% Analytics
    WORKER --> ANALYTICS
    CONTAINER --> ANALYTICS
    AI --> ANALYTICS

    style ZT fill:#e8f5e9,stroke:#2e7d32,stroke-width:3px
    style WORKER fill:#ff6d00,stroke:#e65100,stroke-width:2px,color:#ffffff
    style AI fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px

Extending the Enterprise Security Perimeter

Zero Trust security is an IT security model that requires strict identity verification for every person and device trying to access resources on a private network, regardless of whether they are sitting within or outside of the network perimeter. Cloudflare’s Zero Trust services enable enterprises to extend their security boundary into the cloud while maintaining complete control over data access:

// F# Pulumi configuration for Zero Trust architecture
module SecurityInfrastructure =
    open Pulumi.Cloudflare
    
    let configureZeroTrust (config: EnterpriseConfig) =
        // Access policies for AI services
        let aiAccessPolicy = 
            AccessPolicy(
                "ai-services-policy",
                AccessPolicyArgs(
                    ApplicationId = config.AIApplicationId,
                    Precedence = 1,
                    Decision = "allow",
                    Include = [
                        AccessRuleArgs(
                            Groups = [config.DataScienceGroupId]
                        )
                    ],
                    Require = [
                        AccessRuleArgs(
                            DevicePosture = ["compliant"]
                        )
                    ],
                    SessionDuration = "24h"
                )
            )
        
        // DLP rules for sensitive data
        let dlpProfile = 
            DlpProfile(
                "sensitive-data-protection",
                DlpProfileArgs(
                    Type = "predefined",
                    Entries = [
                        DlpEntryArgs(Pattern = PatternArgs(Regex = @"\d{3}-\d{2}-\d{4}")) // SSN
                        DlpEntryArgs(Pattern = PatternArgs(Regex = @"\d{16}")) // Credit cards
                    ],
                    AllowedMatchCount = 0
                )
            )
        
        // Tunnel for secure connectivity
        let secureTunnel = 
            Tunnel(
                "enterprise-tunnel",
                TunnelArgs(
                    AccountId = config.AccountId,
                    Name = "enterprise-ai-tunnel",
                    Secret = config.TunnelSecret,
                    ConfigSrc = "cloudflare"
                )
            )
        
        { AccessPolicy = aiAccessPolicy
          DlpProfile = dlpProfile
          Tunnel = secureTunnel }

This Zero Trust architecture closes a long-standing gap: security controls for public-facing applications have far outpaced those on private networks, and extending Access, CASB, and DLP across both brings enterprise-grade security to cloud AI deployments.

Container Platform: The Missing Piece

Cloudflare Containers are now available in beta for all users on paid plans, making it possible to run new kinds of applications alongside your Workers. This container support enables crucial capabilities:

When Containers Make Sense

// Complex F# workload requiring the full .NET BCL.
// The pipeline helpers (createPipeline, executeWithFullBCL, enrichWithSqlServer,
// integrateWithLegacySystems, performMatrixOperations) are illustrative
// stand-ins for domain-specific code.
module EnterpriseAIProcessor =
    open Microsoft.ML
    open Microsoft.Data.Analysis   // DataFrame
    open MathNet.Numerics.LinearAlgebra
    open System.Data.SqlClient
    
    // This runs in a container with full .NET support
    let processComplexMLPipeline (data: DataFrame) = async {
        // Use ML.NET for complex preprocessing
        let! preprocessed = 
            MLContext()
            |> createPipeline data
            |> executeWithFullBCL
        
        // Leverage enterprise libraries
        let! enriched = 
            enrichWithSqlServer preprocessed
            |> integrateWithLegacySystems
        
        // Complex numerical computations with MathNet
        let! analyzed = 
            Matrix<float>.Build.DenseOfArray enriched
            |> performMatrixOperations
            
        return analyzed
    }
    
    // F# mirror of the container configuration that would live in wrangler.jsonc
    let containerConfig = {|
        containers = [|
            {| 
                name = "ml-processor"
                image = "registry.hub.docker.com/fidelity/ml-processor:latest"
                port = 8080
                cpu = 2
                memory = "4GB"
                gpu = "nvidia-t4" // GPU support for ML workloads
            |}
        |]
    |}

Durable Execution with Workflows

Cloudflare Workflows is now in open beta. Workflows allows building reliable, repeatable, long-lived multi-step applications that can automatically retry, persist state, and scale out. This enables sophisticated AI pipelines:

module AIWorkflows =
    open CloudflareWorkflows
    
    type DocumentProcessor() =
        inherit Workflow<DocumentInput, ProcessingResult>()
        
        override this.run(input, step) = async {
            // Extract text from document
            let! extracted = step.do_("extract", fun () ->
                R2.fetch input.documentUrl
                |> extractText
            )
            
            // Generate embeddings with retry logic
            let! embedded = step.do_("embed", fun () ->
                WorkersAI.createEmbedding extracted
            ) |> step.withRetries 3
            
            // Store in vector database
            let! stored = step.do_("store", fun () ->
                Vectorize.upsert {
                    id = input.documentId
                    vector = embedded
                    metadata = input.metadata
                }
            )
            
            // Conditionally trigger analysis
            if input.requiresAnalysis then
                let! analysis = step.do_("analyze", fun () ->
                    // This could call a container for complex ML
                    Container.fetch "ml-analyzer" extracted
                )
                
                // Wait for human review if needed
                if analysis.requiresReview then
                    let! approved = step.waitForEvent("approval", TimeSpan.FromHours(24))
                    
                    if approved then
                        do! step.do_("finalize", fun () ->
                            D1.insert analysis.results
                        )
            
            return ProcessingResult.Success stored.id
        }

Event-Driven Architecture

With R2 event notifications, you can configure changes to content in any R2 bucket to trigger sophisticated processing pipelines:

module EventDrivenAI =
    // R2 event triggers workflow
    let configureEventPipeline() =
        R2EventNotification.create {
            bucket = "user-uploads"
            event_types = ["object-create", "object-update"]
            destination = Queue "document-processor"
        }
    
    // Queue consumer processes events
    let documentQueueConsumer = 
        QueueConsumer.create {
            queue = "document-processor"
            batch_size = 10
            max_retries = 3
            
            handler = fun messages -> async {
                for msg in messages do
                    // Trigger workflow for each document
                    let! workflowId = 
                        Workflow.create "DocumentProcessor" {
                            documentUrl = msg.body.url
                            documentId = msg.body.id
                            requiresAnalysis = msg.body.size > 1_000_000L
                            metadata = msg.body.metadata
                        }
                    
                    // Track in Analytics Engine
                    do! Analytics.writeDataPoint {
                        dataset = "document_processing"
                        point = {
                            timestamp = DateTime.UtcNow
                            workflowId = workflowId
                            documentSize = msg.body.size
                        }
                    }
            }
        }

Infrastructure as Code with F# Pulumi

Managing this comprehensive architecture requires sophisticated infrastructure orchestration:

module CloudflareInfrastructure =
    open Pulumi
    open Pulumi.Cloudflare
    
    let deployAIPlatform() =
        let config = Config()
        
        // D1 database referenced by the worker binding below
        let database = 
            D1Database(
                "ai-metadata",
                D1DatabaseArgs(
                    AccountId = config.Require("accountId"),
                    Name = "ai-metadata"
                )
            )
        
        // Configure Workers with AI bindings
        let aiWorker = 
            WorkersScript(
                "ai-orchestrator",
                WorkersScriptArgs(
                    Content = File.ReadAllText("./dist/worker.js"),
                    Module = true,
                    Bindings = [
                        WorkersScriptPlainTextBindingArgs(
                            Name = "AI",
                            Text = "@cf/meta/llama-3.1-8b-instruct"
                        )
                        WorkersScriptServiceBindingArgs(
                            Name = "VECTORIZE",
                            Service = "vectorize-index"
                        )
                        WorkersScriptR2BucketBindingArgs(
                            Name = "STORAGE",
                            BucketName = "ai-documents"
                        )
                        WorkersScriptD1DatabaseBindingArgs(
                            Name = "DATABASE",
                            DatabaseId = database.Id
                        )
                    ]
                )
            )
        
        // Deploy container for complex workloads
        let mlContainer = 
            Container(
                "ml-processor",
                ContainerArgs(
                    Image = "fidelity/ml-processor:latest",
                    Secrets = [
                        Output.CreateSecret(config.GetSecret("ML_API_KEY"))
                    ],
                    EnvironmentVariables = dict [
                        "PROCESSING_MODE", "production"
                        "GPU_ENABLED", "true"
                    ],
                    ResourceRequirements = ContainerResourceRequirementsArgs(
                        Limits = ContainerResourceLimitsArgs(
                            Cpu = "4",
                            Memory = "8Gi",
                            Gpu = "1"
                        )
                    )
                )
            )
        
        // Configure Zero Trust access
        let accessApplication = 
            AccessApplication(
                "ai-platform",
                AccessApplicationArgs(
                    Domain = "ai.company.internal",
                    Type = "self_hosted",
                    SessionDuration = "24h",
                    AllowedIdps = ["azure_ad"],
                    AutoRedirectToIdentity = true
                )
            )
        
        // Set up monitoring
        let logpush = 
            LogpushJob(
                "audit-logs",
                LogpushJobArgs(
                    Dataset = "access_requests",
                    DestinationConf = "s3://audit-bucket/logs",
                    Ownership = LogpushOwnershipArgs(
                        DestinationConf = Output.CreateSecret(config.Require("s3OwnershipConf"))
                    )
                )
            )
        
        Output.Create({|
            WorkerUrl = aiWorker.Id.Apply(fun id -> $"https://{id}.workers.dev")
            ContainerEndpoint = mlContainer.Id
            AccessPortal = accessApplication.Domain
        |})

Performance and Cost Optimization

The multi-paradigm approach delivers measurable improvements across different workload types:

| Workload Type | Traditional Approach | Optimized Architecture | Improvement |
| --- | --- | --- | --- |
| Simple Inference | Container + .NET Runtime | WASM Worker | 95% latency reduction |
| Complex ML Pipeline | Kubernetes + GPUs | Container with GPU | 40% cost reduction |
| Vector Search | Self-hosted Qdrant | Vectorize | 80% operational overhead reduction |
| Batch Processing | Cron + Queue Service | Workflows + Queues | 60% reliability improvement |
| Session Management | Redis Cluster | KV Store | 90% latency reduction |
| Real-time Coordination | WebSocket Servers | Durable Objects | 70% infrastructure reduction |

Real-World Implementation Patterns

Pattern 1: Hybrid RAG System

module HybridRAG =
    // Fast path for common queries
    let cachedInference query = async {
        match! KV.get (hashQuery query) with
        | Some cached -> return cached
        | None ->
            let! embedding = WorkersAI.embed query
            let! context = Vectorize.search embedding
            let! response = WorkersAI.complete query context
            do! KV.put (hashQuery query) response 3600 // TTL in seconds
            return response
    }
    
    // Complex path for specialized queries
    let containerInference query context = async {
        let! container = Container.get "specialized-llm"
        return! container.process {|
            query = query
            context = context
            use_tools = true
            max_iterations = 5
        |}
    }
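
A small dispatcher can tie the two paths together (continuing the HybridRAG module above). The needsTools heuristic is a hypothetical stand-in for whatever routing signal fits a given deployment, such as query length, tool requirements, or tenant tier:

    // Hypothetical routing glue over the two inference paths above
    let answer (query: string) = async {
        if needsTools query then
            // Specialized path: retrieve context, then delegate to the container
            let! embedding = WorkersAI.embed query
            let! context = Vectorize.search embedding
            return! containerInference query context
        else
            // Common path: cache-first inference on the Worker
            return! cachedInference query
    }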

Pattern 2: Secure Document Processing

module SecureDocumentPipeline =
    let processWithCompliance document = workflow {
        // DLP scanning before processing
        let! dlpResult = step.do_ "dlp-scan" (fun () ->
            DLP.scan document.content
        )
        
        if dlpResult.hasSensitiveData then
            // Route through secure container
            let! redacted = step.do_ "redact" (fun () ->
                Container.call "pii-redactor" document
            )
            
            // Audit log the redaction
            do! step.do_ "audit" (fun () ->
                Analytics.log {|
                    event = "sensitive_data_redacted"
                    document_id = document.id
                    timestamp = DateTime.UtcNow
                |}
            )
            
            return redacted
        else
            // Standard processing path
            return! step.do_ "process" (fun () ->
                Worker.process document
            )
    }

Pattern 3: Progressive Enhancement

module ProgressiveDeployment =
    // Start with Workers, graduate to containers
    let adaptiveCompute workload = 
        match workload.complexity with
        | Simple -> 
            Worker.execute workload
        | Medium -> 
            Worker.execute workload 
            |> withCache KV
            |> withState DurableObjects
        | Complex -> 
            Container.execute workload
            |> withGPU true
            |> withMemory "8GB"
        | Dynamic ->
            Workflow.orchestrate [
                Worker.preprocess
                Container.analyze
                Worker.postprocess
            ]

The Economics of Edge AI

The comprehensive platform approach delivers compelling economics:

  • Reduced Egress Costs: Generous free tiers, transparent pricing, and no egress fees
  • Pay-per-use AI: No idle GPU costs with Workers AI (see the cost sketch after this list)
  • Consolidated Services: Single vendor for compute, storage, security, and networking
  • Operational Efficiency: Managed services reduce DevOps overhead by 70%
  • Global Performance: 320+ locations without multi-region complexity
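
To make the pay-per-use point concrete, here is the back-of-envelope cost sketch referenced above. The unit prices are assumed placeholders for illustration, not published rates; substitute current pricing before drawing conclusions:

// Back-of-envelope cost comparison. All prices are ASSUMED placeholders
// for illustration only, not published Cloudflare rates.
module CostSketch =
    let hoursPerMonth = 730.0

    /// Dedicated GPU instance: billed every hour, busy or idle.
    let dedicatedGpuMonthly (hourlyRate: float) =
        hourlyRate * hoursPerMonth

    /// Pay-per-use inference: billed only for tokens actually processed.
    let payPerUseMonthly (pricePerMillionTokens: float) (tokensPerMonth: float) =
        pricePerMillionTokens * (tokensPerMonth / 1_000_000.0)

    // Example: an assumed $1.20/hr GPU runs ~$876/month regardless of load,
    // while an assumed $0.50 per million tokens at 100M tokens/month is $50.
    let dedicated = dedicatedGpuMonthly 1.20
    let onDemand  = payPerUseMonthly 0.50 100_000_000.0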

Future Evolution: The Convergence Path

Looking ahead, several Cloudflare developments will further enhance this architecture:

WebGPU and Browser-Based Inference

Progressive enhancement to client-side inference for ultimate privacy and zero latency.

Constellation: Distributed Data Platform

Analytics Engine, Cloudflare's time-series and metrics database for writing unlimited-cardinality analytics at scale, continues evolving toward a complete analytical platform.

Quantum-Safe Cryptography

Post-quantum security preparations ensuring long-term data protection.

Edge Databases Global Consistency

D1’s evolution toward globally distributed SQL with strong consistency guarantees.

Conclusion: Beyond Optimization to Transformation

What began as an exploration of WASM efficiency has evolved into a comprehensive platform strategy that fundamentally reimagines how enterprise AI systems should be built. By embracing Cloudflare's full service portfolio, from Workers to containers and from Zero Trust to durable execution, we've moved beyond mere optimization to architectural transformation.

The key insight isn’t that WASM is faster (though it is), or that edge computing reduces latency (though it does). It’s that by choosing the right compute paradigm for each component, orchestrating services through functional composition, and leveraging platform-native capabilities, we can build AI systems that are simultaneously more secure, more efficient, and more maintainable than traditional approaches.

The Fidelity framework remains central to this vision, not as a WASM-only solution but as a compilation strategy that can target multiple runtimes. Whether generating WASM modules through MLIR, JavaScript through Fable, or deploying containerized .NET workloads, F#'s functional paradigm provides the conceptual coherence that makes this multi-paradigm architecture manageable.

For enterprises looking to deploy AI at scale, this architecture offers a progressive path: start with Workers for immediate wins, gradually adopt platform services for operational efficiency, leverage containers where complexity demands it, and wrap everything in Zero Trust security for enterprise-grade protection. The result is an AI platform that’s ready for production today while remaining flexible enough for tomorrow’s innovations.

Author: Houston Haynes
Date: August 18, 2025
Category: Architecture