About OvalEdge
OvalEdge is an enterprise-grade AI-powered Data Intelligence platform that unifies Data Catalog, Data Governance, Data Quality, and Analytics into a single product offered in both SaaS & on-prem versions. Trusted by Fortune 500 companies globally, OvalEdge enables organizations to discover, govern, and operationalize their data assets at scale — accelerating data-driven decisions while ensuring compliance and trust.
The Role
We are looking for a Senior Enterprise Architect to serve as the primary technical authority for the OvalEdge AI Product Platform. This is a hands-on leadership role: you will own the end-to-end architecture strategy for both our SaaS and on-prem platforms, spanning Generative AI, Agentic AI, RAG, Data Governance, cloud-native infrastructure, and enterprise integrations, while working closely with domain leads on execution.
You have done this before. You have shipped AI capabilities to production — not just designed them on whiteboards. You are equally comfortable presenting architecture strategy to the C-suite and doing a deep-dive code review with an engineering team. You thrive in a fast-moving product company where architecture decisions have direct customer impact.
What You'll Do
Architecture Vision & Governance
• Own the technology architecture strategy and maintain a rolling 24-month roadmap aligned to product and business objectives.
• Establish and enforce architecture standards, design principles, and decision records across all engineering initiatives.
• Lead the Architecture Council and chair the Architecture Review Board; conduct design reviews for all major platform initiatives.
• Identify and proactively retire technical debt; ensure extensibility for emerging AI technologies.
• Manage, scale, and evolve the architecture of a single product deployed across multiple platforms (SaaS and on-prem).
AI & Agentic Platform
• Design and evolve production-grade Agentic AI systems — multi-agent orchestration, hierarchical supervisor-agent frameworks, goal-oriented task decomposition, and long-running autonomous workflows.
• Define standards for Generative AI integration: multi-LLM routing, model abstraction layers, prompt and context engineering, token management, and cost optimization across OpenAI, Anthropic, Gemini, and open-source models.
• Architect RAG pipelines end-to-end: vector databases, embedding strategies, retrieval optimization, hallucination mitigation, and evaluation frameworks.
• Ensure Responsible AI practices — AI security, data privacy, bias controls, and governance compliance — are baked into every AI system design.
• Drive LLMOps maturity: model versioning, observability, drift detection, and automated evaluation in production.
• Define and enforce AI evaluation frameworks in production, using tools such as DeepEval, RAGAS, TruLens, and Promptfoo, as quality gates within CI/CD pipelines rather than as one-off experiments.
• Establish continuous LLM evaluation: automated regression testing for response quality, faithfulness, context precision, and answer relevance across model upgrades and prompt changes.
SaaS & Cloud Platform
• Own the architecture for OvalEdge's SaaS platform on AWS (ECS/EKS, Lambda, S3, RDS, OpenSearch, Bedrock, IAM, and CloudWatch) and for our licensed on-prem platform.
• Design for elastic scaling, disaster recovery, cost efficiency, and 99.9%+ availability SLAs.
• Establish self-healing, self-monitoring, and auto-remediation capabilities; drive observability and reliability engineering maturity.
• Define containerization, CI/CD, and Infrastructure-as-Code standards across engineering teams.
Data & Analytics Architecture
• Design scalable data catalog, governance, and analytics architectures — semantic data layers, query optimization, in-memory analytics, and AI-assisted analysis pipelines.
• Architect MCP-based enterprise integration services: tool discovery, agent interoperability, REST and event-driven APIs, and partner SDK frameworks.
Engineering Excellence & Team Leadership
• Build and run a structured mentorship program across AI, Platform, and Architecture teams (15–50+ engineers) — paired mentoring, design-review shadowing, and direct 1:1 coaching — with the explicit goal of producing the next generation of architects from within.
• Drive a culture where architecture decisions are taught, not just enforced — raising the team's collective architectural IQ rather than creating a single point of expertise.
• Own bench strength: track high-potential engineers proactively and maintain a live succession map for all key architecture and platform leadership roles.
• Drive AI-assisted development, code generation tooling, and developer productivity improvements across the SDLC.
• Own the AI testing strategy end-to-end: unit-level prompt tests (Promptfoo), component-level RAG evaluation (DeepEval), and system-level agent behavior testing (LangSmith, Braintrust).
• Define LLM quality gates — hallucination rate, groundedness, toxicity, latency — that block releases automatically when thresholds are breached.
• Evaluate and standardize tooling across the AI eval stack: DeepEval for metric-based LLM unit tests, RAGAS for RAG pipeline scoring, TruLens for feedback functions, LangSmith for trace-level debugging, and Braintrust for dataset-driven regression testing.
Cross-Functional Collaboration
• Partner with QA leadership on test strategy — including AI-assisted testing, performance, security, and reliability testing.
• Work closely with Product Management to assess feasibility, define solution approaches, and shape the product roadmap.
• Engage executive stakeholders to communicate technology strategy, manage risk, and translate architectural decisions into business value.
• Security: Partner with the Security team on threat modeling, secure-by-design reviews, and AI-specific risk assessments — treating security sign-off as a release gate. Own compliance architecture for SOC 2, ISO 27001, and emerging AI governance standards.
• DevOps / Platform Engineering: Co-own CI/CD, SLOs, and incident response architecture with the DevOps Lead — platform and infrastructure decisions are made jointly, not handed over.
• Customer Success: Maintain a direct feedback loop with Customer Success — translate enterprise-scale performance, integration, and reliability signals from the field into architectural improvements before they become escalations.
What You'll Bring
Required
• 15+ years in software engineering and architecture; 8+ years in enterprise or product architecture leadership roles.
• Hands-on, demonstrated experience shipping AI/ML capabilities to production.
• 5+ years building and scaling multi-tenant SaaS products on cloud-native platforms (AWS preferred).
• 5+ years building and scaling licensed on-prem products on multiple platforms.
• Experience managing the architecture of products deployed across SaaS and multiple on-prem platforms.
• Deep expertise in Agentic AI frameworks and protocols (LangChain, LangGraph, CrewAI, AutoGen, MCP) and RAG architectures in production.
• Strong command of the Generative AI ecosystem: LLM providers (OpenAI, Anthropic, Gemini), model abstraction, prompt engineering, and LLMOps.
• Expert-level proficiency in Python and Java; solid understanding of microservices, REST APIs, and event-driven systems.
• Proven track record leading cross-functional engineering teams through large-scale platform transformations.
• Experience with enterprise architecture frameworks (TOGAF or equivalent).
Nice to Have
• 3+ years designing and shipping Generative AI or Agentic AI products commercially.
• Experience with Data Catalog, Data Governance, Data Quality, or Analytics platforms.
• Exposure to vector databases (pgvector, Pinecone, Weaviate, OpenSearch) and semantic search in production.
• Experience with AWS Bedrock, SageMaker, or similar managed AI/ML services.
• Background supporting enterprise-scale, Fortune 500 customers.
• Master's degree in Computer Science, AI/ML, or a related field.
Education
• Bachelor's degree in Computer Science, Engineering, or a related field (required).
• Master's degree or AI/ML specialization (preferred).
What Success Looks Like
In your first 90 days, you will have assessed the current architecture, identified the top three architectural risks, and delivered an initial 12-month roadmap. Within 12 months, the platform will demonstrate measurable improvements across these dimensions:
Dimension | Metric | Target
Platform Reliability | Uptime | > 99.9%
AI Quality | Agent success rate and RAG accuracy | Improving QoQ
Engineering Delivery | Roadmap predictability | ≥ 90%
Scalability | Customer growth supported without redesign | 3× headroom
Cost Efficiency | AI and infrastructure cost per transaction | Optimized YoY
Why OvalEdge
• Build at the intersection of AI and Data — one of the highest-impact domains in enterprise software.
• Greenfield AI architecture ownership with real production scale and Fortune 500 customer exposure.
• Collaborative, engineering-first culture with a direct line to product strategy and executive leadership.
• Competitive compensation, equity participation, and a dynamic work environment.