Team Lead - Platform Engineering

Hybrid - Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, Malaysia

Job description

Who We Are

NEXT Ventures is where ambition takes shape and momentum becomes movement. As a global platform revolutionising access to performance-based capital, we help the world’s most driven individuals rise. Through our flagship brand, FundedNext, we empower dreamers to become doers and potential to turn into performance. With 500+ driven minds across five countries, we power a global rhythm: 220,000+ daily users from 170+ nations, each chasing greatness in their own way.

Your Role in Our Mission

As Team Lead, Platform Engineering, you own the foundation everything else at FundedNext is built on: the scalability, reliability, and security posture of the entire platform that 220,000+ daily users depend on. You don’t just keep the lights on; you decide how the system grows, where it bends before it breaks, and how fast every other squad can ship.

You lead the Platform Engineering Squad, set technical direction, and partner with product, cybersecurity, and business leadership to make sure platform work accelerates delivery rather than blocking it. This is a builder’s mandate with a leader’s reach: hands-on enough to architect a sharding strategy or chase a slow query, senior enough to set performance budgets across the company and grow the engineers who enforce them. And it runs on an AI-native engineering culture — one you’re expected to champion, not just adopt.

How You’ll Make an Impact

Scalability & Performance Engineering

Own the scalability posture of the entire FundedNext platform — proactively identify bottlenecks, design horizontal and vertical scaling strategies, and ensure the infrastructure handles 2–5x traffic growth without degradation.
Lead database sharding, partitioning, and replication strategies across the platform’s MySQL/PostgreSQL databases — designing data distribution approaches that hold query performance as volume grows from millions to billions of rows.
Institutionalise regular query optimisation cycles — identify slow queries across all services, analyse execution plans, implement indexing strategies, and set performance baselines that prevent regressions.
Architect and implement data archiving solutions — policies and pipelines that move historical trade logs, transaction records, and audit trails to cold storage without impacting production performance or compliance.
Champion a performance engineering culture across all squads — establish performance budgets, bring load testing into the delivery pipeline, and provide tooling that lets product squads catch and fix issues early.

Reliability, DR & Security

Own Business Continuity and Disaster Recovery readiness — design, implement, and regularly drill failover procedures to achieve minimal RTO and RPO across all critical services.
Drive system reliability to 99.9% uptime across all services — health checks, circuit breakers, graceful degradation patterns, and automated recovery mechanisms.
Own resolution of findings from the Cyber Security Squad — take in vulnerability reports, audit findings, and pen-test results, then prioritise, remediate, and verify fixes across infrastructure and application layers.
Design centralised log management and unified observability — ensuring MTTD under 15 minutes and MTTR under 60 minutes through proper alerting, dashboarding, and runbooks.
Establish and enforce deployment discipline — CI/CD reliability, rollback procedures, canary deployments, and DORA metrics tracking (deployment frequency, lead time, change failure rate, MTTR) so every squad deploys at least weekly, safely.

Infrastructure & Architecture

Architect and manage the container orchestration layer (Docker, Kubernetes or ECS) — consistent, reproducible environments across development, staging, and production.
Own the AWS cloud infrastructure — VPC design, compute scaling (EC2, ECS, Lambda), managed database services (RDS, ElastiCache), storage (S3), CDN configuration, and cost optimisation.
Design and scale the event-driven architecture layer — message queue infrastructure (RabbitMQ, Kafka, or similar) for asynchronous processing, event sourcing, and inter-service communication across the microservices ecosystem.
Drive the microservices strategy — service decomposition decisions, API gateway management, service mesh considerations, distributed tracing, and keeping the platform from becoming a distributed monolith.

Product Engineering & Leadership

Contribute to customer-facing FundedNext product engineering when needed; this is not a pure infrastructure role. When product squads hit architectural challenges or performance bottlenecks, or need infrastructure-aware features, the platform team steps in.
Lead the Platform Engineering Squad — own sprint planning, technical direction, delivery cadence, and cross-squad coordination so platform work never blocks product delivery.
Manage stakeholder relationships across all product squads, cybersecurity, and business leadership — prioritising platform work by its impact on overall engineering velocity and system reliability.
Establish architecture decision records (ADRs), infrastructure-as-code practices, and documentation standards that keep the platform self-documenting and maintainable.
Mentor and grow the platform engineers on your team — fostering a performance-obsessed culture where everyone proactively looks for ways to make the system faster, more reliable, and more scalable.

What You Bring

Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
7+ years of professional software engineering experience, with at least 3 years focused on platform engineering, site reliability engineering (SRE), or infrastructure engineering at scale.
At least 2 years in a technical leadership or squad/team lead role — setting technical direction, running delivery cadences, and mentoring engineers.
Expert-level proficiency in PHP and Laravel — the platform team must deeply understand the application layer it’s scaling. You can optimise Laravel for high throughput, diagnose bottlenecks at the framework level, and design application-level caching strategies.
Strong proficiency in Node.js for high-performance backend services and Next.js for internal tooling and dashboards — comfortable contributing across the full stack when needed.
Deep expertise in database architecture and optimisation — sharding strategies, replication topologies, partitioning schemes, query optimisation, indexing strategies, and data archiving pipelines (MySQL and/or PostgreSQL at scale).
Strong hands-on experience with Docker containerisation and container orchestration (Kubernetes, ECS, or equivalent) in production environments.
Deep AWS experience — VPC architecture, compute (EC2, ECS, Lambda), databases (RDS, Aurora, ElastiCache/Redis), storage (S3), networking (ALB, CloudFront, Route53), and cost management; able to design and manage production-grade AWS infrastructure.
Hands-on experience with message queue systems (RabbitMQ, Kafka, or similar) for event-driven architecture, asynchronous processing, and inter-service communication.
Experience designing and implementing microservices architectures — service decomposition, API design, distributed tracing, service discovery, and managing the complexity of distributed systems.
A proven track record in BC/DR planning and execution — designing failover architectures, running disaster recovery drills, and achieving measurable RTO/RPO targets.
Experience with CI/CD pipeline design and deployment automation (GitHub Actions, Jenkins, ArgoCD, or similar) — canary deployments, rollback strategies, and DORA metrics tracking.
Experience with monitoring and observability stacks (Prometheus, Grafana, ELK/OpenSearch, Datadog, or equivalent) — able to design alerting strategies that achieve MTTD under 15 minutes.
Familiarity with security remediation workflows — receiving findings from security teams, prioritising by severity and exploitability, implementing fixes, and verifying through retesting.
A performance-obsessed engineering mindset — someone who instinctively profiles, benchmarks, and optimises, and who gets genuine satisfaction from making systems faster and more reliable.

AI-Native Engineering (Mandatory)

This one is non-negotiable, and at this level you set the bar for everyone else. You must demonstrate active, daily use of modern AI agentic workflows — well beyond basic ChatGPT prompts or Copilot autocomplete — and the ability to drive adoption across squads. We expect fluency with AI coding agents (Claude Code, Cursor, Windsurf, or similar), project-level AI configuration (CLAUDE.md, rules files), agentic task delegation, and AI-driven code review, targeting 5–10x productivity through AI-augmented development. Leaders who are not AI-native in their own engineering practice will not advance.

Your X-Factor

You have scaled a platform through real hypergrowth — not in theory, but through the incidents, migrations, and re-architectures that come with it.
Experience building or maturing a platform engineering function from the ground up — standing up the practices, tooling, and team, not just inheriting them.
A FinOps instinct — you have meaningfully cut cloud spend without trading away reliability or performance.
Background in fintech, trading, or another high-throughput, low-latency, transaction-heavy domain.
Thought leadership — conference talks, writing, or open-source contributions in infrastructure, reliability, or developer experience.
A history of turning incidents into permanent systemic fixes, and of growing engineers who carry that discipline forward.

Your Journey After Applying

30-minute HR session with the Talent Acquisition team.
60-minute session with the hiring manager.
Technical & architecture deep-dive — a platform design discussion with senior engineers.
Final leadership session with Engineering and business leadership.

Why Join NEXT

At NEXT Ventures, performance is more than numbers — it’s the pulse that drives innovation and impact. Join us to own the engine room of a global trading platform and decide how fast, how reliably, and how far it can scale. Here, platform engineering isn’t a support function — it’s the foundation that everything else stands on, and the person who leads it shapes the trajectory of the entire business.

Your next chapter in building at scale begins here.
