Performance Engineering

A True Performance Engineering team

An incredibly skilled performance engineering team

  • Most teams in industry are really Performance QA (aka Run, Report, Repeat)

Every codebase we have ever touched, we have made faster, no matter how brilliant the original coders

  • Many examples: Amazon Cloud, GPUs, ML Training & Inference, Oracle DB, Adv Analytics, Oracle Apps, Graph Analytics, Storage, OS, HW, …

We are passionate and never satisfied with performance

  • Don’t accept others’ preconceptions
  • Work across the stack, at scale, toward near-linear scaling
  • Work at all levels of SW/HW, dive deep to find real performance issues
  • Innovate by using things for unintended purposes

Decades of being heavily involved with HW/SW Codesign

Performance Engineering Team History

A unique blend of the right experiences that positions us well for AI. (overview graphic and details)

Team accelerated Amazon Services (2017 - Present)

  • Team accelerated more than 60 AWS services: ML, Analytics, Database, Streaming, Graviton, Trainium, Inferentia, EC2, Networking, Containers, Lambda, and the newest services.

Team accelerated Apps/Analytics and Codesigned SW-in-Silicon with Oracle acquisition (2010)

  • Focused on Commercial Apps, DB in-memory columnar, ETL/DSL/Java Streams, Oracle Advanced Analytics with featurization of complex DB data, ML Analytics, Spark, Graph, NoSQL, Java/JVM, HW accelerators, HW/SW co-design, Cloud (SaaS, PaaS, IaaS), and preliminary TensorFlow work, all while engineering better performance across the stack

Team merged App/DB/HPC tuning with Sun acquisition (1996)

  • Sun thought big servers were boutique, but high-end servers went on to make more than $50B for Sun. Over 540 world records and innovations throughout the stack for high-end near-linear parallelism and scalability on Apps (ERP, SCM, HR, CRM), DBs (including 1st Columnar DBs), and HPC

Team codesigned 1st high-end general servers, then learned Commercial Apps/DB with Cray/SGI acquisition (1991)

  • Patent with 84-way i860 SMP (FPS Matrix Co-Processor), then SPARC vector server, which led to the 1st effective 64-way SPARC SMP server. Became Oracle DB experts, then created the 1st scalable multi-TB DB; Data Warehousing + OLTP optimizations established Cray as the fastest commercial database and app servers

Team started at Floating Point Systems (attached HPC accelerator of the mid-80s) and stuck together through 4 acquisitions

  • HPC apps, Math Library optimization, Microcode VLIW SW pipelining (Eigen, Sparse solvers, multi-radix FFTs, Intrinsics, which led to Auto Diff innovations for Non-Linear solvers), MPP HW/SW codesign (Hypercube, Torus), compute intensity optimization for attached GPU

On-Prem Acceleration

Below is the wide variety of performance engineering experience that informs our AI performance tuning.

Architecture

  • Enterprise Architecture
  • Processor Architecture
  • HW Accelerator Architecture
  • Server Architecture
  • System Architecture

Analytics Apps (Full-stack optimization)

  • Spark ML (training/scoring asymmetry), BLAS3 opt
  • Oracle Adv Analytics (training/scoring asymmetry)
  • Oracle PGX Graph
  • TensorFlow
  • SAS, SPSS, FPSMath (made public ’89)
  • Homegrown statistics packages
  • Oracle Spatial
  • TensorFlow, MXNet, Gluon, Numpy, R, Python, matplotlib, ...

Transactional Apps (Full-stack optimization)

  • Fusion Apps (Java)
  • SOA
  • Oracle E-Business, SAP, Peoplesoft, Siebel, JD Edwards, Fusion Apps, Manugistics, Baan, …

Data Management

  • ETL for ML, BigData SQL, SAS ETL, Informatica ETL
  • Spark SQL
  • Columnar In-memory: Oracle, SybaseIQ, Expressway
  • Oracle NoSQL, Cassandra NoSQL, key-value
  • Oracle DB, MySQL, DB2, ...
  • Data Warehousing, Datamarts, In-memory Aggregation
  • Kafka Streaming

Java/JVM/GC

  • Java Streams (HW DAX)
  • REST (Jersey, Grizzly)
  • Intrinsics - inline assembly accelerators

Cross-stack examples

  • Moving DB functions to Disk controllers
  • Hybrid Columnar Compression

MPP/Cloud

  • Oracle Cloud, MPP, 3D torus, Vector Hypercube, Dataflow machine, …
  • Storage, Network, Compute optimization
  • Matrix co-processor
  • Attached Processors (GPU)

Parallel Performance

  • Near-linear Scaling (MPP, NUMA, SMP), major restructuring of algorithms for parallelism
  • Modeling/estimation
  • Instrumentation (w/ myriad of tools)
  • Analysis (Shortfall), Rectification
  • OS: sched, thread tuning, lock splitting
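
As an illustration of the scaling, modeling, and shortfall-analysis items above, here is a minimal Python sketch, not the team's actual tooling. It computes speedup, parallel efficiency, and the Karp-Flatt experimentally determined serial fraction from two timings. The function names and the sample timings are hypothetical.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup of a p-way run (time tp) over the 1-way run (time t1)."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(t1, tp) / p


def karp_flatt(t1: float, tp: float, p: int) -> float:
    """Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p), for p > 1.

    A serial fraction that grows with p points at parallel overhead
    (locks, communication) rather than inherently serial code.
    """
    s = speedup(t1, tp)
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    # Hypothetical timings: 100 s on 1 CPU, 2 s on 64 CPUs.
    t1, tp, p = 100.0, 2.0, 64
    print(f"speedup    = {speedup(t1, tp):.1f}x")
    print(f"efficiency = {efficiency(t1, tp, p):.1%}")
    print(f"shortfall  = {1 - efficiency(t1, tp, p):.1%}")
    print(f"serial e   = {karp_flatt(t1, tp, p):.4f}")
```

With these made-up timings the run reaches 50x on 64 CPUs, about 78% efficiency. The shortfall from linear is the quantity the analysis step would then try to attribute to specific causes.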

CPU

  • VLIW SW pipelining
  • RISC/CISC optimization: RAW, etc.
  • Vectorization
  • SPARC, x86, i860, Cray YMP, FPS-VLIW, FPS XP-32, Transputer, systolic arrays, …
  • In-memory accelerations (DAX, …)

HPC

  • Financial Derivatives
  • Signal processing, Beam-forming, …
  • Structural Analysis
  • Computational Chemistry
  • Physics (CFD, MHD, QCD, QED, …)
  • EDA
  • Seismic Oil/Gas
  • Gov
  • Ad Hoc Customer
  • MPI
  • OpenMP
  • OpenCL
  • InfiniBand, Ethernet Clusters

Math Library

  • Solvers, Eigen, mixed-radix FFT, Derivative, Seismic, conv/deconv, linear prog, conjugate gradient, Strassen matmul, Winograd, compression, simulation, signal processing…
  • Out-of-Core equation solvers
  • BLAS 3, 2, 1
  • Intrinsics (various precision)
  • Automatic Differentiation & nonlinear solvers
  • High-accuracy long accumulator solvers
  • Interval Math

Memory

  • Compute intensity optimization
  • Data vectorization
  • BW, bisection BW
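
Compute intensity and bandwidth, listed above, can be related with a simple roofline-style estimate. The following is a minimal Python sketch under stated assumptions: an idealized dense matrix multiply that reads A and B and writes C exactly once, plus hypothetical peak and bandwidth figures. The function names are illustrative only.

```python
def matmul_intensity(n: int, bytes_per_elem: int = 8) -> float:
    """FLOPs per byte for an n x n matmul with ideal data reuse.

    Assumes 2*n^3 floating-point operations and each of the three
    matrices moved to or from memory exactly once.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n * n * bytes_per_elem
    return flops / bytes_moved


def attainable_gflops(intensity: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline bound: limited by compute peak or by memory bandwidth."""
    return min(peak_gflops, intensity * bw_gbs)


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    peak, bw = 1000.0, 100.0
    for name, ai in [("low-intensity kernel (0.25 FLOP/byte)", 0.25),
                     ("1024 x 1024 matmul, ideal reuse", matmul_intensity(1024))]:
        print(f"{name}: {attainable_gflops(ai, peak, bw):.0f} GFLOP/s attainable")
```

On this hypothetical machine the low-intensity kernel is bandwidth-bound at 25 GFLOP/s, while the well-blocked matmul can in principle reach the compute peak. That is why raising compute intensity is often the first optimization target.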

Network

  • REST (Jersey, Grizzly)
  • Small-packet optimization
  • Large-packet optimization
  • Structured asynchronous pipelining for MPPs
  • Interrupt tuning/scalability
  • IB, various network technologies
  • Storage optimization, Filesystem, QFS, …

Security/Crypto

  • Security Kernels
  • Secure Network
  • Secure Filesystem
  • Oracle TDE
  • Oracle Data Redaction
  • SSM (Silicon Secured Memory)

Virtualization

  • LDoms, Zones
  • Optimized Virtualized Storage, Network, & CPU

This journey gave us a truly rare blend of experiences that is critical for the next steps in AI performance.