
AI Perf
AI Performance Engineering
Unique Skills
Our team has taken a unique journey to reach you. Our first jobs were with a company focused on the highest possible HPC performance, where we built systems with a variety of distributed architectures that remain relevant today. Since then we have been acquired four times, and each time we have explored new architectures, new layers of the stack, new application areas, and new industry segments. Reinventing ourselves so many times has not been easy, but we have accumulated an enormous amount of knowledge. This experience will benefit whichever company acquires us. (unique experiences)
Crucial to our team’s success and longevity is our culture of collaboration, which has consistently produced superstar performance engineers. We work well together across organizational boundaries and time zones. We have mastered the art of joining new teams and of absorbing new team members. We collaborate with deep technologists and coders. We engage with customers and internal executives. We write up and present our findings to a variety of audiences. Above all, we make every code go faster.
Let me tell you what we’ve been doing.
We are some of the best performance engineers in the field. While at Amazon, Oracle, Sun, and Cray, the performance engineering team that Brad led set over 540 world records across a wide range of standard benchmarks and worked with hundreds of real-world customer apps. We optimized them across the entire stack, from disk to apps, on many different systems, with a deep understanding of the business needs, the use cases, the algorithms, and the math. These accomplishments run the gamut and are highlighted in the section “Team experience and strengths.”
What can we do for you? ...make AI go faster!

Holistic Problem Solving
Shahin Khan - Industry Visionary Comments on the Team
"There are very few people I would recommend as strongly as I recommend Brad and his world-class team.
With a deep understanding of application development, CPU, GPU, Memory, Storage, Networking, and system architecture (Dataflow, MPP, SMP, Vector, Clusters...), Brad, and his team, can make any app go fast on any system, characterize IT deployments, work with customers and new product introduction, and much else.
Artificial Intelligence (AI), Deep Learning (DL), Machine Learning (ML), HPC, Analytics, Big Data, Complex Queries, OLTP, eCommerce, Web, ...
In-Memory, Graph, Streaming, ...
numerical, transactional, visual, ...
multi-tier, scale-out, Cloud, ...
Java/C/Fortran, ...
Data Center design, power, cooling, management, ...
Competitive system analysis, hands-on tuning, ...
you name it, they have done it personally on real machines and with real world apps.
Add to that his ability to deliver the best possible outcome on time, give you solid advice, manage and grow a high performing team or participate in a team, work across the organization to make it happen, ability to present to customers at executive and technical levels, to large audiences, and internal colleagues and executives, ... and you have someone who is right up there in the peerless category."
Old way to do performance
AI Performance Engineering Done in a New Way
A more skillful approach to performance
Our team of experts has improved the performance of every code in every application it has ever touched. Regardless of the brilliance of the original coders, we have demonstrated this repeatedly in every application area.
This kind of success is possible because our team looks beyond the specific algorithm and across the layers of the stack. Sometimes the opportunity for superior performance lies in reconsidering a discarded “slow” algorithm and optimizing it appropriately for the solution. Coders often choose between algorithms based on the assumed performance of the alternatives. As a result, they may settle for the lesser algorithm without checking whether the “slow” one could be optimized by looking carefully at its specific characteristics.
We work across the stack and always consider how applications will perform when distributed and at scale. In addition, we have a deep understanding of a wide variety of architectures that are in use or being revisited today (unique skills). We usually work embedded with subject-matter experts, but we also collaborate with one another to look for innovations and additional optimizations.
We are passionate about this work. We are the team that is never satisfied with the current performance and that takes the initiative to improve it. Most teams stop working when preconceptions tell them that performance can’t be improved. This is precisely the time to innovate. Innovations come in many forms. Some come from optimizing at a different layer of the stack, and others from using functions for purposes they were not originally intended for.
Not every optimization can be done in software alone. Future hardware will be steered towards AI-focused workloads. Because of its innovative spirit and its ability to work across both HW and SW groups, our team has often been involved in HW/SW codesign.
A few examples of these codesign experiences include a 3D torus for solving non-linear PDEs using AutoDiff and long accumulators with interval math, SPARC DAX (in-memory columnar SW-in-Silicon), and future designs for accelerating DNNs.