CUDA/OpenCL/C++ development, or conversion from MATLAB or Python, producing highly efficient, readable, unit-tested code
Deep learning neural networks for segmentation, object detection, tracking, depth estimation, and more: development, optimization, and deployment
Fast, efficient implementations on CPU and/or GPU
Deep learning neural networks for segmentation, object detection, tracking, depth estimation, data compression, and more. Development and training of networks for solving the most challenging tasks.
CPU optimizations using SIMD (SSE/AVX) and parallel programming. Use of atomic operations, lock-free data structures, etc. Finding hotspots, and detecting and analyzing various types of bottlenecks.
Design, architecture, and implementation of new systems. GPU optimization of CUDA/OpenCL code to meet high memory-bandwidth requirements and achieve high compute efficiency.