Modern Big Data Systems for Machine Learning

Friday July 10, 2015

This talk was given at Thomson Reuters in London to an audience of quants and data scientists, and comprised three sections:

  1. the first section described how data has many different properties, some of them mutually exclusive, and how these properties determine the design of an optimal system. It also described data from several viewpoints, from the high-level C-suite perspective (the 4 V’s) all the way down to that of a systems engineer who has to worry about how to transmit and store data efficiently;
  2. the second section focused on a number of Machine Learning algorithms, how they always boil down to an optimization problem, and how some algorithms map better onto sequential processors (e.g. CPUs) while others map ideally onto fine-grained parallel architectures (e.g. FPGAs);
  3. finally, the third section tied the previous two together with examples of large-scale systems that combine huge data sets with a rapid software development life cycle (SDLC).
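
As a small illustration of the second section's point that learning boils down to optimization, the sketch below fits a straight line by minimising mean squared error with gradient descent. It is not taken from the talk: the function name `fit_line`, the parameters `lr` and `steps`, and the toy data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by minimising mean squared error with gradient descent.

    Hypothetical helper for illustration only; lr and steps are assumed values
    that happen to converge on the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residual of the current model on every data point.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

The outer loop is inherently sequential, since each step depends on the previous one, while the per-point residuals inside a step are independent of each other. That split is one simple way to see why different parts of an algorithm may suit different kinds of hardware.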