Ruben Wolff


Current Projects

  • Designed the data architecture and incentive engineering for a decentralized "LinkedIn Killer". The product idea is simple: reproduce the recruiting features of LinkedIn on community-owned federated infrastructure, with revenue split on-chain between users, investors and recruiters.
      Company: Stealth-mode startup with a Ukrainian core team. The project has been on hold since the beginning of the Russian invasion of Ukraine.

  • Designed and led the implementation team for a new enterprise-wide real-time message bus. The new system could auto-scale up to 30,000 messages per second, which could be read via streaming, big data bulk or API access, with full access control through Active Directory. Real-time machine learning models were used to align multiple IoT sensors for the most accurate ball and player tracking, as well as for player action classification.
      Consulting Client: Sports events company in the top 5 worldwide

Past Independent Consulting

  • Prototyped a series of models to improve "eDiscovery"-like work in the Swiss legal context for one of the country's largest white-collar crime cases. Models included NLI (sentence contradiction / agreement classification), semantic document similarity search, document classification (image + text) and simple OCR.
      Client: Initially a Swiss law firm, but the project has now spun out into a separate company competing with Relativity in the Swiss market
      Legal Tech, NLP, XLM-RoBERTa, BERT, DistilBERT, FAISS
  • Machine learning advisor for an oracle node reputation scoring system.
      Client: Small community-governed oracle network on Binance Smart Chain
      Blockchain, Binance Smart Chain
  • Advanced predictive risk models (for NEC, ROP, IVH, CLD, and Sepsis) for the neonatal ICU.
      Client: One of the largest Neonatal care providers in the USA
      Deep learning, H2O, R
  • Length of Stay prediction in burn patients.
      Client: Altonaer Children’s Hospital, Hamburg. (Related Publication)
      R, Random Forest

Past Startups

  • Corepo is a knowledge graph generation project which performs monthly whole-web scans to identify all companies with a web presence. A machine learning system identifies which sites are company homepages, followed by a deeper crawl and information extraction to identify metadata such as founding year, location, industry and relevant employees. The B2C end-user product is designed to bridge the gap between the limitations of human-curated company databases and general-purpose web search engines.
      Funding: Angel
      Conclusion: B2B data access agreements achieved revenue stage. Operations and sales have been handed off to a third party.

  • The AI that reads PubMed for you. This NLP project aimed to create automated literature reviews on the fly, every day, as new publications appear. The target customers were medical directors in big pharma who need to make daily decisions on a wide range of drugs and conditions, have little time to read, and cannot rely on published literature reviews that are always out of date. The machine learning model first performs named entity recognition for interventions (drugs), conditions (MeSH) and outcome measures (such as mOS, PFS etc.), followed by a linking model that assigns each outcome value to the appropriate study arm. This first version of the system focused on clinical oncology only.
      Funding: Entrepreneur First
      Conclusion: No product market fit; unlike our early adopters, the broader market expected a level of accuracy and granularity only possible with full human curation

  • Neoglia LTD was a London-based health tech startup which I co-founded with the goal of increasing the utility of sensitive medical data siloed in hospital EHRs. We prototyped a federated machine learning approach in which clinical data would remain on the individual hospitals' servers while allowing pharma companies to build models on the data from all participating hospitals.
      Funding: Entrepreneur First
      Conclusion: No product market fit; most major players had unpublicized alternative means of accessing sensitive data via research collaborations or curated anonymized downloadable datasets

Past Consulting with DOne Solutions

  • Data Architecture Big Data for Stock Exchange (Hadoop, Spark, Cloudera)
  • Distributed Algorithm for Order Book state reconstruction from historical trade data (Spark, Scala)
  • Medical Claim Embedding Space from Medical Insurance Metadata (R, Sent2Vec)
  • Medical Records Diabetes Risk Prediction (R, irlba, H2O)
  • Medical Insurance Claims Adjudication Automation (R, C++, Python)
  • Claim Processing Time/Complexity prediction (R)
  • Automated model refresh loop for a production machine learning model: data acquisition, model training, model evaluation, hot swap of the production model behind a live API (R, C++, SQL Server, REST API)
  • Data warehouse Modeling and Generation Package (C#, SQL Server)
  • Data Architecture Data warehouse Belgian Health insurance (Data Vault)
  • Deployment Automation Data warehouse (Ansible, Oracle, Sybase, Informatica, SQL Server)

Past Jobs

  • Data Science intern at Credit Suisse investigating market behavior time series datasets and developing algorithmic trading strategies (Gamma Trading, Volatility Prediction, R, Scala, C#)
  • Software Developer at Tufts Medical Center developing a Personalized Oncology Drug Suggestion Tool (Reinforcement Learning, Matlab, PHP)
  • Neuroscience Researcher at University of Texas at Dallas leading a Guanfacine Selective Attention in vivo Study (Neurobiology, Adrenergic receptors, Matlab)


Past Talks

  • Automating insurance claim processing: Using NLP methods to learn sparse spaces - at - SDS 2018 Slides
  • The surprising Utility of NLP algorithms on non Text data - at - SwissText 2018 - agenda
  • Machine Learning mustn't be a Black box - at - Certificate of Advanced Studies ZFH in Claims Management (12 ECTS) - slides
  • Real Life Benchmarking - at - SQL Server 2016 launch event - slides
  • Digitize the Data Store - at - Common Sense 1702 - slides - video