Ease.ml/ci & ease.ml/meter 
Towards Data Management for Statistical Generalization

What is ease.ml/ci & ease.ml/meter?

When training a machine learning model becomes fast, and model selection and hyper-parameter tuning become automatic, will non-CS experts finally have the tools they need to build ML applications by themselves? We at DS3Lab focus on those users who are still struggling, not because an ML system is slow or lacks automation, but because it is so powerful that it is easily misused as an overfitting machine. Without proper guidelines and feedback (like those that software engineering provides for traditional software development), the quality of these users' ML applications might actually decrease with such powerful tools. We introduce two systems, ease.ml/ci and ease.ml/meter, built as an early attempt at ML systems that enforce the right user behavior during the development of ML applications. The core technical challenge is how to answer adaptive statistical queries in a way that is rigorous but practical in terms of label complexity. Interestingly, both systems can be seen as a new type of data management system which, instead of managing the (relational) querying of the data, manages the statistical generalization power of the data.


Projects and Publications

Ease.ml/ci

Ease.ml/ci is a continuous integration engine for ML that gives developers a pass/fail signal for each developed ML model, depending on whether it satisfies certain predefined properties over the (unknown) true distribution.
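
To make the pass/fail idea concrete, here is a minimal sketch. It is not the ease.ml/ci implementation. It assumes a hypothetical condition of the form "accuracy above a threshold", a test set of n independently drawn labeled examples, and a single evaluation that was not chosen by looking at that test set. Under those assumptions, Hoeffding's inequality gives a confidence interval around the measured accuracy. The function names and the three-valued result are illustrative choices, not part of the system.

```python
import math


def hoeffding_epsilon(n: int, delta: float) -> float:
    """Half-width of a two-sided (1 - delta) Hoeffding interval
    for the mean of n i.i.d. values in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def ci_signal(correct: int, n: int, threshold: float, delta: float = 0.01) -> str:
    """Hypothetical check: is the true accuracy above `threshold`?

    Returns "pass" or "fail" only when the whole confidence interval
    lies on one side of the threshold, and "unknown" otherwise.
    """
    accuracy = correct / n
    eps = hoeffding_epsilon(n, delta)
    if accuracy - eps > threshold:
        return "pass"
    if accuracy + eps < threshold:
        return "fail"
    return "unknown"


if __name__ == "__main__":
    print(ci_signal(correct=9500, n=10000, threshold=0.9))
    print(ci_signal(correct=905, n=1000, threshold=0.9))
```

This simple bound covers one fixed model only. When developers repeatedly adapt their models based on earlier signals from the same test set, the guarantee no longer holds as stated, which is the adaptive-query challenge described above.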

Publications

Demo

Ease.ml/meter

Ease.ml/meter is a system that continuously returns some notion of the degree of overfitting to the developer.
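
The page does not define which notion of overfitting ease.ml/meter reports. As one common illustration, and purely as an assumption, the sketch below measures the gap between accuracy on the training set and accuracy on a held-out test set. The helper names `accuracy` and `overfitting_gap` are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / len(labels)


def overfitting_gap(
    train_predictions: Sequence[int],
    train_labels: Sequence[int],
    test_predictions: Sequence[int],
    test_labels: Sequence[int],
) -> float:
    """Training accuracy minus held-out accuracy (assumed overfitting measure).

    A large positive value suggests the model fits the training data
    better than it generalizes.
    """
    return accuracy(train_predictions, train_labels) - accuracy(test_predictions, test_labels)


if __name__ == "__main__":
    gap = overfitting_gap([1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0])
    print(f"overfitting gap: {gap:.2f}")
```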

Publication

Demo


People

External Collaborators

  • Wentao Wu (Microsoft Research)
  • Bolin Ding (Alibaba)

DS3Lab Members

  • Cedric Renggli
  • Bojan Karlaš
  • Frances Ann Hubis (previously)
  • Ce Zhang