Publications

Most of our publications fall into two umbrella projects, focusing on different aspects of the usability of machine learning systems.

Zip.ML: Scalable and Efficient ML via joint co-design of ML algorithms, data ecosystems, and hardware acceleration.

Ease.ML: End-to-end Lifecycle Management for MLDevs and MLOps.

Visit these project pages for a list of relevant publications and how they fit together. 


All Publications in Chronological Order

You can also find the same list on Ce Zhang's Google Scholar page.

2022

1. Shaoduo Gan, Xiangru Lian, Rui Wang, Jianbin Chang, Chengjun Liu, Hongmei Shi, Shengzhuo Zhang, Xianghong Li, Tengxu Sun, Jiawei Jiang, Binhang Yuan, Sen Yang, Ji Liu, Ce Zhang. BAGUA: Scaling up Distributed Learning with System Relaxations. VLDB 2022.

2. Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Zhiyao Chen, Yinan Shan, Yang Zhao, Ce Zhang. xFraud: Explainable Fraud Transaction Detection. VLDB 2022.

3. Gyeong-In Yu, Saeed Amizadeh, Sehoon Kim, Artidoro Pagnoni, Ce Zhang, Byung-Gon Chun, Markus Weimer, Matteo Interlandi. WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model. VLDB 2022.

4. Yang Li, Yu Shen, Huaijun Jiang, Wentao Zhang, Jixiang Li, Ji Liu, Ce Zhang, Bin Cui. Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale. VLDB 2022.

5. Zitao Li, Bolin Ding, Ce Zhang, Ninghui Li, Jingren Zhou. Federated Matrix Factorization with Privacy Guarantee. VLDB 2022.

6. Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang. In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle. SIGMOD 2022.

7. Baoqing Cai, Yu Liu, Ce Zhang, Guangyu Zhang, Ke Zhou, Li Liu, Chunhua Li, Bin Cheng, Jie Yang, Jiashu Xing. HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements. SIGMOD 2022.

8. Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, Konstantinos Karanasos. Data Science Through the Looking Glass: Analysis of Millions of GitHub Notebooks and ML.NET Pipelines. SIGMOD Record 2022.

9. DAPHNE Consortium. DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines. CIDR 2022.

10. Maurice Weber, Linyi Li, Boxin Wang, Zhikuan Zhao, Bo Li, Ce Zhang. Certifying Out-of-Domain Generalization for Blackbox Functions. ICML 2022.

11. Alfonso Amayuelas, Shuai Zhang, Xi Susie Rao, Ce Zhang. Neural Methods for Logical Reasoning over Knowledge Graphs. ICLR 2022.

12. Yuexiang Xie, Zhen Wang, Yaliang Li, Ce Zhang, Jingren Zhou, Bolin Ding. iFlood: A Stable and Effective Regularizer. ICLR 2022.

13. Xiangru Lian, Binhang Yuan, Xuefeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen Yang, Ce Zhang, Ji Liu. Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters. KDD (Applied Data Science) 2022.

14. Yang Li, Yu Shen, Huaijun Jiang, Tianyi Bai, Wentao Zhang, Ce Zhang, Bin Cui. Transfer Learning based Search Space Design for Hyperparameter Tuning. KDD 2022.

15. Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic. Which Model to Transfer? Finding the Needle in the Growing Haystack. CVPR 2022.

16. Thórhildur Thorleiksdóttir, Cedric Renggli, Nora Hollenstein and Ce Zhang. Dynamic Human Evaluation for Relative Model Comparisons. LREC 2022.

17. Gyri Reiersen, David Dao, Björn Lütjens, Konstantin Klemmer, Kenza Amara, Attila Steinegger, Ce Zhang, Xiaoxiang Zhu. ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock with Deep Learning and Aerial Imagery. AAAI 2022.

18. Yilmazcan Özyurt, Tobias Hatt, Ce Zhang, Stefan Feuerriegel. A Deep Markov Model for Clickstream Analytics in Online Shopping. WWW 2022.

2021

1. Bojan Karlaš*, Peng Li*, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang. Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions. VLDB 2021.

2. Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Yaliang Li, Bolin Ding, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang, Bin Cui. VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition. VLDB 2021.

3. Jiawei Jiang*, Shaoduo Gan*, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang. Towards Demystifying Serverless Machine Learning Training. SIGMOD 2021.

4. Leonel Aguilar, David Dao, Shaoduo Gan, Nezihe Merve Gurel, Nora Hollenstein, Jiawei Jiang, Bojan Karlas, Thomas Lemmin, Tian Li, Yang Li, Susie Rao, Johannes Rausch, Cedric Renggli, Luka Rimanic, Maurice Weber, Shuai Zhang, Zhikuan Zhao, Kevin Schawinski, Wentao Wu, Ce Zhang. Ease.ML: A Lifecycle Management System for Machine Learning. CIDR 2021.

5. Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang. CleanML: A Benchmark for Evaluating the Impact of Data Cleaning on ML Classification Tasks. ICDE 2021.

6. Cedric Renggli, Luka Rimanic, Nezihe Merve Gurel, Bojan Karlas, Wentao Wu, and Ce Zhang. A Data Quality-Driven View of MLOps. IEEE Data Engineering Bulletin 2021.

7. Nezihe Merve Gürel*, Xiangyu Qi*, Luka Rimanic, Ce Zhang, Bo Li. Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks. ICML 2021.

8. Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He. 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. ICML 2021.

9. Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong. Evolving Attention with Residual Convolutions. ICML 2021.

10. Cedric Renggli, Luka Rimanic, Nora Hollenstein, Ce Zhang. Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee. NeurIPS (Dataset and Benchmark) 2021.

11. Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li. TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness. NeurIPS 2021.

12. Maurice Weber, Nana Liu, Bo Li, Ce Zhang, Zhikuan Zhao. Optimal Provable Robustness of Quantum Classification via Quantum Hypothesis Testing. npj Quantum Information 2021.

13. Xupeng Miao*, Nezihe Merve Gürel*, Wentao Zhang*, Zhichao Han, Bo Li, Wei Min, Susie Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang. DeGNN: Improving Graph Neural Networks with Graph Decomposition. KDD 2021.

14. Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huai Jun Jiang, Ming Chao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui. OpenBox: A Generalized Black-box Optimization Service. KDD (Applied Data Science) 2021.

15. Yuexiang Xie, Zhen Wang, Yaliang Li, Bolin Ding, Nezihe Merve Gürel, Ce Zhang, Minlie Huang, Wei Lin, Jingren Zhou. FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data. KDD (Applied Data Science) 2021.

16. Wenqi Jiang, Zhenhao He, Shuai Zhang, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso. FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters. KDD (Applied Data Science) 2021.

17. Wenqi Jiang, Zhenhao He, Thomas B. Preußer, Shuai Zhang, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, and Gustavo Alonso. Accelerating Deep Recommendation Systems to Microseconds by Hardware and Data Structure Solutions. MLSys 2021.

18. Mohammad Reza Karimi*, Nezihe Merve Gürel*, Bojan Karlaš, Johannes Rausch, Ce Zhang, and Andreas Krause. Online Active Model Selection for Pre-trained Classifiers. AISTATS 2021.

19. Linyi Li*, Maurice Weber*, Xiaojun Xu, Luka Rimanic, Bhavya Kailkhura, Tao Xie, Ce Zhang, Bo Li. TSS: Transformation-Specific Smoothing for Robustness Certification. CCS 2021.

20. Boxin Wang, Fan Wu, Yunhui Long, Luka Rimanic, Ce Zhang, Bo Li. DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation. CCS 2021.

21. Johannes Rausch, Octavio Martinez, Fabian Bissig, Ce Zhang, Stefan Feuerriegel. DocParser: Hierarchical Structure Parsing of Document Renderings. AAAI 2021.

22. Yang Li, Yu Shen, Jiawei Jiang, Jinyang Gao, Ce Zhang, Bin Cui. MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements. AAAI 2021.

23. Nora Hollenstein, Federico Pirovano, Ce Zhang, Lena Jäger, Lisa Beinborn. Multilingual language models predict human reading behavior. NAACL 2021.

24. Nora Hollenstein, Cedric Renggli, Benjamin Glaus, Maria Barrett, Marius Troendle, Nicolas Langer, Ce Zhang. Decoding EEG brain activity for multi-modal natural language processing. Frontiers in Human Neuroscience 2021.

25. Ruoxi Jia, Fan Wu, Xuehui Sun, Jiacen Xu, David Dao, Bhavya Kailkhura, Ce Zhang, Bo Li, Dawn Song. Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification? CVPR 2021.

26. Shuai Zhang, Huoyu Liu, Aston Zhang, Yue Hu, Ce Zhang, Yumeng Li, Tanchao Zhu, Shaojian He, Wenwu Ou. Learning User Representations with Hypercuboids for Recommender Systems. WSDM 2021.

2020

1. Nezihe Merve Gürel, Kaan Kara, Alen Stojanov, Tyler Smith, Thomas Lemmin, Dan Alistarh, Markus Püschel and Ce Zhang. Compressive Sensing Using Iterative Hard Thresholding with Low Precision Data Representation: Theory and Applications. IEEE Transactions on Signal Processing 2020.

2. Ji Liu, Ce Zhang. Distributed Learning Systems with First-Order Methods. (Foundations and Trends® in Databases series) 2020.

3. Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui. Don’t Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript. ICML 2020.

4. Luka Rimanic, Cedric Renggli, Bo Li, Ce Zhang. On Convergence of Nearest Neighbor Classifiers over Feature Transformations. NeurIPS 2020.

5. Zhiqiang Tao, Yaliang Li, Bolin Ding, Ce Zhang, Jingren Zhou, Yun Fu. Learning to Mutate with Hypergradient Guided Population. NeurIPS 2020.

6. Defu Cao, Yujing Wang, Juanyong Duan, Ce Zhang, Xia Zhu, Congrui Huang, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, Qi Zhang. Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting. NeurIPS 2020. (Spotlight Presentation: 280/9454)

7. Bojan Karlaš, Matteo Interlandi, Cedric Renggli, Wentao Wu, Ce Zhang, Deepak Mukunthu Iyappan Babu, Jordan Edwards, Chris Lauren, Andy Xu and Markus Weimer. Building Continuous Integration Services for Machine Learning. KDD 2020 (Applied Data Science, Oral Presentation 44/756).

8. Yunyan Guo, Zhipeng Zhang, Jiawei Jiang, Wentao Wu, Ce Zhang, Bin Cui, Jianzhong Li. Model Averaging in Distributed Machine Learning: A Case Study with Apache Spark. VLDB Journal 2020.

9. Zhipeng Zhang, Wentao Wu, Jiawei Jiang, Lele Yu, Bin Cui, Ce Zhang. ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent. ICDE 2020.

10. Giuseppe Russo, Nora Hollenstein, Claudiu Musat, Ce Zhang. Control, Generate, Augment: A Scalable Framework for Multi-Attribute Text Generation. Findings of EMNLP 2020.

11. Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song. A Principled Approach to Data Valuation for Federated Learning. Book Chapter in Federated Learning: Privacy and Incentive 2020.

12. Nora Hollenstein, Marius Troendle, Ce Zhang and Nicolas Langer. ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation. LREC 2020.

13. Maurice Weber, Cedric Renggli, Helmut Grabner, Ce Zhang. Observer Dependent Lossy Image Compression. GCPR 2020.

14. Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang, Bin Cui. Efficient Automatic CASH via Rising Bandits. AAAI 2020.

15. Yujing Wang, Yaming Yang, Yiren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, Lidong Zhou. TextNAS: A Neural Architecture Search Space tailored for Text Representation. AAAI 2020.

16. Christian Pfeiffer, Nora Hollenstein, Ce Zhang, Nicolas Langer. Neural dynamics of sentiment processing during naturalistic sentence reading. NeuroImage Journal 2020.

17. Hussein Hassan-Harrirou, Ce Zhang, and Thomas Lemmin. RosENet: Improving Binding Affinity Prediction by Leveraging Molecular Mechanics Energies with an Ensemble of 3D Convolutional Neural Networks. J. Chem. Inf. Model. 2020.

18. Cedric Renggli, Luka Rimanic, Luka Kolar, Wentao Wu, Ce Zhang. Ease.ml/snoopy in Action: Towards Automatic Feasibility Analysis for Machine Learning Application Development. VLDB Demo 2020.

2016-2019 (ETH Zurich)

Before 2016 (Peking University, Wisconsin, Stanford)

Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris S. Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, Christopher Ré, Benjamin Recht. CYCLADES: Conflict-free Asynchronous Machine Learning. NIPS 2016.

Kun-Hsing Yu, Ce Zhang, Gerald J Berry, Russ B Altman, Christopher Ré, Daniel L Rubin, Michael Snyder. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature Communications 2016.

Christopher De Sa, Alex Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang. DeepDive: declarative knowledge base construction. ACM SIGMOD Record 2016.

Christopher De Sa, Alex Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang. Incremental knowledge base construction using DeepDive. The VLDB Journal 2016. (“Bests of VLDB 2015”)

Ce Zhang, Arun Kumar, and Christopher Ré. Materialization optimizations for feature selection workloads. TODS 2016. (“Bests of SIGMOD 2014”)

Ce Zhang, Jaeho Shin, Christopher Ré, Michael Cafarella, Feng Niu. Extracting databases from dark data with DeepDive. SIGMOD (Industry Track) 2016.

Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré. Asynchrony begets momentum, with an application to deep learning. Allerton 2016.

Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. Incremental knowledge base construction using DeepDive. VLDB 2015. (“SIGMOD Research Highlight Award”)

Christopher De Sa, Ce Zhang, Kunle Olukotun, and Christopher Ré. Rapidly mixing Gibbs sampling for a class of factor graphs using hierarchy width. NIPS 2015.

Christopher De Sa, Ce Zhang, Kunle Olukotun, and Christopher Ré. Taming the wild: A unified analysis of Hogwild!-style algorithms. NIPS 2015.

Emily Mallory, Ce Zhang, Christopher Ré, and Russ Altman. Large-scale extraction of gene interactions from full text literature using DeepDive. Bioinformatics 2015.

Stefan Hadjis, Firas Abuzaid, Ce Zhang, and Christopher Ré. Caffe con Troll: Shallow ideas to speed up deep learning. DanaC 2015.

Gabor Angeli, Sonal Gupta, Melvin Johnson Premkumar, Christopher D Manning, Christopher Ré, Julie Tibshirani, Jean Y Wu, Sen Wu, and Ce Zhang. Stanford’s distantly supervised slot filling systems for KBP 2014. Text Analysis Conference Proceedings 2015.

Ce Zhang, Arun Kumar, and Christopher Ré. Materialization optimizations for feature selection workloads. SIGMOD 2014. (SIGMOD Best Paper Award)

Ce Zhang and Christopher Ré. DimmWitted: A study of main-memory statistical analytics. VLDB 2014.

Yingbo Zhou, Utkarsh Porwal, Ce Zhang, Hung Q. Ngo, Long Nguyen, Christopher Ré, and Venu Govindaraju. Parallel feature selection inspired by group testing. NIPS 2014.

Shanan Peters, Ce Zhang, Miron Livny, and Christopher Ré. A machine-compiled macroevolutionary history of Phanerozoic life. PLoS One 2014.

Ce Zhang and Christopher Ré. Towards high-throughput Gibbs sampling at scale: a study across storage managers. SIGMOD 2013.

Srikrishna Sridhar, Stephen J. Wright, Christopher Ré, Ji Liu, Victor Bittorf, and Ce Zhang. An approximate, efficient LP solver for LP rounding. NIPS 2013.

Ce Zhang, Vidhya Govindaraju, Jackson Borchardt, Tim Foltz, Christopher Ré, and Shanan Peters. GeoDeepDive: statistical inference using familiar data-processing languages. SIGMOD (Demo) 2013.

Vidhya Govindaraju, Ce Zhang, and Christopher Ré. Understanding tables in context using standard NLP toolkits. ACL (Short Paper) 2013.

Michael Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael J. Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, and Ce Zhang. Brainwash: A data system for feature engineering. CIDR 2013.

Young Chol Song, Henry A. Kautz, James F. Allen, Mary D. Swift, Yuncheng Li, Jiebo Luo, and Ce Zhang. A Markov logic framework for recognizing complex events from multimodal data. ICMI 2013.

John R. Frank, Max Kleiman-Weiner, Daniel A. Roberts, Feng Niu, Ce Zhang, Christopher Ré, Ian Soboroff. Building an entity-centric stream filtering test collection for TREC 2012. TREC 2013.

Ce Zhang, Feng Niu, Christopher Ré, and Jude W. Shavlik. Big data versus the crowd: Looking for relationships in all the right places. ACL 2012.

Feng Niu, Ce Zhang, Christopher Ré, and Jude W. Shavlik. Scaling inference for Markov logic via dual decomposition. ICDM 2012.

Feng Niu, Ce Zhang, Christopher Ré, and Jude W. Shavlik. DeepDive: Web-scale knowledge-base construction using statistical learning and inference. VLDS 2012.

Feng Niu, Ce Zhang, Christopher Ré, and Jude W. Shavlik. Elementary: Large-scale knowledge-base construction via machine learning and statistical inference. Int. J. Semantic Web Inf. Syst 2012.

John R. Frank, Max Kleiman-Weiner, Daniel A. Roberts, Feng Niu, Ce Zhang, Christopher Ré, Ian Soboroff. Building an Entity-Centric Stream Filtering Test Collection for TREC 2012. TREC 2012.

Junjie Yao, Bin Cui, Qiaosha Han, Ce Zhang, Yanhong Zhou. Modeling User Expertise in Folksonomies by Fusing Multi-type Features. DASFAA 2011.

Bin Cui, Anthony K. H. Tung, Ce Zhang, Zhe Zhao. Multiple feature fusion for social media applications. SIGMOD 2010.

Bin Cui, Ce Zhang, Gao Cong. Content-enriched classifier for web video classification. SIGIR 2010.

Xin Cao, Gao Cong, Bin Cui, Christian S. Jensen, Ce Zhang. The use of categorization information in language models for question retrieval. CIKM 2009.

Ce Zhang, Bin Cui, Gao Cong, Yu-Jing Wang. A Revisit of Query Expansion with Different Semantic Levels. DASFAA 2009.

Bin Cui, Bei Pan, Heng Tao Shen, Ying Wang, Ce Zhang. Video Annotation System Based on Categorizing and Keyword Labelling. DASFAA 2009.

Ce Zhang, Yu-Jing Wang, Bin Cui, Gao Cong. Semantic similarity based on compact concept ontology. WWW (Poster) 2008.
