Publication Date


Document Type


Committee Members

Keke Chen (Advisor), Xiaoyu Lu (Committee Member), Krishnaprasad Thirunarayan (Committee Member), Junjie Zhang (Committee Member)

Degree Name

Doctor of Philosophy (PhD)


With massive data collections and the need to build powerful predictive models, data owners may choose to outsource storage and expensive machine learning computations to public cloud providers (Cloud). They may do so because they lack in-house storage and computation resources or the expertise to build models. Similarly, users who subscribe to specialized services such as movie streaming and social networking voluntarily upload their data to the service providers' sites for storage, analytics, and better services. The service providers, in turn, may also choose to benefit from ubiquitous cloud computing. However, outsourcing to a public cloud provider may raise privacy concerns when it comes to sensitive personal or corporate data. Cloud and its associates may misuse sensitive data and models internally. Moreover, if Cloud's resources are poorly secured, the confidential data and models become vulnerable to privacy attacks by external adversaries. Such potential threats are beyond the control of data owners and general users. One way to address these privacy concerns is through confidential machine learning (CML). CML frameworks enable data owners to protect their data with encryption or other data protection mechanisms before outsourcing, and they enable Cloud to train predictive models on the protected data. Existing cryptographic and privacy-protection methods do not immediately lead to CML frameworks for outsourcing. Although theoretically sound, a naive adaptation of fully homomorphic encryption (FHE) or garbled circuits (GC), which enable the evaluation of arbitrary functions in a privacy-preserving manner, is impractically expensive. Differential privacy (DP), on the other hand, does not specifically address the confidentiality issues and threat model of the outsourced setting, as DP generally aims to protect an individual's participation in a dataset from an adversarial model consumer.
Moreover, a practical CML framework must ensure a fair cost distribution between the data owner and Cloud by moving the expensive and scalable components to Cloud while keeping the data owner's costs to a minimum. Therefore, constructing novel CML solutions that maintain a good balance among privacy protection, costs, and model quality is necessary. In this dissertation, I present three confidential machine learning frameworks for the outsourcing setting: 1) PrivateGraph for unsupervised learning (e.g., graph spectral analysis), 2) SecureBoost for supervised learning (e.g., boosting), and 3) Disguised-Nets for deep learning (e.g., convolutional neural networks). The first two frameworks provide semantic security and follow the decomposition-mapping-composition (DMC) process. The DMC process includes three critical steps: 1) decomposition of the target machine learning algorithm into its sub-components, 2) mapping of the selected sub-components to appropriate cryptographic and privacy primitives, and finally, 3) composition of the CML protocols. A critical aspect of these frameworks is the identification of the "crypto-unfriendly" sub-components and their alteration or replacement with "crypto-friendly" sub-components before the final composition of the CML frameworks. The Disguised-Nets framework, however, due to the intrinsically expensive nature of deep neural networks (DNN) and the size of the training images, relies on a perturbation-based CML construction. By relaxing the overall security and disguising the training images with cheaper transformations, Disguised-Nets enables training confidential DNN models over the protected images very efficiently. I present formal cost and security analyses for all three CML frameworks and back them with extensive experiments. The results show that these frameworks are practical in real-world scenarios and generate robust models comparable to models trained on unprotected data.

Page Count


Department or Program

Department of Computer Science and Engineering

Year Degree Awarded