Online Public Access Catalogue (OPAC)
Library,Documentation and Information Science Division

“A research journal serves that narrow

borderland which separates the known from the unknown”

-P.C.Mahalanobis


Image from Google Jackets

Privacy aware machine learning/ Chandan Biswas

By: Material type: TextTextPublication details: Kolkata: Indian Statistical Institute, 2023Description: xx, 157 figs, tablesSubject(s): DDC classification:
  • 23 006.31 B621
Online resources:
Contents:
Introduction -- Related Work -- Preliminaries -- Datasets and Evaluation Metrics -- Privacy Aware Unsupervised Learning -- Privacy Aware Semi-Supervised Learning -- Privacy Aware Supervised Learning -- Privacy Aware Approximate Nearest Neighbour Search -- Conclusions and Future Work
Production credits:
  • Guided by Prof. Ujjwal Bhattacharya and Prof. Debasis Ganguly
Dissertation note: Thesis (Ph.D.) - Indian Statistical Institute, 2023 Summary: Privacy preserving computation is of utmost importance in a cloud computing environ- ment where a client often requires to send sensitive data to servers, offering computing services, for computational purposes over untrusted networks. Sharing the raw or an ab- stract representation of a labelled or unlabelled dataset on cloud platforms can potentially expose sensitive information of the data to an adversary, e.g., in the case of an emotion classification task from text, an adversary-agnostic abstract representation of the text data may eventually lead an adversary to identify the demographics of the authors, such as their gender and age, etc. The leakage of sensitive information from the data may take place due to eavesdropping over the network or malware residing at the server. Privacy preserving computation workflows aim to prevent such leakage of sensitive information by introducing a suitable encoding transformation on sample data points. Such an encoding strategy has dual objectives, the first being that it should be difficult to reconstruct the original data in the absence of any knowledge of the encoding strategy and its parameters. Secondly, the computational results obtained using the encoded data should not be substantially different from those obtained using the same data in its original form. Standard encoding mechanisms, such as locality sensitive hashing (LSH), caters to the first objective of privacy preserving computation workflow, the second objective may not always be adequately satisfied. In this thesis, we focus on the second objective and the computational activity that we focus on is a supervised classification task in addition to the K-means clustering, which has been widely used for various data mining jobs. Here, we have addressed the problem of privacy preserving computation on the above two tasks in three different ways, Initially, we have proposed a new variant of the K-means algorithm which is capable of privacy preservation in the sense that it takes binary encoded data as input, and does not require access to the data in its original form at any stage of the computation. The proposed strategy is capable of producing the required number of clusters which are sufficiently close to the respective clusters computed from the original non-encoded data. The results of the proposed strategy on image or text data are either comparable or outperform the standard K-means clustering algorithm. Secondly, we have explored a deep metric learning approach to learn a parameterized encoding transformation with an objective of maximizing the alignment of the clusters obtained in the encoded space with the same obtained from the original data. To this end, we train a weakly supervised deep network using triplets constructed from the output of a clustering algorithm on a subset of the non-encoded data. Our proposed method of weakly- supervised approach yields more effective encoding in comparison to approaches where the encoding process is agnostic of the clustering objective. Finally, we propose a universal defense mechanism against malicious attempts of stealing sensitive information from data shared on cloud platforms. More specifically, our proposed method employs an informative subspace based multi-objective approach to produce a sensitive information aware encoding of the data representation. A number of experiments conducted on both standard text and image datasets demonstrate the ability of our proposed approach to reduce the effectiveness of the adversarial task without remarkably affecting the effectiveness of the primary task itself.
Tags from this library: No tags from this library for this title. Log in to add tags.
Holdings
Item type Current library Call number Status Notes Date due Barcode Item holds
THESIS ISI Library, Kolkata 006.31 B621 (Browse shelf(Opens below)) Available E-thesis Guided by Prof. Ujjwal Bhattacharya and Prof. Debasis Ganguly TH579
Total holds: 0

Thesis (Ph.D.) - Indian Statistical Institute, 2023

Includes bibliography

Introduction -- Related Work -- Preliminaries -- Datasets and Evaluation Metrics -- Privacy Aware Unsupervised Learning -- Privacy Aware Semi-Supervised Learning -- Privacy Aware Supervised Learning -- Privacy Aware Approximate Nearest Neighbour Search -- Conclusions and Future Work

Guided by Prof. Ujjwal Bhattacharya and Prof. Debasis Ganguly

Privacy preserving computation is of utmost importance in a cloud computing environ- ment where a client often requires to send sensitive data to servers, offering computing services, for computational purposes over untrusted networks. Sharing the raw or an ab- stract representation of a labelled or unlabelled dataset on cloud platforms can potentially expose sensitive information of the data to an adversary, e.g., in the case of an emotion classification task from text, an adversary-agnostic abstract representation of the text data may eventually lead an adversary to identify the demographics of the authors, such as their gender and age, etc. The leakage of sensitive information from the data may take place due to eavesdropping over the network or malware residing at the server. Privacy preserving computation workflows aim to prevent such leakage of sensitive information by introducing a suitable encoding transformation on sample data points. Such an encoding strategy has dual objectives, the first being that it should be difficult to reconstruct the original data in the absence of any knowledge of the encoding strategy and its parameters. Secondly, the computational results obtained using the encoded data should not be substantially different from those obtained using the same data in its original form. Standard encoding mechanisms, such as locality sensitive hashing (LSH), caters to the first objective of privacy preserving computation workflow, the second objective may not always be adequately satisfied. In this thesis, we focus on the second objective and the computational activity that we focus on is a supervised classification task in addition to the K-means clustering, which has been widely used for various data mining jobs. Here, we have addressed the problem of privacy preserving computation on the above two tasks in three different ways, Initially, we have proposed a new variant of the K-means algorithm which is capable of privacy preservation in the sense that it takes binary encoded data as input, and does not require access to the data in its original form at any stage of the computation. The proposed strategy is capable of producing the required number of clusters which are sufficiently close to the respective clusters computed from the original non-encoded data. The results of the proposed strategy on image or text data are either comparable or outperform the standard K-means clustering algorithm. Secondly, we have explored a deep metric learning approach to learn a parameterized encoding transformation with an objective of maximizing the alignment of the clusters obtained in the encoded space with the same obtained from the original data. To this end, we train a weakly supervised deep network using triplets constructed from the output of a clustering algorithm on a subset of the non-encoded data. Our proposed method of weakly- supervised approach yields more effective encoding in comparison to approaches where the encoding process is agnostic of the clustering objective. Finally, we propose a universal defense mechanism against malicious attempts of stealing sensitive information from data shared on cloud platforms. More specifically, our proposed method employs an informative subspace based multi-objective approach to produce a sensitive information aware encoding of the data representation. A number of experiments conducted on both standard text and image datasets demonstrate the ability of our proposed approach to reduce the effectiveness of the adversarial task without remarkably affecting the effectiveness of the primary task itself.

There are no comments on this title.

to post a comment.
Library, Documentation and Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata 700108, INDIA
Phone no. 91-33-2575 2100, Fax no. 91-33-2578 1412, ksatpathy@isical.ac.in