Online Public Access Catalogue (OPAC)
Library, Documentation and Information Science Division

“A research journal serves that narrow borderland which separates the known from the unknown”

- P.C. Mahalanobis



Algorithms for feature selection: structure preservation, scale invariance, and stability / Snehalika Lall

By: Lall, Snehalika
Material type: Text
Publication details: Kolkata: Indian Statistical Institute, 2022
Description: 154 pages
Subject(s):
DDC classification:
  • 23 005.1 L193
Online resources:
Contents:
Introduction and Scope of the Thesis -- Structure Aware Principal Component Analysis for High Dimensional Data -- Stable Feature Selection using Copula in a Supervised Framework -- Feature Selection using Copula in an Unsupervised Framework -- Entropy based feature selection for high dimensional single cell RNA sequence data -- Generating realistic cell samples for gene selection in scRNA-seq data: A novel generative framework -- Conclusions and Future Scope of Research
Production credits:
  • Guided by Prof. Sanghamitra Bandyopadhyay
Dissertation note: Thesis (Ph.D.) - Indian Statistical Institute, 2022
Summary: With the advancement of science and technology, data have grown in both sample size and dimensionality. High-dimensional data arise in genomics, text analysis, image retrieval, bioinformatics, and many other domains. One of the major problems in handling such data is that not all features are equally important. Hence, feature engineering, feature selection and feature reduction are important pre-processing tasks that discard redundant and irrelevant features while preserving the prominent features of the data as far as possible. In practice, feature selection often improves the accuracy of downstream machine learning tasks, including clustering and classification.
In this thesis, we aim to devise novel and robust feature selection mechanisms for diverse application domains, with a special focus on high-dimensional biological data such as gene expression and single-cell transcriptomic data. We develop a series of feature selection techniques with structure-aware data sampling at their core. We adopt several concepts from statistics (e.g. the copula and its variants), information theory (entropy), and advanced machine learning (the variational graph autoencoder, and the generative adversarial network and its variants) to design feature selection models for high-dimensional, noisy data. The proposed models perform well in both supervised and unsupervised settings, even when the sample size is very small. The important outcomes of each proposed method are discussed in the corresponding chapters, together with an overall discussion of the methods' applicability and a brief account of their shortcomings. Suggestions for overcoming these shortcomings are also given, pointing to the future scope of improvement for all the devised methods.
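The contents and summary above mention entropy-based feature selection for high-dimensional single-cell RNA-seq data. Purely as a generic illustration of that idea, and not the specific method developed in this thesis, the sketch below ranks features by the Shannon entropy of their discretised values and keeps the top-k; the function names, the histogram-based entropy estimate, and the toy data are all assumptions made for this example.

```python
# Minimal, generic sketch of entropy-based feature ranking (illustration only).
import numpy as np

def entropy_scores(X, bins=16):
    """Shannon entropy (bits) of each column of X, estimated from a histogram."""
    n_samples, n_features = X.shape
    scores = np.empty(n_features)
    for j in range(n_features):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]                          # ignore empty bins
        scores[j] = -np.sum(p * np.log2(p))
    return scores

def select_top_k(X, k, bins=16):
    """Indices of the k features with the highest entropy scores."""
    scores = entropy_scores(X, bins=bins)
    return np.argsort(scores)[::-1][:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "expression matrix": 100 cells x 500 genes of low-information counts,
    # with the first 10 genes given extra spread so they tend to rank higher.
    X = rng.poisson(1.0, size=(100, 500)).astype(float)
    X[:, :10] += rng.normal(0.0, 5.0, size=(100, 10))
    print("selected features:", sorted(select_top_k(X, k=10)))
```

In the thesis itself this kind of criterion is combined with structure-aware sampling and applied to real scRNA-seq data; the snippet only conveys the ranking step in its simplest form.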
Holdings
Item type: THESIS
Current library: ISI Library, Kolkata
Call number: 005.1 L193
Status: Available
Notes: E-Thesis. Guided by Prof. Sanghamitra Bandyopadhyay
Barcode: TH568
Total holds: 0

Includes bibliography

Library, Documentation and Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata 700108, INDIA
Phone no. 91-33-2575 2100, Fax no. 91-33-2578 1412, ksatpathy@isical.ac.in