Sample and query complexities of some estimation problems/ Sayantan Sen

By:

Sen, Sayantan

Material type: Text

TextPublication details: Kolkata: Indian Statistical Institute, 2023Description: xii, 297 pagesSubject(s):

DDC classification:

23rd. SA.031 Sa274

Online resources:

Full Text

Contents:

Introduction -- Preliminaries -- Results in the Sampling Model -- Results in the Huge Object Model -- Results in the Adjacency matrix Model

Production credits:

Guided by Prof. Sourav Chakraborty

Dissertation note: Thesis (Ph.D) - Indian Statistical Institute, 2023 Summary: Given data from some experiment, inferring information from the underlying distribution is of prime importance, and has been extensively studied. However, due to the huge size of the data, traditional methods are often no longer applicable. Thus new tools and techniques are being developed for inferring useful information from large amounts of data. This thesis makes progress in this direction. The primary goal is to design efficient randomized algorithms aka. testers that can distinguish whether a given unknown object is “close” or “far” from a property of interest with as few accesses as possible. This is referred to as distribution testing when the unknown object is a probability distribution, and graph property testing when it is a graph. The minimum number of samples required to decide a property in distribution testing is referred to as sample complexity, while in graph property testing, it is referred to as query complexity. In this thesis, we study several fundamental problems in distribution and graph property testing such as (i) Can one design a tolerant tester for any distribution property with only black-box access to a non-tolerant tester? (ii) Does there exist distribution properties with global structure that can be learnt efficiently? (iii) the role of adaptivity in distribution testing, and tolerant testing for (iv) graph isomorphism and (v) bipartiteness. The results of the thesis are divided into three parts. In the first part, we study the connection between the sample complexities of non-tolerant and tolerant testing of distributions and prove a tight quadratic gap for label-invariant (symmetric) properties, while providing lower bounds for non-concentrated properties. We also present an algorithm that can learn a concentrated distribution even when its support set is unknown apriori. In the second part, we investigate problems (ii) and (iii) in huge object model, where distributions are defined over n-dimensional Hamming cube and the tester obtains nbit strings as samples. Since reading the string in its entirety may not be feasible for large n, the tester has query access to the sampled strings. We define the notion of index-invariant properties, properties that are invariant under the permutations of the indices {1, . . . , n} and prove that any index-invariant property whose VC-dimension is bounded has a tester whose query complexity is independent of n and depends only on VC-dimension. Moreover, the dependencies of sample and query complexities of our tester on the VC-dimension are tight. We also study the power of adaptiveness in this model and prove a tight quadratic separation between query complexities of adaptive and non-adaptive testers for index-invariant properties, compared to tight exponential separation for its non-index-invariant counterpart. In the third part, we study property testing of dense graphs and give positive answers to problems (iv) and (v). We prove that tolerant graph isomorphism testing is equivalent to the problem of estimating the Earth Mover Distance of two distributions, constructed from the graphs. Moreover, our equivalence proof is model-independent. Finally, we design a tester for tolerant bipartiteness testing whose query and time complexities are significantly better compared to previous works.

Tags from this library: No tags from this library for this title. Log in to add tags.

Holdings
Item type	Current library	Call number	Status	Notes	Date due	Barcode	Item holds
THESIS	ISI Library, Kolkata	SA.031 Sa274 (Browse shelf(Opens below))	Available	E-Thesis Guided by Prof. Sourav Chakraborty		TH589

Total holds: 0

Browsing ISI Library, Kolkata shelves Close shelf browser (Hides shelf browser)

Previous				No cover image available		No cover image available		Next
Previous	SA.031 J66 Distributions in statistics: continuous univariate distributions - 2/	SA.031 K18 Handbook of fitting statistical distribution with R/	SA.031 M847 Advanced statistics: statistical distributions/	SA.031 Sa274 Sample and query complexities of some estimation problems/	SA.031 W947 Statistical distributions: a comprehensive approach/	SA.044 M271 Multi-View discriminant canonical correlation analysis:	SA.05 St874 Introduction to statistics: analyzing data/	Next

Thesis (Ph.D) - Indian Statistical Institute, 2023

Includes bibliography

Introduction -- Preliminaries -- Results in the Sampling Model -- Results in the Huge Object Model -- Results in the Adjacency matrix Model

Guided by Prof. Sourav Chakraborty

Given data from some experiment, inferring information from the underlying distribution is of prime importance, and has been extensively studied. However, due to the
huge size of the data, traditional methods are often no longer applicable. Thus new tools
and techniques are being developed for inferring useful information from large amounts
of data. This thesis makes progress in this direction.
The primary goal is to design efficient randomized algorithms aka. testers that can
distinguish whether a given unknown object is “close” or “far” from a property of interest
with as few accesses as possible. This is referred to as distribution testing when the
unknown object is a probability distribution, and graph property testing when it is a
graph. The minimum number of samples required to decide a property in distribution
testing is referred to as sample complexity, while in graph property testing, it is referred
to as query complexity.
In this thesis, we study several fundamental problems in distribution and graph property testing such as (i) Can one design a tolerant tester for any distribution property with
only black-box access to a non-tolerant tester? (ii) Does there exist distribution properties with global structure that can be learnt efficiently? (iii) the role of adaptivity in
distribution testing, and tolerant testing for (iv) graph isomorphism and (v) bipartiteness.
The results of the thesis are divided into three parts. In the first part, we study
the connection between the sample complexities of non-tolerant and tolerant testing of
distributions and prove a tight quadratic gap for label-invariant (symmetric) properties,
while providing lower bounds for non-concentrated properties. We also present an algorithm that can learn a concentrated distribution even when its support set is unknown
apriori.
In the second part, we investigate problems (ii) and (iii) in huge object model, where
distributions are defined over n-dimensional Hamming cube and the tester obtains nbit strings as samples. Since reading the string in its entirety may not be feasible for
large n, the tester has query access to the sampled strings. We define the notion of
index-invariant properties, properties that are invariant under the permutations of the indices {1, . . . , n} and prove that any index-invariant property whose VC-dimension is
bounded has a tester whose query complexity is independent of n and depends only on
VC-dimension. Moreover, the dependencies of sample and query complexities of our
tester on the VC-dimension are tight. We also study the power of adaptiveness in this
model and prove a tight quadratic separation between query complexities of adaptive
and non-adaptive testers for index-invariant properties, compared to tight exponential
separation for its non-index-invariant counterpart.
In the third part, we study property testing of dense graphs and give positive answers
to problems (iv) and (v). We prove that tolerant graph isomorphism testing is equivalent
to the problem of estimating the Earth Mover Distance of two distributions, constructed
from the graphs. Moreover, our equivalence proof is model-independent. Finally, we
design a tester for tolerant bipartiteness testing whose query and time complexities are
significantly better compared to previous works.

There are no comments on this title.

to post a comment.

Place hold
Print
Add to your cart (remove)
Save record
BIBTEX Dublin Core MARCXML MARC (non-Unicode/MARC-8) MARC (Unicode/UTF-8) MARC (Unicode/UTF-8, Standard) MODS (XML) RIS
More searches

Search for this title in:
Other Libraries (WorldCat) Other Databases (Google Scholar) Online Stores (Bookfinder.com)