
This book is an introduction to the model-based approach to survey sampling. It consists of three parts, with Part I focusing on estimation of population totals. Chapters 1 and 2 introduce survey sampling and the model-based approach, respectively. Chapter 3 considers the simplest possible model, the homogeneous population model, which is then extended to stratified populations in Chapter 4. Chapter 5 discusses simple linear regression models for populations, and Chapter 6 considers clustered populations. The general linear population model is then used to integrate these results in Chapter 7. Part II considers the properties of estimators based on incorrectly specified models. Chapter 8 develops robust sample designs that lead to unbiased predictors under model misspecification, and shows how flexible modelling methods such as nonparametric regression can be used in survey sampling. Chapter 9 extends this development to misspecification-robust prediction variance estimators, and Chapter 10 completes Part II with an exploration of outlier-robust sample survey estimation. Chapters 11 to 17 constitute Part III and show how model-based methods can be used in a variety of problem areas of modern survey sampling. They cover (in order) prediction of nonlinear population quantities, subsampling approaches to prediction variance estimation, design and estimation for multipurpose surveys, prediction for domains, small area estimation, efficient prediction of population distribution functions, and the use of transformations in survey inference. The book is designed to be accessible to undergraduate and graduate-level students with a good grounding in statistics, and to applied survey statisticians seeking an introduction to model-based survey design and estimation.

This book has its origin in the need to develop and analyze mathematical models for phenomena that evolve in time and influence one another, and it aims at a better understanding of the structure and asymptotic behavior of stochastic processes. The monograph has a twofold scope: first, to present tools for dealing with dependent structures directed toward obtaining normal approximations; second, to apply these normal approximations to various examples. The main tools consist of inequalities for dependent sequences of random variables, leading to limit theorems, including the functional central limit theorem (CLT) and functional moderate deviation principle (MDP). The results point out large classes of dependent random variables which satisfy invariance principles, making possible the statistical study of data coming from stochastic processes with both short and long memory. Over the course of the book different types of dependence structures are considered, ranging from the traditional mixing structures to martingale-like structures and to weakly negatively dependent structures, which link the notion of mixing to the notions of association and negative dependence. Several applications have been carefully selected to exhibit the importance of the theoretical results. They include random walks in random scenery and determinantal processes. In addition, due to their importance in analyzing new data in economics, linear processes with dependent innovations are also considered and analyzed.

Thorvald Nicolai Thiele was a brilliant Danish researcher of the 19th century. He was a professor of astronomy at the University of Copenhagen and the founder of Hafnia, the first Danish private insurance company. Thiele worked in astronomy, mathematics, actuarial science, and statistics; his most spectacular contributions were in the latter two areas, where his published work was far ahead of his time. This book is concerned with his statistical work. It revolves around his three main statistical masterpieces, which are now translated into English for the first time: (1) his 1880 article, in which he derives the Kalman filter; (2) his 1889 book, in which he lays out the subject of statistics in a highly original way, derives the half-invariants (today known as cumulants) and the notion of likelihood in the case of binomial experiments, presents the canonical form of the linear normal model, and develops model criticism via analysis of residuals; and (3) an 1899 article in which he completes the theory of the half-invariants. The book also contains three chapters, written by A. Hald and S. L. Lauritzen, which describe Thiele's statistical work in modern terms and put it into historical perspective.

Procrustean methods are used to transform one set of data to represent another set of data as closely as possible. This book unifies several strands in the literature and contains new algorithms. It focuses on matching two or more configurations by using orthogonal, projection, and oblique axes transformations. Group-average summaries play an important part, and links with other group-average methods are discussed. The text is multidisciplinary and also presents a unifying ANOVA framework.

This book provides the first comprehensive treatment of Benford's law, the surprising logarithmic distribution of significant digits discovered in the late nineteenth century. Establishing the mathematical and statistical principles that underpin this intriguing phenomenon, the text combines up-to-date theoretical results with overviews of the law's colorful history, rapidly growing body of empirical evidence, and wide range of applications. The book begins with basic facts about significant digits, Benford functions, sequences, and random variables, including tools from the theory of uniform distribution. After introducing the scale-, base-, and sum-invariance characterizations of the law, the book develops the significant-digit properties of both deterministic and stochastic processes, such as iterations of functions, powers of matrices, differential equations, and products, powers, and mixtures of random variables. Two concluding chapters survey the finitely additive theory and the flourishing applications of Benford's law. Carefully selected diagrams, tables, and close to 150 examples illuminate the main concepts throughout. The book includes many open problems, in addition to dozens of new basic theorems and all the main references. A distinguishing feature is the emphasis on the surprising ubiquity and robustness of the significant-digit law. The book can serve as both a primary reference and a basis for seminars and courses.
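As a quick illustration of the significant-digit law described above, the sketch below (plain Python; the function names are illustrative, not from the book) compares the empirical leading-digit frequencies of the first 1000 powers of 2, a classic Benford sequence, with the probabilities log10(1 + 1/d) predicted by the law.

```python
import math
from collections import Counter

def benford_prob(d):
    # Benford's law: P(leading digit = d) = log10(1 + 1/d)
    return math.log10(1 + 1 / d)

def leading_digit(x):
    # First significant digit of a positive integer
    return int(str(x)[0])

# Empirical leading-digit frequencies of 2^1, ..., 2^1000,
# a sequence known to follow Benford's law.
counts = Counter(leading_digit(2 ** n) for n in range(1, 1001))
table = {d: (counts[d] / 1000, benford_prob(d)) for d in range(1, 10)}
```

The empirical frequencies track the Benford probabilities to within a couple of percentage points; for example, about 30% of the powers begin with the digit 1, matching log10(2) ≈ 0.301.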

This book sets out a body of rigorous mathematical theory for finite graphs with nodes placed randomly in Euclidean d-space according to a common probability density, and edges added to connect points that are close to each other. As an alternative to classical random graph models, these geometric graphs are relevant to the modelling of real networks having spatial content, arising for example in wireless communications, parallel processing, classification, epidemiology, astronomy, and the internet. Their study illustrates numerous techniques of modern stochastic geometry, including Stein's method, martingale methods, and continuum percolation. Typical results in the book concern properties of a graph G on n random points with edges included for interpoint distances up to r, with the parameter r dependent on n and typically small for large n. Asymptotic distributional properties are derived for numerous graph quantities. These include the number of copies of a given finite graph embedded in G, the number of isolated components isomorphic to a given graph, the empirical distributions of vertex degrees, the clique number, the chromatic number, the maximum and minimum degree, the size of the largest component, the total number of components, and the connectivity of the graph.
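The basic object studied here can be simulated in a few lines. The sketch below (plain Python; the function name and parameter values are illustrative, not from the book) places n points uniformly in the unit square, joins every pair at distance at most r, and counts isolated vertices, one of the graph quantities whose asymptotics the book analyses.

```python
import math
import random

def random_geometric_graph(n, r, seed=42):
    # n points uniform in the unit square; edge whenever distance <= r
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) <= r]
    return pts, edges

pts, edges = random_geometric_graph(100, 0.15)
degree = [0] * len(pts)
for i, j in edges:
    degree[i] += 1
    degree[j] += 1
isolated = sum(d == 0 for d in degree)  # isolated vertices disappear as r grows
```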

At the crossroads between statistics and machine learning, probabilistic graphical models provide a powerful formal framework to model complex data. Probabilistic graphical models are probabilistic models whose graphical components denote conditional independence structures between random variables. The probabilistic framework makes it possible to deal with data uncertainty, while the conditional independence assumption helps process high-dimensional and complex data. Examples of probabilistic graphical models are Bayesian networks and Markov random fields, which represent two of the most popular classes of such models. With the rapid advancement of high-throughput technologies and the ever-decreasing costs of these next-generation technologies, a fast-growing volume of biological data of various types—the so-called omics—is in need of accurate and efficient methods for modelling, prior to further downstream analysis. Network reconstruction from gene expression data represents perhaps the most emblematic area of research where probabilistic graphical models have been successfully applied. However, these models have also created renewed interest in genetics, in particular in association genetics, causality discovery, prediction of outcomes, detection of copy number variations, and epigenetics. For all these reasons, it is foreseeable that such models will have a prominent role to play in advances in genome-wide analyses.

The Valencia International Meetings on Bayesian Statistics – established in 1979 and held every four years – have been the forum for a definitive overview of current concerns and activities in Bayesian statistics. These are the edited Proceedings of the Ninth meeting, and contain the invited papers, each followed by its discussion and a rejoinder by the author(s). In the tradition of the earlier editions, this volume encompasses an enormous range of theoretical and applied research, highlighting the breadth, vitality and impact of Bayesian thinking in interdisciplinary research across many fields, as well as the corresponding growth and vitality of core theory and methodology. The Valencia 9 invited papers cover a broad range of topics, including foundational and core theoretical issues in statistics, the continued development of new and refined computational methods for complex Bayesian modelling, substantive applications of flexible Bayesian modelling, and new developments in the theory and methodology of graphical modelling. They also describe advances in methodology for specific applied fields, including financial econometrics and portfolio decision making, public policy applications for drug surveillance, studies in the physical and environmental sciences, astronomy and astrophysics, climate change studies, molecular biosciences, statistical genetics, and stochastic dynamic networks in systems biology.

This book discusses novel advances in informatics and statistics in molecular cancer research. Through eight chapters it discusses specific topics in cancer research, shows how those topics give rise to the development of new informatics and statistics tools, and explains how the tools can be applied. The focus of the book is to provide an understanding of key concepts and tools, rather than focusing on technical issues. Cancer is characterized by genetic and genomic alterations that influence all levels of the cell's machinery and function. A main theme is therefore the extensive use of array technologies in modern cancer research — gene expression and exon arrays, SNP and copy number arrays, and methylation arrays — to derive quantitative and qualitative statements about cancer, its progression and aetiology, and to understand how these technologies on the one hand allow us to learn about cancer tissue as a complex system and on the other hand allow us to pinpoint key genes and events as crucial for the development of the disease.

Several recent advances in smoothing and semiparametric regression are presented in this book from a unifying, Bayesian perspective. Simulation-based full Bayesian Markov chain Monte Carlo (MCMC) inference, as well as empirical Bayes procedures closely related to penalized likelihood estimation and mixed models, are considered here. Throughout, the focus is on semiparametric regression and smoothing based on basis expansions of unknown functions and effects, in combination with smoothness priors for the basis coefficients. Beginning with a review of basic methods for smoothing and mixed models, longitudinal data, spatial data, and event history data are treated in separate chapters. Worked examples from various fields such as forestry, development economics, medicine, and marketing are used to illustrate the statistical methods covered in this book. Most of these examples have been analysed using implementations in the Bayesian software BayesX, and some using R code.

Starting with the construction of stochastic processes, the book introduces Brownian motion and martingales. After proving the Doob-Meyer decomposition, quadratic variation processes and local martingales are discussed. The book proceeds to construct stochastic integrals, prove the Itô formula, derive several important applications of the formula such as the martingale representation theorem and the Burkholder-Davis-Gundy inequality, and establish the Girsanov theorem on change of measures. Next, attention is focused on stochastic differential equations, which arise in modeling physical phenomena perturbed by random forces. Diffusion processes are solutions of stochastic differential equations and form the main theme of this book. After establishing the existence and uniqueness of strong solutions to stochastic differential equations, weak solutions and martingale problems posed by stochastic differential equations are studied in detail. The Stroock-Varadhan martingale problem is a powerful tool for solving stochastic differential equations and is discussed in a separate chapter. The connection between diffusion processes and partial differential equations is quite important and fruitful. Probabilistic representations of solutions of partial differential equations and a derivation of the Kolmogorov forward and backward equations are provided. Gaussian solutions of stochastic differential equations and Markov processes with jumps are presented in successive chapters. The final objective of the book is to give a careful treatment of the probabilistic behavior of diffusions, such as existence and uniqueness of invariant measures, ergodic behavior, and the large deviation principle in the presence of small noise.
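For orientation, the Itô formula mentioned above can be stated in its simplest one-dimensional form: for a standard Brownian motion $B_t$ and a function $f \in C^2$,

```latex
f(B_t) = f(B_0) + \int_0^t f'(B_s)\, dB_s + \tfrac{1}{2} \int_0^t f''(B_s)\, ds .
```

The second integral is the correction term arising from the quadratic variation $\langle B \rangle_t = t$; it is exactly this term that links diffusions to second-order partial differential equations such as the Kolmogorov equations.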

An antidote to technique-oriented service courses, this book studiously avoids the recipe-book style and keeps algebraic details of specific statistical methods to the minimum extent necessary to understand the underlying concepts. Instead, it aims to give the reader a clear understanding of how core statistical ideas of experimental design, modelling, and data analysis are integral to the scientific method. Aimed primarily towards a range of scientific disciplines (albeit with a bias towards the biological, environmental, and health sciences), this book assumes some maturity of understanding of scientific method, but does not require any prior knowledge of statistics, or any mathematical knowledge beyond basic algebra and a willingness to come to terms with mathematical notation. Any statistical analysis of a realistically sized dataset requires the use of specially written computer software. An Appendix introduces the reader to our open-source software of choice. All of the material in the book can be understood without using either R or any other computer software.

This book discusses the fitting of parametric statistical models to data samples. Emphasis is placed on (i) how to recognize situations where the problem is nonstandard and parameter estimates behave unusually, and (ii) the use of parametric bootstrap resampling methods in analysing such problems. Simple and practical model building is an underlying theme. A frequentist viewpoint based on likelihood is adopted, for which there is a well-established and very practical theory. The standard situation is one where certain widely applicable regularity conditions hold. However, there are many apparently innocuous situations where standard theory breaks down, sometimes spectacularly. Most of the departures from regularity are described geometrically, with mathematical detail only sufficient to clarify the nonstandard nature of a problem and to allow formulation of practical solutions. The book is intended for anyone with a basic knowledge of statistical methods, as typically covered in a university statistical inference course, who wishes to understand or study how standard methodology might fail. Simple, easy-to-understand statistical methods that overcome these difficulties are presented and illustrated by detailed examples drawn from real applications. Parametric bootstrap resampling is used throughout for analysing the properties of fitted models, illustrating its ease of implementation even in nonstandard situations. Distributional properties are obtained numerically for estimators or statistics not previously considered in the literature, because their distributional properties are too hard to obtain theoretically. Bootstrap results are presented mainly graphically, providing an easy-to-understand demonstration of the sampling behaviour of estimators.
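As a minimal sketch of the parametric bootstrap idea described above (standard-library Python only; the model, data, and function name are illustrative and not taken from the book): fit a normal model to a sample, repeatedly simulate new samples from the fitted model, and use the spread of the re-estimated means as a standard-error estimate.

```python
import random
import statistics

def parametric_bootstrap_se(data, n_boot=2000, seed=1):
    # Fit a normal model by maximum likelihood, then repeatedly simulate
    # from the fitted model and re-estimate to approximate the estimator's
    # sampling variability.
    rng = random.Random(seed)
    mu_hat = statistics.fmean(data)
    sigma_hat = statistics.pstdev(data)          # MLE of the normal scale
    reps = [statistics.fmean([rng.gauss(mu_hat, sigma_hat) for _ in data])
            for _ in range(n_boot)]
    return statistics.stdev(reps)

data = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 5.2, 4.6]
se = parametric_bootstrap_se(data)
# For the mean of a normal model this should track pstdev(data) / sqrt(n)
theoretical_se = statistics.pstdev(data) / len(data) ** 0.5
```

In this regular case the bootstrap reproduces the textbook answer; the appeal of the method is that the same recipe still works in nonstandard problems where no closed-form standard error exists.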

This book gives a comprehensive and self-contained introduction to the theory of symmetric Markov processes and symmetric quasi-regular Dirichlet forms. In a detailed and accessible manner, the book covers the essential elements and applications of the theory of symmetric Markov processes, including recurrence/transience criteria, probabilistic potential theory, additive functional theory, and time change theory. The book develops the theory in a general framework of symmetric quasi-regular Dirichlet forms in a unified manner with that of regular Dirichlet forms, emphasizing the role of extended Dirichlet spaces and the rich interplay between the probabilistic and analytic aspects of the theory. It then addresses the latest advances in the theory, presented here for the first time in any book. Topics include the characterization of time-changed Markov processes in terms of Douglas integrals and a systematic account of reflected Dirichlet spaces, and the important roles such advances play in the boundary theory of symmetric Markov processes. This book is an ideal resource for researchers and practitioners, and can also serve as a textbook for advanced graduate students. It includes examples, appendixes, and exercises with solutions.

This book presents analytics within a framework of mathematical theory and concepts, building upon firm foundations in probability theory, graphs and networks, random matrices, linear algebra, optimization, forecasting, discrete dynamical systems, and more. Following on from the theoretical considerations, applications are given to data from commercially relevant settings: supermarket baskets, loyalty cards, mobile phone call records, smart meters, 'omic' data, sales promotions, social media, and microblogging. Each chapter tackles a topic in analytics: social networks and digital marketing; forecasting; clustering and segmentation; inverse problems; Markov models of behavioural changes; multiple hypothesis testing and decision-making; and so on. Chapters start with background mathematical theory explained with a strong narrative, then give way to practical considerations and exemplar applications.

Bayesian epistemology aims to answer the following question: How strongly should an agent believe the various propositions expressible in her language? Subjective Bayesians hold that it is largely (though not entirely) up to the agent as to which degrees of belief to adopt. Objective Bayesians, on the other hand, maintain that appropriate degrees of belief are largely (though not entirely) determined by the agent's evidence. This book states and defends a version of objective Bayesian epistemology. According to this version, objective Bayesianism is characterized by three norms: (i) Probability: degrees of belief should be probabilities; (ii) Calibration: they should be calibrated with evidence; and (iii) Equivocation: they should otherwise equivocate between basic outcomes. Objective Bayesianism has been challenged on a number of different fronts: for example, it has been accused of being poorly motivated, of failing to handle qualitative evidence, of yielding counter-intuitive degrees of belief after updating, of suffering from a failure to learn from experience, of being computationally intractable, of being susceptible to paradox, of being language dependent, and of not being objective enough. The book argues that these criticisms can be met and that objective Bayesianism is a promising theory with an exciting agenda for further research.

This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. It starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics. The topics examined include standard material such as the Perron–Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum–Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. It also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models are also explored.
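To make one of the listed topics concrete, here is a small sketch of the Viterbi algorithm (plain Python; the occasionally-loaded-die example and all names are illustrative, not taken from the book): dynamic programming over log-probabilities recovers the most likely hidden state sequence given the observations.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Dynamic programming in log-space: V[t][s] is the log-probability of
    # the best state path ending in state s after emitting obs[0..t].
    V = [{s: math.log(start_p[s] * emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1])
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    state = max(states, key=lambda s: V[-1][s])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Occasionally loaded die: observe "6" or any other face ("o").
states = ("Fair", "Loaded")
start = {"Fair": 0.5, "Loaded": 0.5}
trans = {"Fair": {"Fair": 0.9, "Loaded": 0.1},
         "Loaded": {"Fair": 0.1, "Loaded": 0.9}}
emit = {"Fair": {"6": 1 / 6, "o": 5 / 6},
        "Loaded": {"6": 0.5, "o": 0.5}}
path = viterbi(["o", "o", "6", "6", "6"], states, start, trans, emit)
```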

This book provides the mathematical foundations for the analysis of a class of degenerate elliptic operators defined on manifolds with corners, which arise in a variety of applications such as population genetics, mathematical finance, and economics. The results discussed in this book prove the uniqueness of the solution to the martingale problem and therefore the existence of the associated Markov process. The book uses an "integral kernel method" to develop mathematical foundations for the study of such degenerate elliptic operators and the stochastic processes they define. The precise nature of the degeneracies of the principal symbol for these operators leads to solutions of the parabolic and elliptic problems that display novel regularity properties. Dually, the adjoint operator allows for rather dramatic singularities, such as measures supported on high-codimensional strata of the boundary. The book establishes the uniqueness, existence, and sharp regularity properties for solutions to the homogeneous and inhomogeneous heat equations, as well as a complete analysis of the resolvent operator acting on Hölder spaces. It shows that the semigroups defined by these operators have holomorphic extensions to the right half-plane. The book also demonstrates precise asymptotic results for the long-time behavior of solutions to both the forward and backward Kolmogorov equations.

The Joy of Statistics consists of a series of 42 “short stories,” each illustrating how elementary statistical methods are applied to data to produce insight and solutions to the questions the data were collected to answer. The text contains brief histories of the evolution of statistical methods and a number of brief biographies of the most famous statisticians of the 20th century. Scattered throughout are a few statistical jokes, puzzles, and traditional stories. The level of The Joy of Statistics is elementary; it explores a variety of statistical applications using graphs and plots, along with detailed and intuitive descriptions, occasionally using a bit of 10th-grade mathematics. Examples of topics include gambling games such as roulette, blackjack, and lotteries, as well as more serious subjects such as comparisons of black/white infant mortality rates, coronary heart disease risk, and ethnic differences in Hodgkin’s disease. The statistical descriptions of these methods and topics are accompanied by easy-to-understand explanations labeled “how it works.”

This book presents a comprehensive treatment of the state space approach to time series analysis. The distinguishing feature of state space time series models is that observations are regarded as being made up of distinct components such as trend, seasonal, regression, and disturbance elements, each of which is modelled separately. The techniques that emerge from this approach are very flexible. Part I presents a full treatment of the construction and analysis of linear Gaussian state space models. The methods are based on the Kalman filter and are appropriate for a wide range of problems in practical time series analysis. The analysis can be carried out from both classical and Bayesian perspectives. Part I then presents illustrations using real series, and exercises are provided for a selection of chapters. Part II discusses approximate and exact approaches for handling broad classes of non-Gaussian and nonlinear state space models. Approximate methods include the extended Kalman filter and the more recently developed unscented Kalman filter. The book shows that exact treatments become feasible when simulation-based methods such as importance sampling and particle filtering are adopted. Bayesian treatments based on simulation methods are also explored.
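A minimal sketch of the Kalman filter underlying Part I, for the simplest state space model, the local level model (plain Python; the function name, variances, and data are illustrative, not taken from the book): the state a_t is a random walk observed with noise, and the filter updates its estimate one observation at a time.

```python
def local_level_filter(y, sigma2_eps=1.0, sigma2_eta=0.1, a1=0.0, p1=1e7):
    # Kalman filter for the local level model:
    #   y_t     = a_t + eps_t,    eps_t ~ N(0, sigma2_eps)
    #   a_{t+1} = a_t + eta_t,    eta_t ~ N(0, sigma2_eta)
    a, p = a1, p1                      # near-diffuse initial state
    filtered = []
    for yt in y:
        v = yt - a                     # one-step-ahead prediction error
        f = p + sigma2_eps             # its variance
        k = p / f                      # Kalman gain
        a_filt = a + k * v             # filtered state estimate
        p_filt = p * (1 - k)           # filtered state variance
        filtered.append(a_filt)
        a = a_filt                     # predict next state (random walk)
        p = p_filt + sigma2_eta
    return filtered

filtered = local_level_filter([5.1, 4.9, 5.2, 5.0, 4.8] * 10)
```

With observations fluctuating around 5, the filtered estimates settle near that level; the gain k balances trust in the new observation against trust in the current state estimate.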