Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data
Željko Ivezić, Andrew J. Connolly, Jacob T. VanderPlas, and Alexander Gray

Print publication date: 2014

Print ISBN-13: 9780691151687

Published to Princeton Scholarship Online: October 2017

DOI: 10.23943/princeton/9780691151687.001.0001

Fast Computation on Massive Data Sets

Chapter:
(p.43) 2 Fast Computation on Massive Data Sets
Source:
Statistics, Data Mining, and Machine Learning in Astronomy
Author(s):

Željko Ivezić

Andrew J. Connolly

Jacob T. VanderPlas

Alexander Gray

Publisher:
Princeton University Press
DOI:10.23943/princeton/9780691151687.003.0002

This chapter describes basic concepts and tools for tractably performing the computations described in the rest of this book. Fast algorithms for such analysis subroutines are becoming increasingly important as modern data sets approach billions of objects. With data sets of this size, even analysis operations whose computational cost is linearly proportional to the size of the data set present challenges, particularly since statistical analyses are inherently interactive processes that require computations to complete within a reasonable human attention span. For more sophisticated machine learning algorithms, the often worse-than-linear runtimes of straightforward implementations quickly become unbearable. The chapter looks at techniques that reduce such runtimes rigorously, without sacrificing the accuracy of the analysis through unprincipled approximations. This matters for more than speed alone: in practice, computational performance and statistical performance can be intimately linked. A researcher's ability, within his or her effective time budget, to try more powerful models or to search parameter settings for each model in question leads directly to better fits and predictions.
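
As a rough illustration of the distinction drawn above (this sketch is not taken from the chapter, and the function names are hypothetical), consider counting pairs of values that lie within a given radius of each other. The straightforward version checks every pair and scales as O(N^2); sorting first and using binary search scales as O(N log N) and returns exactly the same count, so the speedup involves no approximation. Integer inputs are assumed so that the two comparisons agree exactly.

```python
import bisect


def count_pairs_naive(values, radius):
    """Count unordered pairs with |a - b| <= radius by checking every pair: O(N^2)."""
    n = len(values)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(values[i] - values[j]) <= radius:
                count += 1
    return count


def count_pairs_sorted(values, radius):
    """Return the same count using a sort plus binary search: O(N log N), still exact.

    Assumes integer values and radius >= 0 (an assumption of this sketch).
    """
    ordered = sorted(values)
    count = 0
    for i, v in enumerate(ordered):
        # Index just past the last element that is <= v + radius.
        hi = bisect.bisect_right(ordered, v + radius)
        # Positions i+1 .. hi-1 hold the partners of v to its right,
        # so each unordered pair is counted exactly once.
        count += hi - i - 1
    return count
```

Both functions can be called on the same list of integers and compared directly; the second simply avoids examining pairs that the sort order already rules out.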

Keywords:   fast algorithms, runtimes, data management, algorithmic efficiency, machine learning algorithms, computations
