Commit 2f5f8afa authored by Aaron Saxton

initial commit

## Core latex/pdflatex auxiliary files:
*.aux
*.lof
*.log
*.lot
*.fls
*.out
*.toc
*.fmt
*.fot
*.cb
*.cb2
.*.lb
## Intermediate documents:
*.dvi
*.xdv
*-converted-to.*
# these rules might exclude image files for figures etc.
# *.ps
# *.eps
# *.pdf
## Generated if empty string is given at "Please type another file name for output:"
.pdf
## Bibliography auxiliary files (bibtex/biblatex/biber):
*.bbl
*.bcf
*.blg
*-blx.aux
*-blx.bib
*.run.xml
## Build tool auxiliary files:
*.fdb_latexmk
*.synctex
*.synctex(busy)
*.synctex.gz
*.synctex.gz(busy)
*.pdfsync
## Build tool directories for auxiliary files
# latexrun
latex.out/
## Auxiliary and intermediate files from other packages:
# algorithms
*.alg
*.loa
# achemso
acs-*.bib
# amsthm
*.thm
# beamer
*.nav
*.pre
*.snm
*.vrb
# changes
*.soc
# comment
*.cut
# cprotect
*.cpt
# elsarticle (documentclass of Elsevier journals)
*.spl
# endnotes
*.ent
# fixme
*.lox
# feynmf/feynmp
*.mf
*.mp
*.t[1-9]
*.t[1-9][0-9]
*.tfm
# (r)(e)ledmac/(r)(e)ledpar
*.end
*.?end
*.[1-9]
*.[1-9][0-9]
*.[1-9][0-9][0-9]
*.[1-9]R
*.[1-9][0-9]R
*.[1-9][0-9][0-9]R
*.eledsec[1-9]
*.eledsec[1-9]R
*.eledsec[1-9][0-9]
*.eledsec[1-9][0-9]R
*.eledsec[1-9][0-9][0-9]
*.eledsec[1-9][0-9][0-9]R
# glossaries
*.acn
*.acr
*.glg
*.glo
*.gls
*.glsdefs
# gnuplottex
*-gnuplottex-*
# gregoriotex
*.gaux
*.gtex
# htlatex
*.4ct
*.4tc
*.idv
*.lg
*.trc
*.xref
# hyperref
*.brf
# knitr
*-concordance.tex
# TODO Comment the next line if you want to keep your tikz graphics files
*.tikz
*-tikzDictionary
# listings
*.lol
# luatexja-ruby
*.ltjruby
# makeidx
*.idx
*.ilg
*.ind
# minitoc
*.maf
*.mlf
*.mlt
*.mtc[0-9]*
*.slf[0-9]*
*.slt[0-9]*
*.stc[0-9]*
# minted
_minted*
*.pyg
# morewrites
*.mw
# nomencl
*.nlg
*.nlo
*.nls
# pax
*.pax
# pdfpcnotes
*.pdfpc
# sagetex
*.sagetex.sage
*.sagetex.py
*.sagetex.scmd
# scrwfile
*.wrt
# sympy
*.sout
*.sympy
sympy-plots-for-*.tex/
# pdfcomment
*.upa
*.upb
# pythontex
*.pytxcode
pythontex-files-*/
# tcolorbox
*.listing
# thmtools
*.loe
# TikZ & PGF
*.dpth
*.md5
*.auxlock
# todonotes
*.tdo
# vhistory
*.hst
*.ver
# easy-todo
*.lod
# xcolor
*.xcp
# xmpincl
*.xmpi
# xindy
*.xdy
# xypic precompiled matrices
*.xyc
# endfloat
*.ttt
*.fff
# Latexian
TSWLatexianTemp*
## Editors:
# WinEdt
*.bak
*.sav
# Texpad
.texpadtmp
# LyX
*.lyx~
# Kile
*.backup
# KBibTeX
*~[0-9]*
# auto folder when using emacs and auctex
auto/*
*.el
# expex forward references with \gathertags
*-tags.tex
# standalone packages
*.sta
# emacs
*~
%%% ====================================================================
%%% BibTeX-file{
%%% author = "Gerry Murray",
%%% version = "1.2",
%%% date = "2 April 2012",
%%% filename = "acmsmall-sample-bibfile.bib",
%%% address = "ACM, NY",
%%% email = "murray at hq.acm.org",
%%% codetable = "ISO/ASCII",
%%% keywords = "ACM Reference Format, bibliography, citation, references",
%%% supported = "yes",
%%% docstring = "This BibTeX database file contains 'bibdata' entries
%%% that 'match' the examples provided in the Specifications Document
%%% AND, also, 'legacy'-type bibs. It should assist authors in
%%% choosing the 'correct' at-bibtype and necessary bib-fields
%%% so as to obtain the appropriate ACM Reference Format output.
%%% It also contains many 'Standard Abbreviations'. "
%%% }
%%% ====================================================================
% Journals
% First the Full Name is given, then the abbreviation used in the AMS Math
% Reviews, with an indication if it could not be found there.
% Note the 2nd overwrites the 1st, so swap them if you want the full name.
%{AMS}
@String{AMSTrans = "American Mathematical Society Translations" }
@String{AMSTrans = "Amer. Math. Soc. Transl." }
@String{BullAMS = "Bulletin of the American Mathematical Society" }
@String{BullAMS = "Bull. Amer. Math. Soc." }
@String{ProcAMS = "Proceedings of the American Mathematical Society" }
@String{ProcAMS = "Proc. Amer. Math. Soc." }
@String{TransAMS = "Transactions of the American Mathematical Society" }
@String{TransAMS = "Trans. Amer. Math. Soc." }
%ACM
@String{CACM = "Communications of the {ACM}" }
@String{CACM = "Commun. {ACM}" }
@String{CompServ = "Comput. Surveys" }
@String{JACM = "J. {ACM}" }
@String{ACMMathSoft = "{ACM} Transactions on Mathematical Software" }
@String{ACMMathSoft = "{ACM} Trans. Math. Software" }
@String{SIGNUM = "{ACM} {SIGNUM} Newsletter" }
@String{SIGNUM = "{ACM} {SIGNUM} Newslett." }
@String{AmerSocio = "American Journal of Sociology" }
@String{AmerStatAssoc = "Journal of the American Statistical Association" }
@String{AmerStatAssoc = "J. Amer. Statist. Assoc." }
@String{ApplMathComp = "Applied Mathematics and Computation" }
@String{ApplMathComp = "Appl. Math. Comput." }
@String{AmerMathMonthly = "American Mathematical Monthly" }
@String{AmerMathMonthly = "Amer. Math. Monthly" }
@String{BIT = "{BIT}" }
@String{BritStatPsych = "British Journal of Mathematical and Statistical
Psychology" }
@String{BritStatPsych = "Brit. J. Math. Statist. Psych." }
@String{CanMathBull = "Canadian Mathematical Bulletin" }
@String{CanMathBull = "Canad. Math. Bull." }
@String{CompApplMath = "Journal of Computational and Applied Mathematics" }
@String{CompApplMath = "J. Comput. Appl. Math." }
@String{CompPhys = "Journal of Computational Physics" }
@String{CompPhys = "J. Comput. Phys." }
@String{CompStruct = "Computers and Structures" }
@String{CompStruct = "Comput. \& Structures" }
@String{CompJour = "The Computer Journal" }
@String{CompJour = "Comput. J." }
@String{CompSysSci = "Journal of Computer and System Sciences" }
@String{CompSysSci = "J. Comput. System Sci." }
@String{Computing = "Computing" }
@String{ContempMath = "Contemporary Mathematics" }
@String{ContempMath = "Contemp. Math." }
@String{Crelle = "Crelle's Journal" }
@String{GiornaleMath = "Giornale di Matematiche" }
@String{GiornaleMath = "Giorn. Mat." } % didn't find in AMS MR., ibid.
%IEEE
@String{Computer = "{IEEE} Computer" }
@String{IEEETransComp = "{IEEE} Transactions on Computers" }
@String{IEEETransComp = "{IEEE} Trans. Comput." }
@String{IEEETransAC = "{IEEE} Transactions on Automatic Control" }
@String{IEEETransAC = "{IEEE} Trans. Automat. Control" }
@String{IEEESpec = "{IEEE} Spectrum" } % didn't find in AMS MR
@String{ProcIEEE = "Proceedings of the {IEEE}" }
@String{ProcIEEE = "Proc. {IEEE}" } % didn't find in AMS MR
@String{IEEETransAeroElec = "{IEEE} Transactions on Aerospace and Electronic
Systems" }
@String{IEEETransAeroElec = "{IEEE} Trans. Aerospace Electron. Systems" }
@String{IMANumerAna = "{IMA} Journal of Numerical Analysis" }
@String{IMANumerAna = "{IMA} J. Numer. Anal." }
@String{InfProcLet = "Information Processing Letters" }
@String{InfProcLet = "Inform. Process. Lett." }
@String{InstMathApp = "Journal of the Institute of Mathematics and
its Applications" }
@String{InstMathApp = "J. Inst. Math. Appl." }
@String{IntControl = "International Journal of Control" }
@String{IntControl = "Internat. J. Control" }
@String{IntNumerEng = "International Journal for Numerical Methods in
Engineering" }
@String{IntNumerEng = "Internat. J. Numer. Methods Engrg." }
@String{IntSuper = "International Journal of Supercomputing Applications" }
@String{IntSuper = "Internat. J. Supercomputing Applic." } % didn't find
%% in AMS MR
@String{Kibernetika = "Kibernetika" }
@String{JResNatBurStand = "Journal of Research of the National Bureau
of Standards" }
@String{JResNatBurStand = "J. Res. Nat. Bur. Standards" }
@String{LinAlgApp = "Linear Algebra and its Applications" }
@String{LinAlgApp = "Linear Algebra Appl." }
@String{MathAnaAppl = "Journal of Mathematical Analysis and Applications" }
@String{MathAnaAppl = "J. Math. Anal. Appl." }
@String{MathAnnalen = "Mathematische Annalen" }
@String{MathAnnalen = "Math. Ann." }
@String{MathPhys = "Journal of Mathematical Physics" }
@String{MathPhys = "J. Math. Phys." }
@String{MathComp = "Mathematics of Computation" }
@String{MathComp = "Math. Comp." }
@String{MathScand = "Mathematica Scandinavica" }
@String{MathScand = "Math. Scand." }
@String{TablesAidsComp = "Mathematical Tables and Other Aids to Computation" }
@String{TablesAidsComp = "Math. Tables Aids Comput." }
@String{NumerMath = "Numerische Mathematik" }
@String{NumerMath = "Numer. Math." }
@String{PacificMath = "Pacific Journal of Mathematics" }
@String{PacificMath = "Pacific J. Math." }
@String{ParDistComp = "Journal of Parallel and Distributed Computing" }
@String{ParDistComp = "J. Parallel and Distrib. Comput." } % didn't find
%% in AMS MR
@String{ParComputing = "Parallel Computing" }
@String{ParComputing = "Parallel Comput." }
@String{PhilMag = "Philosophical Magazine" }
@String{PhilMag = "Philos. Mag." }
@String{ProcNAS = "Proceedings of the National Academy of Sciences
of the USA" }
@String{ProcNAS = "Proc. Nat. Acad. Sci. U. S. A." }
@String{Psychometrika = "Psychometrika" }
@String{QuartMath = "Quarterly Journal of Mathematics, Oxford, Series (2)" }
@String{QuartMath = "Quart. J. Math. Oxford Ser. (2)" }
@String{QuartApplMath = "Quarterly of Applied Mathematics" }
@String{QuartApplMath = "Quart. Appl. Math." }
@String{RevueInstStat = "Review of the International Statistical Institute" }
@String{RevueInstStat = "Rev. Inst. Internat. Statist." }
%SIAM
@String{JSIAM = "Journal of the Society for Industrial and Applied
Mathematics" }
@String{JSIAM = "J. Soc. Indust. Appl. Math." }
@String{JSIAMB = "Journal of the Society for Industrial and Applied
Mathematics, Series B, Numerical Analysis" }
@String{JSIAMB = "J. Soc. Indust. Appl. Math. Ser. B Numer. Anal." }
@String{SIAMAlgMeth = "{SIAM} Journal on Algebraic and Discrete Methods" }
@String{SIAMAlgMeth = "{SIAM} J. Algebraic Discrete Methods" }
@String{SIAMAppMath = "{SIAM} Journal on Applied Mathematics" }
@String{SIAMAppMath = "{SIAM} J. Appl. Math." }
@String{SIAMComp = "{SIAM} Journal on Computing" }
@String{SIAMComp = "{SIAM} J. Comput." }
@String{SIAMMatrix = "{SIAM} Journal on Matrix Analysis and Applications" }
@String{SIAMMatrix = "{SIAM} J. Matrix Anal. Appl." }
@String{SIAMNumAnal = "{SIAM} Journal on Numerical Analysis" }
@String{SIAMNumAnal = "{SIAM} J. Numer. Anal." }
@String{SIAMReview = "{SIAM} Review" }
@String{SIAMReview = "{SIAM} Rev." }
@String{SIAMSciStat = "{SIAM} Journal on Scientific and Statistical
Computing" }
@String{SIAMSciStat = "{SIAM} J. Sci. Statist. Comput." }
@String{SoftPracExp = "Software Practice and Experience" }
@String{SoftPracExp = "Software Prac. Experience" } % didn't find in AMS MR
@String{StatScience = "Statistical Science" }
@String{StatScience = "Statist. Sci." }
@String{Techno = "Technometrics" }
@String{USSRCompMathPhys = "{USSR} Computational Mathematics and Mathematical
Physics" }
@String{USSRCompMathPhys = "{U. S. S. R.} Comput. Math. and Math. Phys." }
@String{VLSICompSys = "Journal of {VLSI} and Computer Systems" }
@String{VLSICompSys = "J. {VLSI} Comput. Syst." }
@String{ZAngewMathMech = "Zeitschrift f{\"u}r Angewandte Mathematik und
Mechanik" }
@String{ZAngewMathMech = "Z. Angew. Math. Mech." }
@String{ZAngewMathPhys = "Zeitschrift f{\"u}r Angewandte Mathematik und Physik" }
@String{ZAngewMathPhys = "Z. Angew. Math. Phys." }
% Publishers % ================================================= |
@String{Academic = "Academic Press" }
@String{ACMPress = "{ACM} Press" }
@String{AdamHilger = "Adam Hilger" }
@String{AddisonWesley = "Addison-Wesley" }
@String{AllynBacon = "Allyn and Bacon" }
@String{AMS = "American Mathematical Society" }
@String{Birkhauser = "Birkh{\"a}user" }
@String{CambridgePress = "Cambridge University Press" }
@String{Chelsea = "Chelsea" }
@String{ClaredonPress = "Clarendon Press" }
@String{DoverPub = "Dover Publications" }
@String{Eyolles = "Eyrolles" }
@String{HoltRinehartWinston = "Holt, Rinehart and Winston" }
@String{Interscience = "Interscience" }
@String{JohnsHopkinsPress = "The Johns Hopkins University Press" }
@String{JohnWileySons = "John Wiley and Sons" }
@String{Macmillan = "Macmillan" }
@String{MathWorks = "The Math Works Inc." }
@String{McGrawHill = "McGraw-Hill" }
@String{NatBurStd = "National Bureau of Standards" }
@String{NorthHolland = "North-Holland" }
@String{OxfordPress = "Oxford University Press" } %address Oxford or London?
@String{PergamonPress = "Pergamon Press" }
@String{PlenumPress = "Plenum Press" }
@String{PrenticeHall = "Prentice-Hall" }
@String{SIAMPub = "{SIAM} Publications" }
@String{Springer = "Springer-Verlag" }
@String{TexasPress = "University of Texas Press" }
@String{VanNostrand = "Van Nostrand" }
@String{WHFreeman = "W. H. Freeman and Co." }
%Entries
@online{WilsonAS16,
author = "H. James Wilson and Allan Alter and Prashant Shukla",
year = "2016",
title = "Companies Are Reimagining Business Processes with Algorithms",
url = "https://hbr.org/2016/02/companies-are-reimagining-business-processes-with-algorithms",
lastaccessed = "April 9, 2019",
}
@online{MongoManShard,
title = "MongoDB Manual, Sharding",
url = "https://docs.mongodb.com/manual/sharding/",
lastaccessed = "April 9, 2019",
}
@online{MongoManConfigServ,
title = "MongoDB Manual, Config Server",
url = "https://docs.mongodb.com/manual/core/sharded-cluster-config-servers/",
lastaccessed = "April 9, 2019",
}
@online{MongoManRouter,
title = "MongoDB Manual, Mongos",
url = "https://docs.mongodb.com/manual/core/sharded-cluster-query-router/",
lastaccessed = "April 9, 2019",
}
@online{MongoManShardServ,
title = "MongoDB Manual, Shard",
url = "https://docs.mongodb.com/manual/core/sharded-cluster-shards/",
lastaccessed = "April 9, 2019",
}
@misc{akiba2017extremely,
title={Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes},
author={Takuya Akiba and Shuji Suzuki and Keisuke Fukuda},
year={2017},
eprint={1711.04325},
archivePrefix={arXiv},
primaryClass={cs.DC}
}
%
% The first command in your LaTeX source must be the \documentclass command.
\documentclass[sigconf,review,anonymous=false]{acmart}
%
% \BibTeX command to typeset BibTeX logo in the docs
\AtBeginDocument{%
\providecommand\BibTeX{{%
\normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}
\usepackage{booktabs}
% Rights management information.
% This information is sent to you when you complete the rights form.
% These commands have SAMPLE values in them; it is your responsibility as an author to replace
% the commands and values with those provided to you when you complete the rights form.
%
% These commands are for a PROCEEDINGS abstract or paper.
\copyrightyear{2020}
\acmYear{2020}
\setcopyright{acmlicensed}
\acmConference[Supercomputing '20]{Machine Learning for Computing Systems 2nd Workshop}{November 13, 2020}{Virtual}
%\acmBooktitle{Woodstock '18: ACM Symposium on Neural Gaze Detection, June 03--05, 2018, Woodstock, NY}
%\acmPrice{15.00}
%\acmDOI{10.1145/1122445.1122456}
%\acmISBN{978-1-4503-9999-9/18/06}
%
% These commands are for a JOURNAL article.
%\setcopyright{acmcopyright}
%\acmJournal{TOG}
%\acmYear{2018}\acmVolume{37}\acmNumber{4}\acmArticle{111}\acmMonth{8}
%\acmDOI{10.1145/1122445.1122456}
%
% Submission ID.
% Use this when submitting an article to a sponsored event. You'll receive a unique submission ID from the organizers
% of the event, and this ID should be used as the parameter to this command.
%\acmSubmissionID{123-A56-BU3}
%
% The majority of ACM publications use numbered citations and references. If you are preparing content for an event
% sponsored by ACM SIGGRAPH, you must use the "author year" style of citations and references. Uncommenting
% the next command will enable that style.
%\citestyle{acmauthoryear}
%
% end of the preamble, start of the body of the document source.
\begin{document}
%
% The "title" command has an optional parameter, allowing the author to define a "short title" to be used in page headers.
\title{ML Training Pipeline in HPC: A Use Case}
%
% The "author" command and its associated commands are used to define the authors and their affiliations.
% Of note is the shared affiliation of the first two authors, and the "authornote" and "authornotemark" commands
% used to denote shared contribution to the research.
\author{Aaron Saxton}
\email{saxton@illinois.edu}
\affiliation{%
\institution{University of Illinois}
\institution{National Center for Supercomputing Applications}
\institution{Blue Waters Project Office}
\city{Urbana}
\state{Illinois}
}
%
% By default, the full list of authors will be used in the page headers. Often, this list is too long, and will overlap
% other information printed in the page headers. This command allows the author to define a more concise list
% of authors' names for this purpose.
\renewcommand{\shortauthors}{Saxton}
%
% The abstract is a short summary of the work to be presented in the article.
\begin{abstract}
Developing ML algorithms is as much about the data as it is about the model. The success of ``ResNet-50 in 15 min''~\cite{akiba2017extremely} showed that it is possible to scale model training, but the careful curation of ImageNet was a major contributing factor to that success. Frameworks like TensorFlow, PyTorch, and Flux have simplified many aspects of model design but leave data curation and wrangling up to the practitioner. As a result, novel applications of ML to original datasets often struggle to achieve the same scalability as ``ResNet-50 in 15 min''~\cite{akiba2017extremely}. One hindrance is the lack of scalable and queryable access to truly large datasets. In this paper we describe an HPC workflow that has allowed the Blue Waters team to develop ML models on large and previously uncurated data.
\end{abstract}
%
% The code below is generated by the tool at http://dl.acm.org/ccs.cfm.
% Please copy and paste the code instead of the example below.
%
\begin{CCSXML}
<ccs2012>
<concept>
<concept_id>10002951.10002952</concept_id>
<concept_desc>Information systems~Data management systems</concept_desc>
<concept_significance>500</concept_significance>
</concept>
</ccs2012>
\end{CCSXML}
%\ccsdesc[500]{}
%
% Keywords. The author(s) should pick words that accurately describe the work being
% presented. Separate the keywords with commas.
%\keywords{distributed datastore, MongoDB, high performance computing, shard filesystem}
%
% This command processes the author and affiliation and title information and builds
% the first part of the formatted document.
\maketitle
\input{HPC_AI_Pipeline_Introduction.tex}
%
% The acknowledgments section is defined using the "acks" environment (and NOT an unnumbered section). This ensures
% the proper identification of the section in the article metadata, and the consistent spelling of the heading.
\begin{acks}
This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
\end{acks}
%
% The next two lines define the bibliography style to be used, and the