Strategic Planning: Data, Models and Statistics

Brian S. Yandell, Chair, 29 February 2012

We now live in an information age with access to huge amounts of data in our daily lives through IT advances, but with great uncertainty about what these data actually mean. These data arise from business, manufacturing, finance, science, medicine, engineering, political science and a great many other fields. Every UW student must be quantitatively literate to survive and thrive in their future careers, regardless of area. With few exceptions, campus researchers who want to advance and remain competitive must have the skills and tools to analyze and interpret the massive data streams that permeate the contemporary research world. Training in statistical methods, and collaboration with statisticians, is central to this vital part of the University of Wisconsin-Madison mission.

Professor Paul Ahlquist, director of the virology group of MIR, who is working with the Statistics Department on the first joint UW-MIR faculty hire, recently stated that new advances in cancer biology are “increasingly linked with the creative application of computational and statistical methods”. Many other prominent faculty at UW-Madison, including Jamie Thomson, David Schwartz and Alan Attie, highly value their collaborations with statistics colleagues on forefront research. The rise of statistics as an essential tool in society has been widely noticed in the New York Times and other media: “Arcane statistical analysis, the business of making sense of our growing data mountains, has become high tech’s hottest calling.” “In field after field, … the digital data surge only promises to accelerate.” “The big problem is going to be the ability of humans to use, analyze and make sense of the data.” “Companies from all sectors look for statistics experts, including pharmaceutical and insurance companies and Wall Street firms.... 'You need statistical analysis to do anything regarding research and to assess various alternatives, whether it’s in alternative energy or health care'.”

Among quantitative sciences, statistics is at the center of data analysis and interpretation. The Department of Statistics, and the affiliated Biometry Program and Department of Biostatistics & Medical Informatics, have been extremely active and successful leaders in teaching and research of quantitative methods over the past 50 years. However, the information age is changing rapidly, expanding into more and more fields, and the volume of data being acquired is orders of magnitude larger than anything that existed or was imagined before. The statistical methods to analyze these mega data sets have to a large extent not yet been developed. For now, we must rely on inadequate, legacy methods designed for simpler problems. To keep pace, the Department of Statistics has reviewed its current teaching and training programs, examined its faculty research portfolio, and reflected on campus and society needs to develop this strategic plan for the next decade.

We propose a revival and strengthening of George Box's 1960 vision wherein the department of statistics has a strong central core of faculty appointed at 100% in the department, combined with specialists with joint appointments who collaborate with researchers across the campus. Today, however, we need new resources coupled with a renewed vision to excel. Other top statistics programs across the country are moving in this direction (e.g. University of California-Berkeley, Stanford University, and University of Chicago). More broadly, many universities are increasingly recognizing the need to examine data in context of its source—precisely the domain of statistics—as a central component of training for all students.

To achieve a renewed vision, the Department will continue to transform its training program, including its curriculum. Implementing this strategic plan will require an increase in personnel, both 100% and joint appointments. Over the next 5-10 years, we propose hiring 3 new staff FTE and 5 new tenure-track faculty FTE, two or three to be at a more senior level, to address these mushrooming needs. Staff FTE will cover undergraduate advising and oversight of undergraduate instruction, instructional computing and tools for data analysis, and research computing and big data analytics infrastructure. New statistics faculty will focus on tomorrow’s big data problems, with emphasis on joint appointments in research disciplines and in other campus data sciences programs.

Strategic Design Themes

This strategic plan embodies three broad themes essential for teaching, training, methodology development, and research collaboration focused on modern statistical analysis and inference.
• Students learn statistical concepts best through experience with data. To illustrate how the theoretical results work, instructors need to teach students basic skills of data management and analysis in settings similar to science laboratories. Having centralized staff and teaching resources will eliminate duplication, reduce confusion, and improve educational outcomes.
• Today’s problems require strong data modeling skills to derive methods to analyze big data sets with complicated structures. These pervade collaborative research and are increasingly involved in instruction in many degree programs. Having staff knowledgeable in best practices, and new faculty conversant in big data methods for statistical models, will deepen and enhance the quality of scholarly activity.
• Modern statistical inference demands that students develop skills to efficiently implement data analysis methods. These include numerical methods, symbol manipulation, and methods for complicated data objects. Students in intermediate and advanced statistics courses need to learn best practices for building, maintaining and evolving algorithms. This is essential for big data research, where the volume of data requires development of high throughput or high performance tools.

There is currently a disconnect between the forefront “big data” problems that drive research across campus and the small data problems taught in the classroom. We propose to fill this gap with major educational innovations, including flexible modularity in course design, to import modern research methods into the classroom, and to provide opportunities for young researchers to become involved in quantitative research teams early in their academic experience.

Transforming UW-Madison

The Statistics Department is being asked to provide leadership on many fronts to transform the campus landscape, through increased demand for introductory and intermediate courses, requests for collaborations, and guidance on research data policy at all levels. We can only meet these external demands with increased resources, notably in faculty and staff lines, as recommended during the 2010 department review. This is in the broad interest of L&S and of UW-Madison as a whole.

The Biometry Program Consulting Facility and the BMI Comprehensive Cancer Shared Resource are two examples of George Box’s 1960 vision, demonstrating top quality communication of technical tasks across multiple biological disciplines, and resulting in highly productive research collaborations. Most statistics faculty and students are involved in interdisciplinary research across campus, with Jamie Thomson, David Schwartz, Paul Ahlquist, Richard Davidson, Miron Livny, Michael Ferris, and many others. Many more researchers across campus need our expertise in design and analysis; their demands already exceed available resources. Our Statistics Department is one of the very few places on campus (and in the nation) where graduate students receive formal instruction and credentials in the art and science of collaboration with researchers from other disciplines. We want to extend Box’s vision in today’s information tsunami, and we welcome new opportunities, backed by resources, to revolutionize how big data scholarship evolves at UW-Madison.

As an example of the scope of the problem, consider the explosion of genomic data, with expectations of petabytes and exabytes in the near future (Schadt et al. 2010, Trelles et al. 2011). Unfortunately, methods to interpret these data are at a very early stage. Tragically, poorly designed studies yield vast but useless data, wasting much time and scarce research funds. The Economist, New York Times, Science and Nature Medicine, as well as CBS 60 Minutes, covered the unraveling of a widely heralded genomics-based cancer therapy in 2011-12: “Medical researchers see the story as a call to action. With such huge data sets and complicated analyses, researchers can no longer trust their hunches that a result does–or does not–make sense”. The data problems were detected and published two years earlier by statisticians Baggerly and Coombes at MD Anderson Cancer Center, but it took evidence of fraud to bring the case to public attention.

Statistics Department Mission

The ongoing mission of the Department of Statistics has four objectives:
Research: To meet continuing advances in science and technology (including evolving challenges of big data), develop new statistical theory, concepts, experimental designs, methodology, and computational tools to improve understanding of complex processes through data reduction, analysis, interpretation, and inference.
Instruction: To train BS, MS, and PhD statistics students for careers in academia, industry and government; to teach students in other degree programs the essential statistical concepts and tools for their research; to train non-traditional students in data analytic concepts and methods.
Quantitative Reasoning: To educate all students on campus about quantitative reasoning with data; to address fears of statistics by building the confidence to tackle data-rich problems with minimal theory.
Collaboration: To provide and promote statistical collaboration in a multitude of disciplines, within as well as outside the university, through collaborative research, consulting, and training; to foster a climate of mutual respect among quantitative and discipline-focused researchers.

Data Sciences at UW-Madison

“Computer Science has historically been strong on data structures…. Statistics has historically been … strong on inference from data. [We must] draw on the strengths of both disciplines.” (Michael Jordan, Professor at UC-Berkeley Statistics & EECS, in a 2010 UW lecture).

Statistics is one of the data sciences, with the unique role of studying how to infer from specific data to a more general setting, taking into account the randomness and uncertainty inherent in data. UW-Madison has six other data science programs, all building on the foundations of Mathematics: Computer Science (L&S), Biostatistics & Medical Informatics (SMPH), Electrical & Computer Engineering (CoE), Industrial & Systems Engineering (CoE), Operations and Information Management (Business School) and Library and Information Studies (SLIS). In particular, Statistics and Computer Science have had a long history of collaboration and mutual respect at UW-Madison, with both having joint faculty in BMI.

Statistics and Big Data

Historically, almost all statisticians have shared a similar core education. Unlike fields such as history, where a specialist in Russian history might not fill a need in colonial Latin American history, statistics enables broadly trained individuals to move easily into a number of areas. Our department has had the tradition of hiring the best individual within broad boundaries, and this has been largely successful. Overall, the Statistics Department has provided internationally regarded excellence that balances theory and applications. However, this approach needs to be modified.

Specialization in training of new PhD graduates has increased as the discipline has grown. While we will always emphasize the need to hire from the pool of strongest candidates, changes in the discipline encourage addressing particular needs, notably in the area of big data. As noted in our 2002 vision statement, we expect a majority of future hires to involve heavy computation and attention to large data problems. Recent hires reflect this trend. Thus, a revised hiring paradigm is to hire the individual who can make the most impact in modern statistics, and who can do that in ways that strengthen education and collaboration on campus and globally. A 2008 conference on teaching statistics emphasized important shifts in the field of statistics in terms of training:

Nolan and Temple Lang (2010) argued, “The nature of statistics is changing significantly with many opportunities to…impact…science and policy…. Computational literacy and programming are as fundamental to statistical practice and research as mathematics.” Students should gain “the ability to reason about computational resources, work with large datasets, and perform computationally intensive tasks” of “statistical inquiry…working with data.” “Students need to gain explicit experience with…programming language concepts ....”

Data skills are critical for modern statisticians, and central to our instructional and research programs. Statisticians do not simply use computers as tools; rather, we derive and implement new data science methods. This requires programming skills. Indeed, the statistical methods and computational tools needed for tomorrow’s data do not exist today; they barely exist for today’s data. Thus, students need deeper skills valuable to conducting research and developing new data reduction and analysis methods.

Consequences of Not Hiring in Statistics

The consequences of not hiring new staff and faculty in statistics are extensive. The campus cannot maintain the quality of instruction for undergraduates in the design of experiments and the analysis of data using appropriate data models and analytic methods—the core subjects of statistics—without having more long-term staff who are engaged in state-of-the-art statistics research. On a broader scale, many universities are increasingly recognizing the need to examine data in context of its source—precisely the domain of statistics—as a central component of training for all students.

Statistics faculty and graduate students engage in collaborative research across campus involving undergraduates, graduates, post-docs, staff and faculty. How many hundreds of millions of dollars in grants have come to UW-Madison through the involvement of statistics faculty? Currently $1.5M per year is processed directly through the Statistics Department. External funding of biostatistics and medical informatics faculty in BMI related to the Institute for Clinical and Translational Research (ICTR) and other collaborations across campus brings in $1.2M per year. Statistics faculty in the Biometry Program are PIs or co-PIs on roughly $1.7M of extramural annual federal funding. Many other researchers on campus owe their grant success to collaborations with statisticians.

The statistics faculty are stretched to the limit, and have been for many years. The demand for statistics collaboration is growing at a tremendous pace, and is leading to a new splintering of statistics leadership on campus, which is oddly similar to the situation encountered by Box upon his arrival in 1960. In short, the health of the Statistics Department is a campus-level concern. While directly administered by L&S, Statistics is involved with, and needs critical thinking and leadership from, schools and colleges across the campus. For example, the Department of Biostatistics and Medical Informatics depends on a strong Statistics Department for training programs and instruction as well as for fostering a synergy between faculty for methodology research. How will UW-Madison respond to this challenge? How will it meet this need? We offer this strategic plan as a roadmap for meeting these exciting new challenges. Details are in separate appendices concerning instruction, research and a hiring plan.
