For my first post to Fireside Science, I would like to discuss some advances in scientific computing from a “blue sky” perspective. I will approach this exploration by talking about how computing can improve both the modeling of our world and the analysis of data.
Better Science Through Computation
The traditional model of science has been (ideally) an interplay between theory and data. With the rise of high-fidelity and high-performance computing, however, simulation and data analysis have become critical components in this dialogue.
Simulation: controllable worlds, better science?
The need to deliver high-quality simulations of real-world scenarios in a controlled manner has many scientific benefits. Uses for these virtual environments include simulating hard-to-observe events (supernovae and other events in stellar evolution) and providing highly-controlled environments for cognitive neuroscience experimentation (simulations relevant to human behavior).
A CAVE environment, being used for data visualization.
Virtual environments that achieve high levels of realism and customizability are rapidly becoming an integral asset to experimental science. Not only can stimuli be presented in a controlled manner, but all aspects of the environment (and even human interactions with the environment) can be quantified and tracked. This allows for three main improvements on the practice of science (discussed in greater detail in the notes below):
1) Better ecological validity. In psychology and other experimental sciences, high ecological validity allows the results of a given experiment to be generalized across contexts. High ecological validity results from environments that do not differ greatly from conditions found in the real world.
Modern virtual settings allow high degrees of environmental complexity to be replicated in a way that does not impede normal patterns of interaction, including gaze, touch, and other means often used in the real world. Contrast this with a 1980s-era video game: we have come a long way since crude interactions with 8-bit characters using a joystick. And it will only get better in the future.
2) The customization of environmental variables. While behavioral and biological scientists often talk about the effects of environment, these effects typically remain qualitative (or at best crudely quantitative). With virtual environments, environmental variables can be added, subtracted, and manipulated in a controlled fashion.
Not only can the presence/absence and intensities of these variables be directly measured, but the interactions between virtual environment objects and an individual (e.g. human or animal subject) can be directly detected and quantified as well.
3) Greater compatibility with big data and computational dynamics: The continuous tracking of all environmental and interaction information results in the immediate conversion of this information to computable form. This allows us to build more complete models of the complex processes underlying behavior or discover subtle patterns in the data.
Big Data Models
Once you have data, what do you do with it? That’s a question that many social scientists and biologists have traditionally taken for granted. With the concurrent rise of high-throughput data collection (e.g. next-gen sequencing) and high-performance computing (HPC), however, this is becoming an important issue for reconsideration. Here I will briefly highlight some recent developments in big data-related computing.
Big data can come from many sources. High-throughput experiments in biology (e.g. next-generation sequencing) are one such example. The internet and sensor networks also provide large datasets. Big datasets and difficult problems require computing resources many times more powerful than what is currently available to the casual computer user. Enter petascale computing.
Most new laptop computers (circa 2013) are examples of gigascale computing: they perform billions of calculations per second using 2 to 4 processor cores (often using only one at a time). Supercomputers such as the Blue Waters machine at UIUC have many thousands of processors and operate at the petascale, performing quadrillions of calculations per second. IBM’s Roadrunner, the first petascale machine, had well over 10,000 processors. The next milestone is the exascale (i.e. 1000x faster than petascale). The point of all this computing power is to perform many calculations quickly, as the complexity of a very large dataset can make its analysis impractical using small-scale devices.
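To make these scales concrete, here is a back-of-envelope sketch. The workload size and per-machine speeds below are illustrative order-of-magnitude assumptions, not benchmarks of any real machine:

```python
# Rough sketch: how long a fixed workload takes at different computing scales.
# All numbers are illustrative order-of-magnitude assumptions.

SCALES = {
    "gigascale laptop": 1e9,           # ~10^9 floating-point operations per second
    "terascale cluster": 1e12,         # ~10^12 FLOP/s
    "petascale supercomputer": 1e15,   # ~10^15 FLOP/s
    "exascale machine": 1e18,          # ~10^18 FLOP/s
}

WORKLOAD = 1e18  # a hypothetical analysis requiring 10^18 floating-point operations

for name, flops in SCALES.items():
    seconds = WORKLOAD / flops
    print(f"{name}: {seconds:,.0f} s (~{seconds / 86400:,.1f} days)")
```

The same analysis that would occupy a laptop for decades finishes in minutes at the petascale, which is the whole argument for moving big-data analysis onto supercomputers.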
Even using petascale machines, difficult problems (such as drug discovery or very-large phylogenetic analyses) can take an unreasonable amount of time when run serially. So increasingly, scientists are also using parallel computing as a strategy for analyzing and processing big data. Parallel computing involves dividing up the task of computation amongst multiple processors so as to reduce the overall amount of compute time. This requires specialized hardware and advances in software, as the algorithms and tools designed for small-scale computing (e.g. analyses done on a laptop) are often inadequate to take full advantage of the parallel processing that supercomputers enable.
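The divide-and-combine idea behind parallel computing can be sketched with Python's standard multiprocessing module. The per-chunk analysis function here is a hypothetical stand-in for a real computation:

```python
from multiprocessing import Pool

def analyze(chunk):
    # Stand-in for an expensive per-record computation (hypothetical).
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Divide the dataset into one chunk per worker process.
    n_workers = 4
    chunks = [data[i::n_workers] for i in range(n_workers)]
    # Each worker analyzes its chunk in parallel; partial results are combined.
    with Pool(n_workers) as pool:
        partials = pool.map(analyze, chunks)
    total = sum(partials)
    print(total)
```

Real supercomputing codes use frameworks like MPI rather than a single machine's process pool, but the principle is the same: the speedup depends on how cleanly the problem divides, which is why algorithms written for serial laptop-scale analysis often cannot exploit these machines without being redesigned.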
Media-based Computation and Natural Systems Lab
This is an idea I presented to a Social Simulation conference (hosted in Second Life) back in 2007. The idea involves building a virtual world that would be accessible to people from around the world. Experiments could then be conducted through the use of virtual models, avatars, secondary data, and data capture interfaces (e.g. motion sensors, physiological state sensors).
The CNS Lab (as proposed) features two components related to experiments not easily done in the real world. It extends virtual environments into a relatively unexplored domain: the interface between the biological world and the virtual world. With increasingly sophisticated I/O devices and increases in computational power, we might be able to simulate and replicate the black box of physiological processes and the hard-to-observe process of long-term phenotypic adaptation.
Component #1: A real-time experiment demonstrating the effect of extreme environments on the human body.
This would be a simulation to demonstrate and understand the limits of human physiological capacity, usually observed only in limited contexts. In the virtual world, an avatar would enter a long tube or tank, the depth of which would serve as an environmental gradient. As the avatar moves deeper into the tube, several parameters representing variables such as atmospheric pressure, temperature, and medium would increase or decrease accordingly.
There should also be ways to map individual-level variation to the avatar in order to provide some connection between the participant and the simulation of human physiology. Because this experience is distributed over the internet (it was originally proposed as a Second Life application), a variety of individuals could experience and participate in an experiment once limited to a physiology laboratory.
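A minimal sketch of how the tube's depth gradient might drive the simulation: the parameter names and ranges below are hypothetical placeholders, not values from the proposal.

```python
def environment_at(depth_fraction):
    """Map an avatar's position along the tube (0.0 = entrance, 1.0 = far end)
    to simulated environmental parameters via linear interpolation.
    All ranges are hypothetical placeholders, not physiological constants."""
    def lerp(start, end, t):
        return start + (end - start) * t

    return {
        "pressure_atm": lerp(1.0, 50.0, depth_fraction),   # pressure rises with depth
        "temperature_c": lerp(20.0, 4.0, depth_fraction),  # temperature falls
        "light_level": lerp(1.0, 0.0, depth_fraction),     # light fades to darkness
    }

# As the avatar moves deeper, each parameter shifts along the gradient.
print(environment_at(0.0))  # conditions at the entrance
print(environment_at(0.5))  # conditions midway
print(environment_at(1.0))  # conditions at the far end
```

Individual-level variation could then enter as per-participant tolerances against which these parameters are compared at each step.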
Examples of deep-sea fishes (from top): Barreleye (Macropinna microstoma), Fangtooth (Anoplogaster cornuta), Frilled Shark (Chlamydoselachus anguineus). COURTESY: National Geographic and Monterey Bay Aquarium.
Component #2: An exploration of deep sea fish anatomy and physiology.
Deep-sea fishes serve as an example of organisms adapted to extreme environments, having likely evolved from ancestral forms originating in shallow, coastal environments. The object of this simulation is to observe a “population” change from ancestral pelagic fishes to derived deep-sea fishes as environmental parameters within the tank change. The participant will be able to watch evolution “in progress” through a time-elapsed overview of fish phylogeny.
This would be an opportunity to observe adaptation as it happens, in a way not necessarily possible in real-world experimentation. The key components of the simulation would be: 1) time-elapsed morphological change and 2) the ability to examine a virtual model of the morphology before and after adaptation. While these capabilities would be largely (and in some cases wholly) inferential, they would provide an interactive means to better appreciate the effects of macroevolution.
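A toy sketch of the kind of time-elapsed change such a simulation would display: a population's mean trait value tracking an environmental optimum that shifts from shallow-water toward deep-sea conditions. The trait, selection rule, and parameter values are hypothetical, not from the proposal.

```python
import random

random.seed(42)

def evolve(generations=100, pop_size=200, mutation_sd=0.05):
    """Track a population's mean trait value as the environmental optimum
    shifts from 0.0 (ancestral, shallow-water) to 1.0 (deep-sea).
    A deliberately simplified mutation-plus-truncation-selection model."""
    population = [0.0] * pop_size  # everyone starts at the ancestral trait value
    history = []
    for g in range(generations):
        optimum = g / (generations - 1)  # environment shifts each generation
        # Mutate each individual, then keep the half closest to the optimum.
        mutated = [x + random.gauss(0, mutation_sd) for x in population]
        survivors = sorted(mutated, key=lambda x: abs(x - optimum))[: pop_size // 2]
        population = survivors * 2       # survivors reproduce to restore pop_size
        history.append(sum(population) / len(population))
    return history

trajectory = evolve()
print(f"mean trait: start={trajectory[0]:.2f}, end={trajectory[-1]:.2f}")
```

The "time-elapsed overview" in the proposed simulation amounts to rendering a trajectory like this one as changing fish morphology, with virtual models inspectable at the start and end points.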
A highly stylized (i.e. scala naturae) view of improving techniques in human discovery, culminating in computing.
NOTES:

These journal covers are in reference to the following articles: Bainbridge, W.S. The Scientific Research Potential of Virtual Worlds. Science, 317, 412 (2007); Bohil, C., Alicea, B., and Biocca, F. Virtual Reality in Neuroscience Research and Therapy. Nature Reviews Neuroscience, 12, 752-762 (2011).

Computable form: raw numeric data, measurement indices, and, ultimately, zeros and ones.

Garcia-Risueno, P. and Ibanez, P.E. A review of High Performance Computing foundations for scientists. arXiv, 1205.5177 (2012).

For a very basic introduction to big data, please see: Mayer-Schonberger, V. and Cukier, K. Big Data: a revolution that will transform how we live, work, and think. Eamon Dolan (2013).

Hemsoth, N. Inside the National Petascale Computing Facility. HPCWire blog, May 12 (2011).

Alicea, B. Reverse Distributed Computing: doing science experiments in Second Life. European Social Simulation Association/Artificial Life Group (2007).

For an example of how human adaptability in extreme environments has traditionally been quantified, please see: LeScanff, C., Larue, J., and Rosnet, E. How to measure human adaptation in extreme environments: the case of Antarctic wintering-over. Aviation, Space, and Environmental Medicine, 68(12), 1144-1149 (1997).