ISI TALK - Dec. 19, 2001
ABSTRACT
Title: Grand problems in biocomputing, 2002-2010
There will be a significant shift of interest to biocomputing during
the coming decade. In this talk, I identify the features of these "grand
problems" of biocomputing: problems too large to solve on desktops and with
features suited to new distributed hardware, software, and networking
technologies. I then address four specific biocomputing "tracks" for future
development: (1) integrating evolutionary and bioengineering data, (2) genome
structure determination through evolutionary analysis, (3) analysis of complex
gene regulation systems (with the goal of simulating cells and tissues), and
(4) the "Unfinished Revolution" of human-centric computing in health
environments. For each of these potential long-term biocomputing paths, a
specific problem is discussed in greater detail. Finally, I consider the role
of effectively packaging and marketing biocomputing developments.
Basic outline (full presentation with images at the talk)
Summary
- Why biocomputing will expand relative to other computing realms
- Features of the “grand problems” (GP) of biocomputing
- 4 specific “grand problems” in genomics and proteomics
- Examples where new hardware & software solutions could prove useful
Background and interests
- Ph.D. in Biophysics & Theoretical Biology, U. Chicago (1984)
- Vaccine R & D at USAMRIID - "The Hot Zone" (1984 - 1988)
- UCLA Molecular & Cell Biology - protein structure/evolution (1988 - 1993)
- Internet-based technology & strategic marketing, indiespace.com (1994 - 2001)
Strengths
- Seeing the “big picture”
- Associative connection across disciplines
- Knowledge of the 4 information processing systems on the planet (language,
cells, immune systems, computers)
Features of "grand problems" (GP) in biology
- Too big for desktop computers
- Data naturally fits multiprocessor/networked computing paradigms
- Standard analysis is faltering
- May require exotic hardware/software for an effective solution
'Follow the money'
- Dotcoms have cashed in their chips
- Long-term shift to life science in US
- Aging Boomer population will push health and “alternative” solutions
- 80-year shift point in economics
The 'forbidden top' and 'trivial bottom'
- Structural analysis of some biomolecules is well understood (e.g., genome
assembly)
- Expanding or improving networks doesn’t break new ground (e.g., wireless)
- Solving 3-D structures (e.g., of proteins) from molecular “first principles”
is beyond current computing technology
The "grand problem" middle ground
- Develop strategies for analyzing the uniquely biological features of the
system.
- Look for problems that need different hardware/software approaches
- Look for problems that imply the "convergence" of biology and
computing, two of the four information-processing systems on the
planet (e.g., cell computers, evolutionary programming)
4 GPs for the 2002-2010 timeframe
- Evolvability versus engineerability/computability
- Structure determination via combined evolutionary/engineering analysis
- Models of complex cell regulatory systems
- Human-centric computing - a shift in computing metaphors
GP#1: Evolvability versus Engineerability
- Why don’t Nature & engineers take the same design path?
- Engineerability = modification via genetic engineering
- Evolvability = modification during evolutionary history
- Nature is not "clumsy" relative to human engineering
GP#1: Can we predict protein structure by combining engineering & evolution?
- Prediction of 3-D protein structure from sequence & “first principles” has
hit a wall
- Evolution has already rebuilt the same proteins many times, from different
parts
- Protein engineers mutate protein sequences to change structure and function
GP#2: How is the genome organized?
- 1000s of complete genomes will be available by 2010
- Tremendous sequence variability masks similarity of operation
- Determining the exact location of structural genes & regulatory components is
essential for building gene expression models
GP#3: Modeling of gene regulatory networks
- Modeling gene expression networks = modeling information processing in
portions of a cell
- Gene expression involves hundreds of cell components with multiple, loose
interactions
- Empirical “gene chip” & "protein chip" data provide a window into the
time-dependent expression of some components (many remain hidden)
GP#4: The Unfinished Revolution - can we develop human-centric computing?
- Airport toilets smarter than Deep Blue
- Current computer systems require us to enter a virtual world in order to
work
- Doctors and other professionals are unwilling to go farther into cyberspace
Role of exotic hardware/software
- Most GPs are easily decomposed into parallel or networked solutions
- Hardware that directly embodies a biological process enjoys greater success
(UPenn/Boahen’s vision chip)
- Evolutionary programming may duplicate information processing in biology
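In its simplest form, evolutionary programming is a loop of random mutation and selection. The sketch below is a minimal (1+1) evolutionary algorithm on a toy bit-string objective (OneMax, which just counts 1 bits). The objective, the per-bit mutation rate of 1/length, and the function names are illustrative assumptions, not a model of any specific biological process.

```python
import random


def fitness(genome: list[int]) -> int:
    # Toy objective (OneMax, an assumption for illustration): count the 1 bits.
    return sum(genome)


def one_plus_one_ea(length: int = 20, generations: int = 500, seed: int = 0):
    """Minimal (1+1) evolutionary algorithm: one parent, one mutant child per generation."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    history = [fitness(parent)]
    rate = 1.0 / length
    for _ in range(generations):
        # Mutation: flip each bit independently with probability 1/length.
        child = [1 - bit if rng.random() < rate else bit for bit in parent]
        # Selection: keep the child only if it is at least as fit as the parent.
        if fitness(child) >= fitness(parent):
            parent = child
        history.append(fitness(parent))
        # Stop early once the toy optimum (all ones) is reached.
        if history[-1] == length:
            break
    return parent, history


best, history = one_plus_one_ea()
print(len(history), history[-1])
```

Because a child replaces its parent only when it is at least as fit, the recorded fitness never decreases; the search itself relies only on variation plus selection, with no gradient or chemical model.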
Specific biocomputing GP tracks
- Combine evolutionary and engineering data to predict protein structure
- Determine genome structure by evolutionary analysis
- Deduce gene expression networks from “gene chip” data
- Introduce a “human-centric” computing model into healthcare environments
GP Track I - Predict protein structure & function from primary sequence
- Optimization/“hill climbing” models based on chemistry have failed to fold
proteins
- Mass crystallization studies (Eisenberg) will provide 1000s of 3-D protein
structures by 2005
- Genome sequencing will provide an even larger (20,000-100,000) sequence
database of proteins
GP Track I - Biocomputing strategy
- Align primary protein sequences according to evolutionary trees
- Align 3-D protein structures in “structure space”
- Align additional protein sequences without solved 3-D structures
- Combine aligned datasets into predictive models
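The first step of this strategy, aligning primary sequences, rests on pairwise alignment. As a minimal sketch, assuming simple match/mismatch/gap scores rather than a protein substitution matrix such as BLOSUM62, the Needleman-Wunsch dynamic-programming recurrence gives a global alignment score. The function name and scoring values are hypothetical choices for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score (toy scoring, no substitution matrix)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # Best of: align a[i-1] with b[j-1], or insert a gap in either sequence.
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("AC", "A"))  # -1
```

In a progressive, tree-guided scheme, pairwise scores like these would typically populate a distance matrix from which a guide tree is built. That step is not shown here.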
GP Track II - Determine genome structure via evolutionary analysis
- Comparative anatomy helped to determine the function of organs in classical
biology
- Molecular comparative anatomy may reveal the underlying structure of genome
organization and regulatory systems
- 1000s of complete genome sequences will be available by 2010
GP Track II - Biocomputing strategy
- Create databases for handling 1000s of gigabit-sized genome sequences
- Simultaneous cross-analysis of gigabit genomes (super-clustering, dynamic
programming)
- Use sequence annotations in alignment via rule-based systems
- Output families of models rather than one optimal model
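One way to read "super-clustering" is to group whole genomes by shared subsequences before any detailed alignment. The sketch below assumes that interpretation. It compares genomes by the Jaccard similarity of their k-mer sets and joins them by single linkage, so that it returns groups (families) rather than a single best match. The k-mer length, threshold, and function names are illustrative assumptions. At gigabase scale, exact k-mer sets would likely need to be replaced by compact sketches such as MinHash.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    # All overlapping substrings of length k.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Shared k-mers divided by all distinct k-mers in either genome.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_genomes(genomes: dict[str, str], k: int = 4,
                    threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage clusters: genomes with k-mer Jaccard >= threshold are joined."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    parent = {name: name for name in genomes}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


genomes = {"g1": "ACGTACGTACGT", "g2": "ACGTACGTACGA", "g3": "TTTTGGGGCCCC"}
print(sorted(sorted(c) for c in cluster_genomes(genomes)))  # [['g1', 'g2'], ['g3']]
```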
GP Track III - Determine gene expression networks from "gene chip/protein chip" data
- New “gene chips” reveal time-dependent expression of genes & their products
- The data potentially provide a guide to the structure of gene expression networks
GP Track III - Biocomputing strategy
- Develop databases to handle expression data for different organisms and
conditions (data warehousing)
- Develop techniques for normalizing data for cross-comparison between events
and species
- Use expression data to develop a “model space” of allowed networks fitting
the data
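The normalization and "model space" steps can be illustrated together on a toy time series. This sketch assumes per-gene z-score normalization followed by on/off discretization, and a deliberately tiny network class in which a gene's next state either copies or negates one regulator's current state. It returns every rule consistent with the data rather than a single best network. The data, the rule class, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values: list[float]) -> list[float]:
    # Per-gene normalization (an assumption): subtract mean, divide by std. dev.
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def binarize(profile: list[float]) -> list[int]:
    # Call a gene "on" when it is above its own average expression level.
    return [1 if z > 0 else 0 for z in zscore(profile)]


def consistent_rules(series: dict[str, list[float]], target: str) -> list[str]:
    """All single-input rules target(t+1) = reg(t) or NOT reg(t) that fit every time step."""
    states = {gene: binarize(profile) for gene, profile in series.items()}
    steps = len(states[target]) - 1
    rules = []
    for reg, s in states.items():
        if all(states[target][t + 1] == s[t] for t in range(steps)):
            rules.append(f"{target} = {reg}")
        if all(states[target][t + 1] == 1 - s[t] for t in range(steps)):
            rules.append(f"{target} = NOT {reg}")
    return rules


series = {"A": [1.0, 5.0, 1.0, 5.0], "B": [4.0, 1.0, 9.0, 1.0]}
print(consistent_rules(series, "B"))  # ['B = A', 'B = NOT B']
```

Here two different networks fit the same four time points, so the data alone cannot choose between them; keeping both is what distinguishes a model space from a single fitted network.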
GP Track IV - Introduce a "human-centric" computing model into healthcare environments
- Current attempts to implement wireless networks and databases in healthcare
meet with resistance
- Systems need to fit the way healthcare professionals think, rather than
bending their minds to the will of cyberspace
GP Track IV - Standard solutions are inappropriate
- Standard computing solutions ignore “biological” component
- Biocomputing needs a different approach (example: doctor's PDA/case notes)
GP Track IV - Biocomputing strategy
- Develop real-time operating systems that actively sense the same environment
as people (e.g., robots)
- Collect & recover information using associative models
- Present data as an advisor, rather than an oracle
- Convert existing literature to truly interactive systems (e.g. textbooks)
A few final thoughts
- Don't underestimate the importance of "dramatic" experiments (marketing
value)
- Don't underestimate the importance of "style" and design (Japanese
vs. US robots)