Biology of Aging
Science Informatics Group (SIG), aka Biology of Aging
The Biology of Aging portal (BoA), funded by the Ellison Medical Foundation and hosted at the MBL/WHOI Library, aims to put knowledge and issues surrounding aging and lifespan in front of a worldwide audience ranging from schoolchildren to expert scientists. Through interactions with the Encyclopedia of Life (EOL), BoA will make it possible to develop comparative biology hypotheses within the context of development, aging, and disease, thus accelerating the pace of understanding and discovery. By revealing similarities or differences across entire spectra of life, BoA may help specialists in aging research identify additional target genes or species for original research, which might further the development of vaccines and potentially lead to prophylactic therapies or cures for age-related conditions. Our goal is to create tools to help scientists, students, and everyone else learn more about the biology of aging viewed across all species on Earth.
Applications
LigerCat (http://ligercat.ubio.org/)
LigerCat is a tool that allows you to search PubMed using words or even a DNA or protein sequence. The articles retrieved are processed to create a tag cloud showing an overview of important concepts and trends. The Medical Subject Headings (MeSH) descriptors are combined and weighted by frequency (more frequent terms have a larger font size) to create the cloud. Clicking on one or more MeSH descriptors in the tag cloud adds them to the box on the right of the cloud, and LigerCat searches PubMed for those terms instantly. Selecting more than one term finds articles tagged with all selected MeSH descriptors.
New to LigerCat 2.1 is the creation of histograms of the number of articles published per year for a given article search. These publication history graphs are interactive.
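The frequency weighting, multi-term selection, and per-year histogram described above can be sketched in a few lines of Python. This is a minimal illustration, not LigerCat's actual code: the sample articles, the function names (tag_cloud, filter_all, per_year), the linear scaling, and the 10 to 32 point font range are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample data: (publication year, MeSH descriptors) per article.
ARTICLES = [
    (2007, ["Aging", "Longevity", "Caenorhabditis elegans"]),
    (2008, ["Aging", "Telomere"]),
    (2008, ["Aging", "Longevity"]),
    (2009, ["Longevity", "Caloric Restriction"]),
]


def tag_cloud(articles, min_pt=10, max_pt=32):
    """Map each descriptor to a font size scaled linearly by its frequency."""
    counts = Counter(term for _, terms in articles for term in terms)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        term: round(min_pt + (count - lo) * (max_pt - min_pt) / span)
        for term, count in counts.items()
    }


def filter_all(articles, selected):
    """Keep only articles tagged with every selected descriptor (logical AND)."""
    wanted = set(selected)
    return [article for article in articles if wanted <= set(article[1])]


def per_year(articles):
    """Count articles per publication year, sorted by year."""
    return dict(sorted(Counter(year for year, _ in articles).items()))


if __name__ == "__main__":
    print(tag_cloud(ARTICLES))
    print(filter_all(ARTICLES, ["Aging", "Longevity"]))
    print(per_year(ARTICLES))
```

With the sample data, the most frequent descriptors receive the largest font size, selecting both "Aging" and "Longevity" narrows the result to the articles carrying both tags, and per_year yields the counts a publication history graph would plot.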
Ontospecies (http://ontospecies.ubio.org/)
Ontospecies is a tool to traverse the tree of life and view longevity information for a particular taxonomic group. Searches can be confined to a lifespan range or further restricted using gene ontologies or location ontologies.
uBio Universal Biological Indexer and Organizer (http://www.ubio.org)
uBio is a set of applications and web services to create and utilize a comprehensive and collaborative catalog of known names of all living (and once-living) organisms. The Taxonomic Name Server (TNS) catalogs names and classifications to enable tools that can help users find information on living things using any of the names that may be related to an organism.
uBio encompasses the following:
- uBioRSS monitors hundreds of RSS feeds from academic journals, science news, and other sources. It parses scientific names from these feeds and presents them via classifications for browsing and as specific RSS feeds.
- uBioPortal is a set of applications and modules that acts as a meta-search engine linking taxonomic intelligence with content served by search engines and expert sources. By indexing the names within expert sites we can redirect users to these sites when a search intersects a relevant taxonomic domain.
- Nomenclator Zoologicus is a catalog of all zoological genera. It is a continuous record, in nine volumes, of the bibliographical origins of the names of every genus and subgenus in zoology published from the 10th edition of Linnaeus' Systema Naturae (1758) up to 1994. Names are listed alphabetically, with a bibliographic reference to the original description of each one and an indication of the animal group to which it belongs. There are an estimated 340,000 genera represented in the text as well as approximately 3,000 supplemental corrections.
- Catalog of Living Whales is a Smithsonian bulletin represented here from classificationBank taxon concepts.
- ParseIT - This tool accepts a complex scientific name and breaks it into its component parts. It is useful for identifying different forms of the same name, especially in combination with the author abbreviation service and findiT SOAP.
- CanonizeIT - This function complements the parser by reducing a scientific name to its canonical (or simplest) form.
- X:ID is identification key software for the creation and display of interactive taxonomic keys. It is written entirely in open-source code and is XML-based. It uses XSLT to provide multiple display options for a given key.
- CompareIT takes a URL or a list of names as input and compares the taxon names with a current taxonomy such as Species 2000 or ITIS and reports on the current status of the name and other metrics. URL input is first piped to FindIt so a web page with names can be checked against a current taxonomy.
- TaxonFinder identifies scientific names in free text. Its open-source repository is available on the Google Code project page.
- Web services - Currently SOAP and RESTful web services, including a second, unreleased version.
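To make the parsing and canonical-form steps in the list above concrete, here is a minimal Python sketch. It does not reproduce the ParseIT or CanonizeIT algorithms, whose details are not given here; the regular expression, the function names (parse_name, canonize), and the choice to define the canonical form as the name elements without rank markers or authorship are all assumptions for illustration.

```python
import re

# Simplified pattern: Genus (Subgenus) species [rank] infraspecies Authorship.
# Assumption: epithets are plain lowercase ASCII words, so lowercase author
# particles (e.g. "de", "van") would be misread as epithets.
NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"
    r"(?:\s+(?P<species>[a-z]+))?"
    r"(?:\s+(?:(?P<rank>subsp\.|var\.|f\.)\s+)?(?P<infraspecies>[a-z]+))?"
    r"(?:\s+(?P<authorship>.+))?$"
)


def canonize(parts):
    """Join the name elements only, dropping subgenus, rank, and authorship."""
    keys = ("genus", "species", "infraspecies")
    return " ".join(parts[key] for key in keys if parts.get(key))


def parse_name(raw):
    """Split a scientific name into components; return None if it does not fit."""
    text = " ".join(raw.split())  # collapse repeated whitespace
    match = NAME_RE.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    parts["canonical"] = canonize(parts)
    return parts


if __name__ == "__main__":
    for name in (
        "Homo sapiens Linnaeus, 1758",
        "Balaenoptera  musculus brevicauda Ichihara, 1966",
        "Pinus nigra subsp. laricio",
    ):
        print(parse_name(name))
```

Because authorship, rank markers, and spacing are stripped, differently written forms of the same name reduce to the same canonical string, which is what makes matching against a name catalog practical.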
Taxatoy (http://taxatoy.ubio.org)
Taxatoy is a web application that allows one to visualize the number of species discoveries for different taxa over a user-specified time range. Data are culled from published articles. Search results are downloadable in Excel format, and graphs are downloadable as images.
Cell Image Library (http://cellimagelibrary.org/)
Working with the American Society for Cell Biology, the SIG is developing the front-end graphical user interface (GUI) that will handle image uploads, searching, and presentation of images. The goal of the Cell Image Library is to create a repository for images of cells from a variety of organisms. It will demonstrate cellular architecture and functions with high-quality images, videos, and animations. The comprehensive and easily accessible Library under development will be a public resource first and foremost for research but also a tool for education. The long-term goal is the construction of a library of images that will serve as primary data for research.
In addition to providing the repository infrastructure for the Library, the SIG is also helping with the systematic protocol for acquisition, evaluation, annotation, and uploading of images, videos, and animations.
Taxonomic Names Processing (http://code.google.com/p/taxon-name-processing/)
SIG is involved in developing applications and services for finding, parsing, and processing taxonomic names.
BHL Mirroring
The SIG (and MBLWHOI Library) is mirroring the entire content of the Biodiversity Heritage Library, which will be stored locally in Woods Hole both as a backup of the content and as a repository on which to do local development. The infrastructure for the immense task of transferring all the data (over 70 TB) via the internet was built using commodity hardware and open-source software at a fraction of the price of other large-scale data mirroring projects.
Infrastructure
Virtualization
Virtualization provides an abstraction layer, or "hypervisor", between an operating system and its underlying hardware. This technology provides BoA with the following benefits:
- Consolidation: Applications that require independent environments traditionally required independent hardware, even if their resource utilization on this hardware was low. Virtualization allows us to consolidate a number of these applications and operating systems on a single physical host machine, thereby using the resources of the physical machine effectively and cutting server and data center running costs.
- Deployment time: We have a number of pre-built templates for BoA machines, which enables us to rapidly bring servers online for both development and production. Previously, bringing a new server online typically took anywhere from 10 to more than 30 minutes. With virtualization we are able to bring new servers online in less than 5 minutes.
- Scaling: Some applications are not able to take advantage of the multiple cores available on the current generation of server processors. On an 8-core machine, we are now able to create 8 instances of such an application and dedicate one core to each, significantly increasing the application speed by harnessing the full resources of the host machine.
- Redundancy and backup: Virtual machines can be stored on disk and deployed quickly to multiple hosts or be migrated between hosts. This reduces the requirement for physical hardware, and downtime of a host no longer means a long outage for applications.
BoA currently uses XenServer, a product of Citrix Systems. This product uses the open-source Xen hypervisor, which provides "paravirtualized" virtual machines. Paravirtualization is a technology whereby the operating system being virtualized is modified to take advantage of faster communication with the hypervisor and the physical host's hardware. By combining this technology with the ability to create logical volumes from physical hard drives and give virtual machines dedicated access to them, BoA is able to take advantage of the benefits of virtualization while keeping any resource overhead generated by virtualization negligible.
In practice, virtualization allows BoA to keep a higher standard of uptime, to respond to new server and application requirements faster than ever before and to effectively utilize our available hardware to its maximum capacity.
Future work in virtualization at BoA will likely involve a move to a shared storage environment, further increasing uptime and providing improved load balancing of servers.
Cloud Processing
BoA uses cloud processing, in conjunction with virtualization, to perform large calculations and other tasks over a network of scalable, virtualized hosts. Individual hosts can be scaled in terms of processors and RAM with minimal downtime, or additional processing machines can be brought online with no downtime.
This allows BoA to scale up processing operations to maximize the available resources, while keeping it simple to disable machines should the need to re-prioritize resources arise. Using technologies such as message queueing, the cloud environment allows us to dedicate any number of machines to a task, horizontally scaling the processing power up or down with no impact on the underlying application or task.
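The message-queueing pattern described above can be illustrated with a small in-process sketch. It is not BoA's actual setup: Python's standard queue.Queue stands in for a network message broker, threads stand in for virtual machines, and run_workers and its parameters are hypothetical names chosen for the example.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def run_workers(tasks, handler, n_workers):
    """Process tasks with n_workers consumers pulling from one shared queue."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    jobs = queue.Queue()
    results, errors = [], []
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            try:
                if item is STOP:
                    return
                value = handler(item)
                with lock:
                    results.append(value)
            except Exception as exc:  # record the failure and keep consuming
                with lock:
                    errors.append(exc)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for task in tasks:
        jobs.put(task)
    for _ in threads:
        jobs.put(STOP)  # one sentinel per worker
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
    return results  # completion order, not submission order


if __name__ == "__main__":
    def square(n):
        return n * n

    # The task code is identical; only the number of consumers changes.
    print(sorted(run_workers(range(10), square, n_workers=2)))
    print(sorted(run_workers(range(10), square, n_workers=8)))
```

The handler is unaware of how many consumers exist, so capacity can be added or removed by changing a single number, which mirrors the claim that scaling has no impact on the underlying task.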
Tools
- Text-mining and natural language processing (NLP)
- Ruby on Rails framework
- Virtualization setup
- Web services
- Web design