snowdeal.org > {bio,medical}informatics

"XEMBL is all about bringing EMBL Nucleotide Sequence data to our users in a variety of formats. The publicly available EMBL/GenBank/DDBJ data is kept at the EBI in an Oracle database, from which flatfiles are created at every release for the purpose of distribution. As you may be aware, flat-files have severe limitations, and we have been asked various times if we are going to distribute the EMBL data in different formats as well, XML being the one most prominently mentioned. In short, the XEMBL project will bring to the user several alternative formats of EMBL data."
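To make the flatfile-versus-XML contrast concrete, here is a minimal Python sketch that pulls a few EMBL line types (ID, AC, DE, OS, SQ) out of a flatfile-style entry and re-emits them as XML. The sample entry is invented, and the element names (`entry`, `accession`, `sequence`, ...) are hypothetical. This is not the XEMBL output format.

```python
import xml.etree.ElementTree as ET

# Invented EMBL-style entry for illustration only (not a real database record).
SAMPLE_ENTRY = """\
ID   XX000001; SV 1; linear; mRNA; STD; PLN; 20 BP.
AC   XX000001; XX000002;
DE   Hypothetical example entry
OS   Example organism
SQ   Sequence 20 BP;
     acgtacgtac gtacgtacgt        20
//
"""

def parse_embl(text):
    """Split a flatfile entry into {line_code: [contents]} plus the raw sequence."""
    fields, seq_parts, in_seq = {}, [], False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            break
        if in_seq:
            # Sequence lines hold residues in blocks of 10 plus a trailing position count.
            seq_parts.extend(tok for tok in line.split() if tok.isalpha())
            continue
        code, content = line[:2], line[5:].strip()   # 2-letter code, content from column 6
        if code == "SQ":
            in_seq = True
        else:
            fields.setdefault(code, []).append(content)
    return fields, "".join(seq_parts).lower()

def entry_to_xml(text):
    """Re-emit the parsed fields as XML using hypothetical element names."""
    fields, sequence = parse_embl(text)
    entry = ET.Element("entry")
    entry.set("id", fields.get("ID", [""])[0].split(";")[0].strip())
    for acc in " ".join(fields.get("AC", [])).split(";"):
        if acc.strip():
            ET.SubElement(entry, "accession").text = acc.strip()
    ET.SubElement(entry, "description").text = " ".join(fields.get("DE", []))
    ET.SubElement(entry, "organism").text = " ".join(fields.get("OS", []))
    ET.SubElement(entry, "sequence", length=str(len(sequence))).text = sequence
    return ET.tostring(entry, encoding="unicode")

print(entry_to_xml(SAMPLE_ENTRY))
```

Because each field becomes a named element, downstream tools can select it with a standard XML parser instead of relying on fixed column positions.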

"DoubleTwist, Inc. today announced that the genomic annotation XML format used to create its annotated human genome database is now freely available as an open standard to the life sciences community. AGAVE (Architecture for Genomic Annotation, Visualization and Exchange) allows users to manage, visualize and share annotations of genomic sequences using the document type definition (DTD) and associated tools available through www.agavexml.org."

"AGAVE was originally developed as part of DoubleTwist's bioinformatics architecture for high-throughput analysis of the human genome, which relies heavily on XML and Java technologies and tools. Central to AGAVE is a Java Object Model and a corresponding XML Document Type Definition (DTD) that facilitate data exchange, data integration and data transformation between components."
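AGAVE pairs an object model with a DTD so that the same annotation data can move between in-memory objects and XML documents. The sketch below shows that round trip in Python (standing in for Java) with a toy two-class model. The class names, element names, and attributes are hypothetical and are not taken from the AGAVE DTD. Note that the standard library's `xml.etree.ElementTree` does not validate documents against a DTD; a validating parser would be needed for that step.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Feature:
    """One annotation on a sequence (hypothetical model, not AGAVE's)."""
    kind: str
    start: int
    end: int

@dataclass
class AnnotatedSequence:
    """A sequence identifier plus its annotations."""
    seq_id: str
    features: list = field(default_factory=list)

    def to_xml(self) -> str:
        # Object model -> XML document (hypothetical element names).
        root = ET.Element("annotated_sequence", id=self.seq_id)
        for f in self.features:
            ET.SubElement(root, "feature", kind=f.kind,
                          start=str(f.start), end=str(f.end))
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "AnnotatedSequence":
        # XML document -> object model (no DTD validation is performed here).
        root = ET.fromstring(text)
        feats = [Feature(e.get("kind"), int(e.get("start")), int(e.get("end")))
                 for e in root.findall("feature")]
        return cls(root.get("id"), feats)

doc = AnnotatedSequence("contig_1", [Feature("gene", 100, 900), Feature("exon", 100, 250)])
print(doc.to_xml())
```

Keeping serialization in the model classes means every component that shares the classes also shares one XML representation, which is the kind of exchange the excerpt describes.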

"The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application. Our experiences also point to some practical tips on how updates should be published by the community, and how XML can be used to facilitate the processing of updates in a warehousing environment."
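A toy contrast of the two strategies in the abstract, under heavy simplifying assumptions: sources are in-memory dictionaries, the source contents and gene records are hypothetical, and the `<updates>` format is invented for illustration (it is not the format used by K2 or GUS). The view consults every source at query time; the warehouse copies the sources once and then applies XML-encoded deltas.

```python
import xml.etree.ElementTree as ET

# Two hypothetical external sources keyed by gene identifier.
SOURCE_A = {"g1": {"name": "abc1"}, "g2": {"name": "xyz2"}}
SOURCE_B = {"g1": {"chrom": "7"}, "g2": {"chrom": "X"}}

def view_lookup(gene_id, sources):
    """View approach: integrate on the fly by consulting every source per query."""
    merged = {}
    for src in sources:
        merged.update(src.get(gene_id, {}))
    return merged

def build_warehouse(sources):
    """Warehouse approach: copy and merge all sources once into a local store."""
    warehouse = {}
    for src in sources:
        for key, attrs in src.items():
            warehouse.setdefault(key, {}).update(attrs)   # new dicts, sources untouched
    return warehouse

def apply_xml_updates(warehouse, update_xml):
    """Apply a hypothetical XML delta: <upsert> inserts/modifies, <delete> removes."""
    for change in ET.fromstring(update_xml):
        key = change.get("id")
        if change.tag == "delete":
            warehouse.pop(key, None)
        elif change.tag == "upsert":
            warehouse.setdefault(key, {}).update({c.tag: c.text for c in change})
    return warehouse

UPDATE = """<updates>
  <upsert id="g2"><chrom>Y</chrom></upsert>
  <delete id="g1"/>
</updates>"""

wh = build_warehouse([SOURCE_A, SOURCE_B])
apply_xml_updates(wh, UPDATE)
print(view_lookup("g1", [SOURCE_A, SOURCE_B]), wh)
```

The trade-off the abstract describes shows up even here: the view always reflects current source contents but pays the cost of contacting each source per query, while the warehouse answers locally but is only as current as the last update it applied.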

"To realize the full potential of biological databases requires more than the interactive, hypertext flavor of database interoperation that is now so popular in the bioinformatics community. Interoperation based on declarative queries to multiple network-accessible databases will support analyses and investigations that are orders of magnitude faster and more powerful than what can be accomplished through interactive navigation. I present a vision of the capabilities that a query-based interoperation infrastructure should provide, and identify assumptions behind, and requirements of, this vision. I then propose an architecture for query-based interoperation that identifies a number of novel components of an information infrastructure for molecular biology. Those components include: A knowledge base that describes relationships among the conceptualizations used in different biological databases; a module that can determine what known DBs are relevant to a particular query; a module that can translate a query, or the results of a query, from one conceptualization to another; a family of DB drivers that provide uniform physical access to different DBMSs; a family of translators that can interconvert among different database schema languages; and a database that describes the network location and access methods for biological databases. A number of the components are translators because biological databases exhibit heterogeneity at several different levels, including the conceptual level, the data model, the query language, and data formats."
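A minimal Python sketch of three of the proposed components: a knowledge base relating shared concepts to each database's own field names, a module that decides which databases are relevant to a query, and a translator that rewrites the query for each one. The database names, field names, and the dictionary-based query representation are all hypothetical; DB drivers, schema-language translators, and the network directory are not modeled.

```python
# Hypothetical knowledge base: how shared concepts map onto each database's
# own field names (its "conceptualization").
KNOWLEDGE_BASE = {
    "SeqDB":  {"organism": "OS", "gene": "GN", "length": "seq_len"},
    "GeneDB": {"organism": "species", "gene": "symbol"},
    "MapDB":  {"gene": "locus", "chromosome": "chr"},
}

def relevant_databases(query, kb):
    """Relevance module: keep databases that can express every concept in the query."""
    return sorted(db for db, mapping in kb.items() if set(query).issubset(mapping))

def translate_query(query, db, kb):
    """Translator: rewrite a {concept: value} query into one database's vocabulary."""
    mapping = kb[db]
    return {mapping[concept]: value for concept, value in query.items()}

def plan(query, kb):
    """Produce one translated subquery per relevant database."""
    return {db: translate_query(query, db, kb) for db in relevant_databases(query, kb)}

print(plan({"organism": "Homo sapiens", "gene": "BRCA2"}, KNOWLEDGE_BASE))
```

Here `plan({"organism": "Homo sapiens", "gene": "BRCA2"}, KNOWLEDGE_BASE)` yields subqueries for `GeneDB` and `SeqDB` only, because `MapDB` has no mapping for `organism`.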
