"Cornell University and Tel Aviv University researchers have developed a method for enabling a computer program to scan text in any of a number of languages, including English and Chinese, and autonomously and without previous information infer the underlying rules of grammar. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences."
""This is the first time an unsupervised algorithm is shown capable of learning complex syntax, generating grammatical new sentences and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics," he said."
ScienceDaily Researchers Develop Computer Application To 'Read' Medical Literature, Find Data Relationships
"Until recently, researchers and their assistants spent countless hours poring over seemingly endless volumes of journals and scientific literature for information pertinent to their studies in fields such as cancer, AIDS, pediatrics and cardiology.
But thanks to new software developed by bioinformatics researchers at UT Southwestern Medical Center at Dallas, scientists can now easily identify obscure commonalities in research data and directly relate them to their studies, saving money and speeding the process of discovery."
redux [12.05.03]
eWeek Analytics Tools Mine Text
"With predictive analytics at the core of many marketing initiatives, software developers are adding predictive capabilities to text mining so that enterprises can act on information in unstructured as well as structured data."
"The University of Pennsylvania's Abramson Cancer Center is using Clementine and LexiQuest in combination to better analyze years' worth of textual data to gain new insights into cancer diagnosis and treatment. "We took 10 years of [medical] journal articles and mined them for terms and syntax," said Michael Liebman, the center's director of computational biology and biomedical informatics, in Philadelphia. That analysis helps the center find patterns in the data showing relationships among diseases, symptoms, treatment and other factors, Liebman said, patterns that would otherwise have to be subjectively discerned by researchers poring over the data."
redux [10.27.03]
Bio-ITWorld Digging Into Digital Quarries
"SO YOU THINK you've found a cure for psoriasis. But first you need to check 12 million journal article abstracts on the National Library of Medicine's PubMed database, piling up at the rate of 40,000 new citations a month from 4,600 journals. You missed the latest issue of the Journal of Investigative Dermatology? PubMed's Medical Subject Headings (MeSH) vocabulary might help -- a little. It has 300,000 synonyms for 19,000 basic medical terms. But if you type in "epidermopoiesis," a key concept in the MeSH entry for psoriasis, you will find ... nothing.
Can software put an end to tortured searching? Researchers and vendors say text mining in the life sciences is on the verge of a long-sought dream: distilling oceans of inchoate data into insights and hypotheses."
redux [10.17.03]
The New York Times Digging for Nuggets of Wisdom
[requires 'free' registration]
"MICHAEL N. LIEBMAN knows his limitations. Even with a Ph.D. and a long career in medical research, he cannot keep up with all the developments in his area of interest, breast cancer. Medline, the database that already houses more than 10 million abstracts for journal articles, is adding 7,000 to 8,000 abstracts per week. Only a fraction of these are about cancer, but the volume of information is daunting nonetheless."
"Yet Dr. Liebman is convinced that new cures could someday emerge for breast cancer if only someone could read all the literature and synthesize it. So he has found a solution: enlisting a computer program to read the articles for him."
redux [11.09.02]
Stanford Medical Informatics Preprint Archive Using Text Analysis to Identify Functionally Coherent Gene Groups
"The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how "functionally coherent" the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function."
redux [10.08.01]
BioNLP.Org Natural language processing of biology text
"The literature of the field of biology is the largest of all the sciences. The volume of biology literature each year, measured in bytes, is about fifty times the size of the entire human genome, junk and all. But locked in this literature is an enormous amount of information that can tell us much about the structure and function of genes, proteins, cells and organisms -- how they work as well as how they can fail.
The newly emergent interest in natural language processing for biology has been christened "Information Extraction". But work in this area has been going on for many decades under different names and this site includes a good deal of information about past and current work in NLP and in information extraction for biology in particular."
redux [04.30.01]
New Scientist Biologists in Norway use a computer program to "read" the scientific literature and successfully predict gene interactions
"Biologists in Norway have used a computer program to "read" the scientific literature and successfully predict gene interactions.
This data-mining of the "biobibliome" provides a way of dealing with the ever-increasing torrent of biological data - millions of papers a year. But even more impressively, the completely automated process can make new genetic discoveries - essentially free research."
Stanford Medical Informatics Preprint Archive Including Biological Literature Improves Homology Search
"Annotating the tremendous amount of sequence information being generated requires accurate automated methods for recognizing homology. Although sequence similarity is only one of many indicators of evolutionary homology, it is often the only one used. Here we find that supplementing sequence similarity with information from biomedical literature is successful in increasing the accuracy of homology search results. We modified the PSI-BLAST algorithm to use literature similarity in each iteration of its database search. The modified algorithm is evaluated and compared to standard PSI-BLAST in searching for homologous proteins. The performance of the modified algorithm achieved 32% recall with 95% precision, while the original one achieved 33% recall with 84% precision; the literature similarity requirement preserved the sensitive characteristic of the PSI-BLAST algorithm while improving the precision."
MIT Technology Review Emerging Technologies That Will Change the World: Data Mining
"And the future of data-mining technology? Wide open, says Fayyad - especially as researchers begin to move beyond the field's original focus on highly structured, relational databases. One very hot area is "text data mining": extracting unexpected relationships from huge collections of free-form text documents. The results are still preliminary, as various labs experiment with natural-language processing, statistical word counts and other techniques. But the University of California at Berkeley's LINDI system, to take one example, has already been used to help geneticists search the biomedical literature and produce plausible hypotheses for the function of newly discovered genes."
“Bioinformatics will be at the core of biology in the 21st century. In fields ranging from structural biology to genomics to biomedical imaging, ready access to data and analytical tools are fundamentally changing the way investigators in the life sciences conduct research and approach problems. Complex, computationally intensive biological problems are now being addressed and promise to significantly advance our understanding of biology and medicine. No biological discipline will be unaffected by these technological breakthroughs.”
BIOINFORMATICS IN THE 21st CENTURY
This site designed by Eric C. Snowdeal III.