TR 01-05 Appendices

Appendices
Appendix 1 – Description of the data collection process

PIPELINE 1
Flow Diagram


This pipeline uses the PUBMED search engine, which is part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Retrieve all PubMed IDs from PubMed for each keyword.

Implementation of Step 1: The program used is PubmedIDs.java. It takes a .txt file as a parameter (the file should contain the keywords for each disease) and returns a directory containing one .txt file per keyword and a summary.txt. Each keyword file contains the keyword and its related PubMed IDs. The call used to get the PubMed IDs is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db="+dbName+"&term="+ keyWord+"&datetype=edat&retmax="+retMax+"&usehistory=y
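A minimal Java sketch of how this call can be used in two passes (a first request whose XML `<Count>` value then becomes the retMax of a second request, as explained below) might look as follows. It is not the code of PubmedIDs.java: the class and method names are hypothetical, URL-encoding the keyword with `URLEncoder` is an assumption (the original simply concatenates the string), and the parser assumes the eSearchResult layout in which the total hit count is a direct child `<Count>` element.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the original PubmedIDs.java.
final class EsearchSketch {
    // Base URL as quoted in the appendix; NCBI now serves E-utilities over https.
    static final String ESEARCH =
        "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /** Builds the esearch call for one keyword (keyword is URL-encoded). */
    static String buildUrl(String dbName, String keyWord, int retMax) {
        return ESEARCH + "?db=" + dbName
            + "&term=" + URLEncoder.encode(keyWord, StandardCharsets.UTF_8)
            + "&datetype=edat&retmax=" + retMax + "&usehistory=y";
    }

    /** Returns the top-level <Count> of an eSearchResult document. */
    static int parseCount(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Do not fetch the external DTD that real responses reference.
            f.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Element root = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                // Only the direct child holds the total; <Count> elements nested
                // in the translation stack describe individual terms.
                if (n.getNodeType() == Node.ELEMENT_NODE
                        && n.getNodeName().equals("Count")) {
                    return Integer.parseInt(n.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable esearch XML", e);
        }
        throw new IllegalArgumentException("no top-level <Count> element");
    }
}
```

In use, the count parsed from the first response would be passed as `retMax` to a second `buildUrl` call for the same keyword.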

This call takes three parameters: the database name, the keyword, and the maximum number of records to retrieve. The database name here is pubmed, and the keyword values are taken from the input file. The value of retMax is obtained by executing the above URL for each keyword and storing the count value from the XML output. The fixed values are datetype and usehistory, set to “edat” and “y”. The total records obtained and the total time taken to retrieve data for each disease:

Step 2: Remove duplicates from the collected PubMed IDs.

Implementation of Step 2: The program used is CreateUnique.java. It takes a .txt file as a parameter (the file contains the PubMed IDs for all the keywords) and returns a .txt file containing the unique set of PubMed IDs. The total records obtained and the total time taken to create the unique set:
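The de-duplication in Step 2 can be sketched in Java as below. This is an illustration rather than the original CreateUnique.java: it assumes one ID per line, drops blank lines, and keeps the first occurrence of each ID in its original order by collecting the IDs in a `LinkedHashSet`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for CreateUnique.java; assumes one ID per line.
final class CreateUniqueSketch {
    /** Keeps the first occurrence of each ID, preserving input order. */
    static List<String> unique(List<String> ids) {
        Set<String> seen = new LinkedHashSet<>();
        for (String id : ids) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed); // adding an ID already present is a no-op
            }
        }
        return new ArrayList<>(seen);
    }

    /** Reads IDs from the input file and writes the unique set to the output file. */
    static void run(Path in, Path out) throws IOException {
        Files.write(out, unique(Files.readAllLines(in)));
    }
}
```

A plain `HashSet` would remove duplicates just as well; `LinkedHashSet` is chosen only so the output order is reproducible.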

PIPELINE 2

This pipeline uses the PUBMED search engine, which is part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Retrieve all PubMed IDs from PubMed for all keywords at once. This gives a unique set of PubMed IDs for all the keywords.

Implementation of Step 1: The program used is PubmedIDs1.java. It takes a .txt file as a parameter (the file should be modified to contain the keywords for each disease, with all white space replaced by “%20” and all paragraph marks replaced by “+or+”) and returns a directory containing UniquePubmed.txt and summary.txt. The call used to get the PubMed IDs is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db="+dbName+"&term="+ keyWord+"&datetype=edat&retmax="+retMax+"&usehistory=y

This call takes three parameters: the database name, the keyword, and the maximum number of records to retrieve. The database name here is pubmed, and the keyword value is taken from the input file. The value of retMax is obtained by executing the above URL for the combined keyword and storing the count value from the XML output. The fixed values are datetype and usehistory, set to “edat” and “y”. The total records obtained and the total time taken to retrieve data for each disease:
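The input preparation for PubmedIDs1.java (white space to “%20”, paragraph marks to “+or+”) can be sketched as a single string transformation. The class name is hypothetical, and the sketch assumes the keyword file has already been read into one string per line. Current PubMed documentation asks for Boolean operators in upper case, so `+OR+` may be needed when reproducing this today.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the Pipeline 2 input transformation.
final class CombinedTermSketch {
    /** Joins one-keyword-per-line input into a single OR-ed term. */
    static String combine(List<String> keywordLines) {
        return keywordLines.stream()
            .map(String::trim)
            .filter(k -> !k.isEmpty())
            .map(k -> k.replace(" ", "%20"))      // white space -> %20
            .collect(Collectors.joining("+or+")); // paragraph mark -> +or+
    }
}
```

For example, the lines “Type II diabetes” and “insulin” become `Type%20II%20diabetes+or+insulin`.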

PIPELINE 3

This pipeline uses the OMIM and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM IDs for it. OMIM IDs can be obtained in two ways: by querying OMIM one keyword at a time (which produces duplicates), or by sending all keywords to OMIM as a single keyword.

Implementation of Step 1:
First Method: The program used is OmimIds.java (it removes duplicates by using CreateUnique.java). It takes a .txt file as a parameter (the file should contain the keywords for each disease) and returns three files: two .txt files and a summary.txt. The first .txt file contains the keywords and their related OMIM IDs; the second contains the unique set of OMIM IDs.
Second Method: All keywords are sent to OMIM as a single keyword.

First Method (before and after removing duplicates):

Step 2: Retrieve the linked PUBMED citations for each OMIM entry obtained in Step 1.

Implementation of Step 2: The program used is OmPubPipeline.java. It takes a unique_omim_***.txt file as input (this file should contain the unique set of OMIM IDs) and returns an omim_***_link.txt along with other .txt files. The omim_***_link.txt file contains each OMIM ID and its related PubMed IDs. The call used for this purpose is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom="+toDb+"&db=pubme

This call takes two parameters, dbfrom and id: dbfrom is set to “omim”, db to “pubmed”, and id takes its values from the input file. The total records obtained and the total time taken to retrieve data for each disease:

PIPELINE 4
This pipeline uses the OMIM and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM IDs for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the linked PUBMED citations for each OMIM entry obtained in Step 1.

Implementation of Step 2: The program used is OmPubPipeline.java. It takes a unique_omim_***.txt file as input (this file should contain the unique set of OMIM IDs) and returns an omim_***_parse.txt along with other .txt files. The omim_***_parse.txt file contains each OMIM ID and its related PubMed IDs. The call used for this purpose is:

http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=omim&dopt=xml&tmpl=dispomimTemplate&list_uids="+keyWord

This call takes one parameter, the id (list_uids), which takes its values from the input file. The fixed values are cmd, db, dopt and tmpl, set to “Retrieve”, “omim”, “xml” and “dispomimTemplate”. The total records obtained and the total time taken to retrieve data for each disease:
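Parsing a retrieved record for its PubMed IDs can be sketched with the JDK's DOM parser (`javax.xml.parsers`). The layout of the OMIM XML returned by this template is not given here, so the element name holding each PubMed ID is passed in as a parameter; any concrete name, such as a `pmid` element, is a hypothetical placeholder and not OMIM's actual schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch; the element name holding PubMed IDs is an assumption.
final class RecordParseSketch {
    /** Returns the trimmed text of every element named idTag, in document order. */
    static List<String> idsByTag(String xml, String idTag) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Do not download any DTD referenced by the record.
            f.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(idTag);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable record XML", e);
        }
    }
}
```

Each OMIM ID from the input file would be paired with the list returned for its record to produce the lines of omim_***_parse.txt.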
PIPELINE 5
This pipeline uses the OMIM and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM IDs for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Retrieve the linked PUBMED citations for all OMIM entries obtained in Step 1 by sending all entries to OMIM as a single keyword.

Implementation of Step 2: The program used is OmPubPipeline.java. It takes a *.txt file as input (this file should contain the unique set of OMIM IDs, modified so that each line holds no more than 1,161 records separated by commas, because OMIM would not accept more than 1,161 records at one time) and returns an omim_***_all.txt, which contains the PubMed IDs. The call used for this purpose is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom="+toDb+"&db=pubmed&id="+queryStr

This call takes three parameters: dbfrom is set to “omim”, db to “pubmed”, and id takes its values from the input file. The total records obtained and the total time taken to retrieve data for each disease condition using this call:
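Preparing the input file for this step, with the unique OMIM IDs split into comma-separated lines of at most 1,161 records, can be sketched as follows. The helper name is hypothetical; the limit is a parameter, so the same code also covers the 600-record limit used for NUCLEOTIDE in Pipeline 8.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for building the batched input file.
final class BatchSketch {
    /** Splits IDs into comma-separated lines of at most maxPerLine IDs each. */
    static List<String> toLines(List<String> ids, int maxPerLine) {
        if (maxPerLine <= 0) {
            throw new IllegalArgumentException("maxPerLine must be positive");
        }
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxPerLine) {
            int end = Math.min(start + maxPerLine, ids.size());
            lines.add(String.join(",", ids.subList(start, end)));
        }
        return lines;
    }
}
```

With 2,500 IDs and a limit of 1,161, this yields three lines holding 1,161, 1,161 and 178 IDs.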
PIPELINE 6
Flow Diagram
This pipeline uses the OMIM, NUCLEOTIDE and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM entries for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the NUCLEOTIDE links for each OMIM entry obtained above by using the Nucleotide links option.

Implementation of Step 2: The program used is OmNuPipeline.java. It takes a unique_omim_***.txt file as input (this file should contain the unique set of OMIM IDs) and returns an omim_nucleotide.txt along with an omim_nucleotide_unique.txt file (created by using CreateUnique.java). The omim_nucleotide_unique.txt file contains the unique NUCLEOTIDE IDs for the OMIM IDs collected in Step 1. The call used for this purpose is:

http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=omim&db="+toDb+"&i

This call takes two parameters, db and id: dbfrom is fixed to “omim”, db is set to “nucleotide”, and id takes its values from the input file. The total records obtained and the total time taken to retrieve data for each disease:

Step 3: Retrieve the linked PUBMED citations for each NUCLEOTIDE entry obtained in Step 2.

Implementation of Step 3: The unique records obtained in Step 2 are taken as input, and the program returns a file diabetes_nucleotide_link.txt. This ***_nucleotide_link.txt file contains the OMIM ID and its related PubMed IDs. The call used here is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom="+toDb+"&db=pubme

This call takes two parameters, dbfrom and id: dbfrom is set to “nucleotide”, db to “pubmed”, and id takes its values from diabetes_nucleotide_unique.txt. The total records obtained and the total time taken to retrieve data for each disease:
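Both hops in this pipeline read an elink response. A minimal Java sketch of pulling the linked IDs out of such a response might look like this. It assumes the usual eLinkResult layout, in which each target ID sits in `LinkSetDb/Link/Id` while the submitted IDs are echoed in `IdList/Id`; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of reading linked IDs from an elink response.
final class ElinkParseSketch {
    /** Returns the linked (target) IDs, skipping the echoed source IDs. */
    static List<String> linkedIds(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Do not download the DTD referenced by real responses.
            f.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Only <Id> elements inside <Link> are targets; <IdList><Id> are inputs.
            NodeList links = doc.getElementsByTagName("Link");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                NodeList idNodes = link.getElementsByTagName("Id");
                if (idNodes.getLength() > 0) {
                    ids.add(idNodes.item(0).getTextContent().trim());
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable elink XML", e);
        }
    }
}
```

The IDs returned by the first hop (omim to nucleotide) would then be de-duplicated and fed, one at a time, into the second hop (nucleotide to pubmed).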
PIPELINE 7
This pipeline uses the OMIM, NUCLEOTIDE and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM entries for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the NUCLEOTIDE links for each OMIM entry obtained above by using the Nucleotide links option. The NUCLEOTIDE IDs from Step 2 of Pipeline 6 are reused.

Step 3: Extract the linked PUBMED citations for each NUCLEOTIDE entry obtained in Step 2 by parsing each Nucleotide record.

Implementation of Step 3: The unique records obtained in Step 2 are taken as input, and the program returns a file diabetes_nucleotide_parse.txt. This ***_nucleotide_parse.txt file contains the keyword, the Nucleotide ID and its related PubMed IDs. The call used for this purpose is:

"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db="+toDb+"&list_uids="+keyWord.substring(1)+"&dopt=xml&term=&qty=1"

This call takes two parameters, “db” and “list_uids”: list_uids takes its values from the input file, and db is “nucleotide”. The fixed values are cmd and dopt, set to “Retrieve” and “xml”. The total records obtained and the total time taken to retrieve data for each disease:
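The `keyWord.substring(1)` in the quoted call suggests that the ID list is built by prefixing every ID with a comma and then dropping the first character. A hypothetical Java reconstruction of that URL building is shown below; the legacy query.fcgi endpoint may no longer be served by NCBI, so this only illustrates the string handling.

```java
import java.util.List;

// Hypothetical reconstruction of the Retrieve URL used in Pipelines 7 and 10.
final class RetrieveUrlSketch {
    static final String QUERY = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi";

    static String buildUrl(String toDb, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one ID is required");
        }
        StringBuilder keyWord = new StringBuilder();
        for (String id : ids) {
            keyWord.append(',').append(id); // yields ",id1,id2,..."
        }
        // substring(1) drops the leading comma, as in the quoted call.
        return QUERY + "?cmd=Retrieve&db=" + toDb
            + "&list_uids=" + keyWord.substring(1)
            + "&dopt=xml&term=&qty=1";
    }
}
```

`String.join(",", ids)` would produce the same ID list without the extra trimming step.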
PIPELINE 8
This pipeline uses the OMIM, NUCLEOTIDE and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM entries for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the NUCLEOTIDE links for each OMIM entry obtained above by using the Nucleotide links option. The NUCLEOTIDE IDs from Step 2 of Pipeline 6 are reused.

Step 3: Extract the linked PUBMED citations for all NUCLEOTIDE entries obtained in Step 2.

Implementation of Step 3: The program used is OmNuPipeline.java. It takes a *.txt file as input (this file should contain the unique set of NUCLEOTIDE IDs, modified so that each line holds no more than 600 records separated by commas, because NUCLEOTIDE would not accept more than 600 records at one time) and returns an omim_***_all.txt, which contains the PubMed IDs. The call used for this purpose is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom="+toDb+"&db=pubme

This call takes three parameters: dbfrom is set to “nucleotide”, db to “pubmed”, and id takes its values from the input file. The total records obtained and the total time taken to retrieve data for each disease:
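Turning each line of the batched input file into one elink request can be sketched as below. The names are hypothetical, and the URL is modelled on the complete elink call quoted for Pipeline 5. According to the E-utilities documentation, IDs sent as one comma-separated `id` value are linked as a single set, so each response gives one merged list of PubMed IDs per batch rather than a per-ID mapping, which fits omim_***_all.txt holding only PubMed IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one elink request per batched input line.
final class BatchElinkSketch {
    static final String ELINK =
        "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi";

    static List<String> urlsForLines(String fromDb, String toDb, List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String ids = line.trim();
            if (!ids.isEmpty()) { // skip blank lines in the input file
                urls.add(ELINK + "?dbfrom=" + fromDb + "&db=" + toDb + "&id=" + ids);
            }
        }
        return urls;
    }
}
```

For this pipeline the call would be `urlsForLines("nucleotide", "pubmed", lines)`, with `lines` read from the batched input file.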
PIPELINE 9
Flow Diagram
This pipeline uses the OMIM, PROTEIN and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM entries for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the PROTEIN links for each OMIM entry obtained above by using the Protein links option.

Implementation of Step 2: The program used is OmPrPipeline.java. It takes a unique_omim_***.txt file as input (this file should contain the unique set of OMIM IDs) and returns an omim_protein.txt along with an omim_protein_unique.txt file (created by using CreateUnique.java). The omim_protein_unique.txt file contains the unique PROTEIN IDs for the OMIM IDs collected in Step 1. The call used for this purpose is:

http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=omim&db="+toDb+"&id="+keyWord

This call takes two parameters, db and id: dbfrom is fixed to “omim”, db is set to “protein”, and id takes its values from the input file. The total records obtained and the total time taken to retrieve data for each disease condition using this call:

Step 3: Retrieve the linked PUBMED citations for each PROTEIN entry obtained in Step 2.

Implementation of Step 3: The unique records obtained in Step 2 are taken as input, and the program returns a file diabetes_protein_link.txt. This ***_protein_link.txt file contains the OMIM ID and its related PubMed IDs. The call used here is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom="+toDb+"&db=pubme

This call takes two parameters, dbfrom and id: dbfrom is set to “protein”, db to “pubmed”, and id takes its values from diabetes_protein_unique.txt. The total records obtained and the total time taken to retrieve data for each disease condition using this call:
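The overall shape of this two-hop pipeline (OMIM to PROTEIN, de-duplicate, then PROTEIN to PUBMED, one entry at a time) can be sketched independently of the network layer. In the hypothetical sketch below, the `Linker` interface stands in for whatever performs a single elink request and parses its result; a fake implementation backed by a map is enough to exercise the logic.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the OMIM -> PROTEIN -> PUBMED chain.
final class TwoHopSketch {
    /** Returns the IDs in toDb linked from one ID in fromDb (e.g. via elink). */
    interface Linker {
        List<String> link(String fromDb, String toDb, String id);
    }

    /** Follows omim -> middleDb -> pubmed, de-duplicating between the hops. */
    static Set<String> pubmedViaMiddle(List<String> omimIds, String middleDb, Linker linker) {
        // Hop 1: one request per OMIM ID; the set plays the role of CreateUnique.java.
        Set<String> middle = new LinkedHashSet<>();
        for (String omimId : omimIds) {
            middle.addAll(linker.link("omim", middleDb, omimId));
        }
        // Hop 2: one request per unique intermediate ID.
        Set<String> pubmed = new LinkedHashSet<>();
        for (String id : middle) {
            pubmed.addAll(linker.link(middleDb, "pubmed", id));
        }
        return pubmed;
    }
}
```

Passing `"nucleotide"` instead of `"protein"` as `middleDb` gives the same structure as Pipeline 6.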

PIPELINE 10
Flow Diagram
This pipeline uses the OMIM, PROTEIN and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM entries for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the PROTEIN links for each OMIM entry obtained above by using the Protein links option. The PROTEIN IDs from Step 2 of Pipeline 9 are reused.

Step 3: Extract the linked PUBMED citations for each PROTEIN entry obtained in Step 2 by parsing each Protein record.

Implementation of Step 3: The unique records obtained in Step 2 are taken as input, and the program returns a file diabetes_protein_parse.txt. This ***_protein_parse.txt file contains the keyword, the Protein ID and its related PubMed IDs. The call used for this purpose is:

"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db="+toDb+"&list_uids="+keyWord.substring(1)+"&dopt=xml&term=&qty=1"

This call takes two parameters, “db” and “list_uids”: list_uids takes its values from the input file, and db is “protein”. The fixed values are cmd and dopt, set to “Retrieve” and “xml”. The total records obtained and the total time taken to retrieve data for each disease condition using this call:
PIPELINE 11
This pipeline uses the OMIM, PROTEIN and PUBMED search engines, which are part of NCBI’s Entrez system, to retrieve all relevant citations for a given keyword.

Step 1: Use the OMIM search engine to query a given keyword and obtain a set of OMIM entries for it. The OMIM IDs from Step 1 of Pipeline 3 are reused.

Step 2: Extract the PROTEIN links for each OMIM entry obtained above by using the Protein links option. The PROTEIN IDs from Step 2 of Pipeline 9 are reused.

Step 3: Extract the linked PUBMED citations for all PROTEIN entries obtained in Step 2.

Implementation of Step 3: The program used is OmPrPipeline.java. It takes a *.txt file as input (this file should contain the unique set of PROTEIN IDs, modified so that each line holds no more than 600 records separated by commas, the same limit found for NUCLEOTIDE, which would not accept more than 600 records at one time) and returns an omim_***_all.txt, which contains the PubMed IDs. The call used for this purpose is:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom="+toDb+"&db=pubme

This call takes three parameters: dbfrom is set to “protein”, db to “pubmed”, and id takes its values from the input file. The total records obtained and the total time taken to retrieve data for each disease condition using this call:

Appendix 2 – Keywords
Keywords for Aging


Keywords for Cancer

Tumor
Haemangioendothelioblastoma
Phaeochromocytoma
Interstitial radiation therapy
Rhabdosarcoma
Keywords for Diabetes
Type II diabetes
Appendix 3 – Comparing similar links

Total Overlap Results for Diabetes

Nucleotide Protein PUBMED (number and % of) C1
Total Overlap Results for Aging

Nucleotide Protein PUBMED (Number and % of) M1
Total Overlap Results for Cancer

Nucleotide Protein PUBMED (Number and % of) M1

Appendix 4 – Comparing evaluation paths

Total overlap for Diabetes

Appendix 5 - Time (in seconds) used to collect the datasets


Appendix 6 – How to obtain data source cardinality at NCBI (from http://www.ncbi.nlm.nih.gov/Sitemap/Summary/statistics.html#EntrezDatabaseStats)

Other Entrez databases do not have explicit statistics web pages, but you can see the number of records in each Entrez database by viewing the index of the Filter field. Each database has the term "all" in its Filter field. The number in parentheses beside that term is the number of records currently present in the database.
For example, to see the number of records in the NCBI Structure database, follow these steps:
• From the Entrez home page, follow the link for the Structure database.
• On the Entrez Structure database page, select Preview/Index from the grey area under the search box. There are two search boxes on the Preview/Index page: (a) the search box near the top of the page shows the active query; (b) the search box near the bottom of the page is like a "worksheet" that allows you to browse the index of a search field of interest and/or to select one or more terms from the index for addition to your active query.
• Select the Filter field from the pop-up menu of searchable fields that is shown beside the lower search box.
• Enter "all" (without quotes) as the search term and press the Index button. A window will appear at the bottom of the page that allows you to see your term in the index of the search field, and to browse up and down the index. (Tip: If no term is entered in the search box before pressing the "Index" button, the system will automatically take you to the first term in the index. Entering a search term simply forces the system to jump to a specific part of the index.)
• The number in parentheses beside the term "all" is the number of records currently in the Structure database.


Source: http://bioinformatics.eas.asu.edu/PAPERS/TR%2001-05%20Appendices.pdf
