Bioinformatics
Transcript
Bioinformatics for PřfUK 2003

Jiří Vondrášek, Ústav organické chemie a biochemie, [email protected]
Jan Pačes, Ústav molekulární genetiky, [email protected]
http://bio.img.cas.cz/PrfUK2003

Databases: contents

• principles
• SQL
• biological sequence formats
• IUB codes
• DNA databases
• protein and genome databases
• structure databases
• database organization

Relational database

article
    c_id      identifier, number
    title     text
    journal   short text
    year      date
    …

author
    a_id      identifier
    c_id      identifier
    name      short text
    …

(keywords)
    k_id      identifier
    c_id      identifier
    keyword   short text

SQL: Structured Query Language

    c_id      identifier, number
    title     text
    journal   short text
    year      date
    …

    CREATE TABLE article (
        c_id INTEGER,
        title TEXT,
        journal VARCHAR(30),
        year DATE
    );

SQL: Structured Query Language

    a_id      identifier
    c_id      identifier
    name      short text

    CREATE TABLE author (
        a_id INTEGER,
        c_id INTEGER,
        name VARCHAR(30)
    );

SQL: Structured Query Language

    INSERT INTO article SET
        c_id = '1',
        title = 'Something absolutely fantastic',
        journal = 'Bioinformatics',
        year = '2002';

    INSERT INTO author SET
        a_id = '1',
        c_id = '1',
        name = 'Paces, Jan';

    INSERT INTO author SET
        a_id = '2',
        c_id = '1',
        name = 'Vondrasek, Jiri';

SQL: Structured Query Language

    SELECT article.title, article.journal, author.name
    FROM article, author
    WHERE article.c_id = author.c_id
      AND article.year > '2000'
      AND author.name LIKE 'Paces%';

IUB codes: nucleotides

    code     nucleotides   complement
    A        A             T
    C        C             G
    G        G             C
    T (U)    T (U)         A
    M        AC            K
    R        AG            Y
    W        AT            W
    S        CG            S
    Y        CT            R
    K        GT            M
    V        ACG           B
    H        ACT           D
    D        AGT           H
    B        CGT           V
    N        ACGT          N
    -        gap           -

IUB codes: amino acids

    code   three-letter code   amino acid
    A      Ala                 alanine
    C      Cys                 cysteine
    D      Asp                 aspartic acid
    E      Glu                 glutamic acid
    F      Phe                 phenylalanine
    G      Gly                 glycine
    H      His                 histidine
    I      Ile                 isoleucine
    K      Lys                 lysine
    L      Leu                 leucine
    M      Met                 methionine
    N      Asn                 asparagine
    P      Pro                 proline
    Q      Gln                 glutamine
    R      Arg                 arginine
    S      Ser                 serine
    T      Thr                 threonine
    V      Val                 valine
    W      Trp                 tryptophan
    Y      Tyr                 tyrosine
    B      Asx                 aspartic acid or asparagine
    Z      Glx                 glutamic acid or glutamine
    X      Xxx                 any amino acid
    *      ---                 stop

sequence formats

• binary
    with chromatograms: SCF, ALF, ABI
    for programs: internal database formats
• text
    minimal: text, fasta
    annotated: EMBL, GenBank, ASN, XML

sequence formats - SCF

SCF (Standard Chromatogram Format)

sequence formats - EMBL

EMBL (EMBL database format)

    ID   AF031150   standard; RNA; ROD; 1379 BP.
    XX
    AC   AF031150;
    XX
    SV   AF031150.1
    XX
    DT   27-FEB-1998 (Rel. 54, Created)
    DT   27-FEB-1998 (Rel. 54, Last updated, Version 1)
    XX
    DE   Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
    XX
    KW   .
    XX
    OS   Mus musculus (house mouse)
    OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
    OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
    XX
    RN   [1]
    RP   1-1379
    RA   Inoue H., Nomiyama J., Nakai K., Matsutani A., Tanizawa Y., Oka Y.;
    RT   Isolation of full-length cDNA of mouse PAX4 gene and identification of its
    RT   human homologue;
    RL   Biochem. Biophys. Res. Commun. 243:628-633(1998).
    XX
    RN   [2]
    RP   1-1379
    RA   Inoue H., Nomiyama J., Nakai K., Tanizawa Y., Oka Y.;
    RT   ;
    RL   Submitted (23-OCT-1997) to the EMBL/GenBank/DDBJ databases.
    RL   Third Dept. of Int. Med., Yamaguchi University, 1144 Kogushi, Ube,
    RL   Yamaguchi 755, Japan
    XX
    FH   Key             Location/Qualifiers
    …

sequence formats - EMBL

EMBL (EMBL database format)

    …
    FH   Key             Location/Qualifiers
    FH
    FT   source          1..1379
    FT                   /db_xref=taxon:10090
    FT                   /organism=Mus musculus
    FT                   /cell_line=MIN6
    FT   CDS             297..1346
    FT                   /codon_start=1
    FT                   /gene=Pax4
    FT                   /product=paired-box transcription factor
    FT                   /protein_id=AAC40046.1
    FT                   /translation=MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDISR
    FT                   SLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWEIQ
    FT                   HQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCGAPR
    FT                   GPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWFSNRR
    FT                   AKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSPSFCQL
    FT                   CCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLIGGPGQV
    FT                   PSTHCSNWP
    XX
    SQ   Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
         aaaaaaaaaa aaaaagcggc cgctgaattc tagcagaagg ctgccctctg ctcctgagtg        60
         aaggctctgt gaagctctgg accccctggc aggactgaag cagctggagg ctgttacaag       120
         accagaccac cagcaaaccc tggagcctgc acaggaccct gagacctctt cctggaattc       180
         ccaccttttt tcctccatcc agaaccagtc ccaaagagaa acttccagaa ggagctctcc       240
         gttttcagtt tgccagttgg cttcctgtcc ttctgtgagg agtaccagtg tgaagcatgc       300
         agcaggacgg actcagcagt gtgaatcagc tagggggact ctttgtgaat ggccggcccc       360
         …
         gctgtgggac agcaccaggc agatgttcca gtgacacctc atcccaggcc tatctccaac      1200
         cctactggga ctgccaatcc ctccttcctg tggcttcctc ctcatatgtg gaatttgcct      1260
         ggccctgcct caccacccat cctgtgcatc atctgattgg aggcccagga caagtgccat      1320
         caacccattg ctcaaactgg ccataagagg cctctatttg acagtaataa aaacctttt       1379
    //

sequence formats - GenBank

GenBank

    LOCUS       AF145233     1360 bp    mRNA            ROD       23-OCT-1999
    DEFINITION  Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.
    ACCESSION   AF145233
    VERSION     AF145233.1  GI:6102607
    KEYWORDS    .
    SOURCE      house mouse.
      ORGANISM  Mus musculus
                Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
    REFERENCE   1  (bases 1 to 1360)
      AUTHORS   Kalousova,A., Benes,V., Paces,J., Paces,V. and Kozmik,Z.
      TITLE     DNA binding and transactivating properties of the paired and
                homeobox protein Pax4
      JOURNAL   Biochem. Biophys. Res. Commun. 259 (3), 510-518 (1999)
      MEDLINE   99294619
       PUBMED   10364449
    REFERENCE   2  (bases 1 to 1360)
      AUTHORS   Kalousova,A., Paces,J. and Kozmik,Z.
      TITLE     Direct Submission
      JOURNAL   Submitted (23-APR-1999) Dept. of Transcription Regulation,
                Institute of Molecular Genetics, Videnska 1083, Prague 142 20,
                Czech Republic
    FEATURES             Location/Qualifiers
         source          1..1360
                         /organism="Mus musculus"
                         /db_xref="taxon:10090"
         gene            1..1360
                         /gene="Pax4"
         CDS             211..1260
                         /gene="Pax4"
                         /note="DNA binding protein; paired box protein; homeobox
                         protein"
                         /codon_start=1
                         /product="transcription factor PAX4"
                         /protein_id="AAF03533.1"
    …

sequence formats - GenBank

GenBank

         CDS             211..1260
                         /gene="Pax4"
                         /note="DNA binding protein; paired box protein; homeobox
                         protein"
                         /codon_start=1
                         /product="transcription factor PAX4"
                         /protein_id="AAF03533.1"
                         /db_xref="GI:6102608"
                         /translation="MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDIS
                         RSLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWE
                         IQHQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCG
                         APRGPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWF
                         SNRRAKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSP
                         SFCQLCCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLI
                         GGPGQVPSTHCSNWP"
    BASE COUNT      359 a    381 c    328 g    292 t
    ORIGIN
            1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc
           61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc
          121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct
          181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat
    …
         1081 tccagtgaca cctcatccca ggcctatctc caaccctact gggactgcca atccctcctt
         1141 cctgtggctt cctcctcata tgtggaattt gcctggccct gcctcaccac ccatcctgtg
         1201 catcatctga ttggaggccc aggacaagtg ccatcaaccc attgctcaaa ctggccataa
         1261 gaggcctcta tttgacagta ataaaaacct tttcttagat gttaaaaaaa aaaaaaaaaa
         1321 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
    //

sequence formats - FastA

fasta

    >gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
    TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCAGACCACCAGCAAACCCTGGAGCCTGCACAGGA
    CCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCATCCAGAACCAGTCCCAAAGAGAAACTTCC
    AGAAGGAGCTCTCCGTTTTCAGTTTGCCAGTTGGCTTCCTGTCCTTCTGTGAGGAGTACCAGTGTGAAGC
    ATGCAGCAGGACGGACTCAGCAGTGTGAATCAGCTAGGGGGACTCTTTGTGAATGGCCGGCCCCTTCCTC
    TGGACACCAGGCAGCAGATTGTGCAGCTAGCAATAAGAGGGATGCGACCCTGTGACATTTCACGGAGCCT
    TAAGGTATCTAATGGCTGTGTGAGCAAGATCCTAGGACGCTACTACCGCACAGGTGTCTTGGAACCCAAG
    TGTATTGGGGGAAGCAAACCACGTCTGGCCACACCTGCTGTGGTGGCTCGAATTGCCCAGCTAAAGGATG
    AGTACCCTGCTCTTTTTGCCTGGGAGATCCAACACCAGCTTTGCACTGAAGGGCTTTGTACCCAGGACAA
    GGCTCCCAGTGTGTCCTCTATCAATCGAGTACTTCGGGCACTTCAGGAAGACCAGAGCTTGCACTGGACT
    CAACTCAGATCACCAGCTGTGTTGGCTCCAGTTCTTCCCAGTCCCCACAGTAACTGTGGGGCTCCCCGAG
    GCCCCCACCCAGGAACCAGCCACAGGAATCGGACTATCTTCTCCCCGGGACAAGCCGAGGCACTGGAGAA
    AGAGTTTCAGCGTGGGCAGTATCCAGATTCAGTGGCCCGTGGGAAGCTGGCTGCTGCCACCTCTCTGCCT
    GAAGACACGGTGAGGGTTTGGTTTTCTAACAGAAGAGCCAAATGGCGCAGGCAAGAGAAGCTGAAATGGG
    AAGCACAGCTGCCAGGTGCTTCCCAGGACCTGACAGTACCAAAAAATTCTCCAGGGATCATCTCTGCACA
    GCAGTCCCCCGGCAGTGTACCCTCAGCTGCCTTGCCTGTGCTGGAACCATTGAGTCCTTCCTTCTGTCAG
    CTATGCTGTGGGACAGCACCAGGCAGATGTTCCAGTGACACCTCATCCCAGGCCTATCTCCAACCCTACT
    GGGACTGCCAATCCCTCCTTCCTGTGGCTTCCTCCTCATATGTGGAATTTGCCTGGCCCTGCCTCACCAC
    CCATCCTGTGCATCATCTGATTGGAGGCCCAGGACAAGTGCCATCAACCCATTGCTCAAACTGGCCATAA
    GAGGCCTCTATTTGACAGTAATAAAAACCTTTTCTTAGATGTTAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

sequence formats - ASN

ASN

    Seq-entry ::= set {
      class nuc-prot ,
      descr {
        title "Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds." ,
        source {
          org {
            taxname "Mus musculus" ,
            common "house mouse" ,
            db {
              {
                db "taxon" ,
                tag id 10090 } } ,
            orgname {
              name binomial {
                genus "Mus" ,
                species "musculus" } ,
              lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
     Euteleostomi; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae;
     Murinae; Mus" ,
              gcode 1 ,
              mgcode 2 ,
              div "ROD" } } } ,
        pub {
          pub {
            sub {
              authors {
                names std

Bioinformatic Links

GenBank

Swiss-Prot

Entrez

• Literature (PubMed)
• Nucleotide (GenBank)
• Protein (PIR)
• Genome
• Structure (PDB)
• PopSet
• Taxonomy
• OMIM

SRS

SRS - list

PDB

    HEADER    GENE REGULATION/DNA                     22-APR-99   6PAX
    TITLE     CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED DOMAIN-DNA
    TITLE    2 COMPLEX REVEALS A GENERAL MODEL FOR PAX PROTEIN-DNA
    TITLE    3 INTERACTIONS
    COMPND    MOL_ID: 1;
    COMPND   2 MOLECULE: HOMEOBOX PROTEIN PAX-6;
    COMPND   3 CHAIN: A;
    COMPND   4 ENGINEERED: YES;
    COMPND   5 BIOLOGICAL_UNIT: MONOMER;
    COMPND   6 MOL_ID: 2;
    COMPND   7 MOLECULE: 26 NUCLEOTIDE DNA;
    COMPND   8 CHAIN: B;
    COMPND   9 ENGINEERED: YES;
    COMPND  10 BIOLOGICAL_UNIT: MONOMER;
    COMPND  11 MOL_ID: 3;
    COMPND  12 MOLECULE: 26 NUCLEOTIDE DNA;
    COMPND  13 CHAIN: C;
    COMPND  14 ENGINEERED: YES;
    COMPND  15 BIOLOGICAL_UNIT: MONOMER
    SOURCE    MOL_ID: 1;

PDB

    SEQRES   1 A  133  SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE VAL
    SEQRES   2 A  133  ASN GLY ARG PRO LEU PRO ASP SER THR ARG GLN ARG ILE
    SEQRES   3 A  133  VAL GLU LEU ALA HIS SER GLY ALA ARG PRO CYS ASP ILE
    SEQRES   4 A  133  SER ARG ILE LEU GLN VAL SER ASN GLY CYS VAL SER LYS
    SEQRES   5 A  133  ILE LEU GLY ARG TYR TYR ALA THR GLY SER ILE ARG PRO
    SEQRES   6 A  133  ARG ALA ILE GLY GLY SER LYS PRO ARG VAL ALA THR PRO
    SEQRES   7 A  133  GLU VAL VAL SER LYS ILE ALA GLN TYR LYS GLN GLU CYS
    SEQRES   8 A  133  PRO SER ILE PHE ALA TRP GLU ILE ARG ASP ARG LEU LEU
    SEQRES   9 A  133  SER GLU GLY VAL CYS THR ASN ASP ASN ILE PRO SER VAL
    SEQRES  10 A  133  SER SER ILE ASN ARG VAL LEU ARG ASN LEU ALA SER GLU
    SEQRES  11 A  133  LYS GLN GLN
    SEQRES   1 B   26    A   A   G   C   A   T   T   T   T   C   A   C   G
    SEQRES   2 B   26    C   A   T   G   A   G   T   G   C   A   C   A   G
    SEQRES   1 C   26    T   T   C   T   G   T   G   C   A   C   T   C   A
    SEQRES   2 C   26    T   G   C   G   T   G   A   A   A   A   T   G   C
    FORMUL   4  HOH   *84(H2 O1)
    HELIX    1   1 ASP A   20  HIS A   31  1   12
    HELIX    2   2 PRO A   36  LEU A   43  1    8
    HELIX    3   3 ASN A   47  THR A   60  1   14
    HELIX    4   4 PRO A   78  GLU A   90  1   13
    HELIX    5   5 ALA A   96  SER A  105  1   10
    HELIX    6   6 VAL A  117  GLU A  130  1   14
    SHEET    1   A 2 SER A   3  VAL A   5  0
    SHEET    2   A 2 VAL A  11  VAL A  13 -1  N   PHE A  12   O   GLY A   4
    CRYST1   33.840   61.686  171.111  90.00  90.00  90.00 P 21 21 21    4
    ORIGX1      1.000000  0.000000  0.000000        0.00000
    ORIGX2      0.000000  1.000000  0.000000        0.00000
    ORIGX3      0.000000  0.000000  1.000000        0.00000
    SCALE1      0.029551  0.000000  0.000000        0.00000
    SCALE2      0.000000  0.016211  0.000000        0.00000
    SCALE3      0.000000  0.000000  0.005844        0.00000
    ATOM      1  N   SER A   1      -1.985 -12.356  81.201  1.00 60.11           N
    ATOM      2  CA  SER A   1      -1.709 -12.440  82.636  1.00 60.41           C
    ATOM      3  C   SER A   1      -2.774 -13.282  83.373  1.00 59.35           C
    ATOM      4  O   SER A   1      -3.734 -13.763  82.751  1.00 58.16           O
    ATOM      5  CB  SER A   1      -1.638 -11.029  83.229  1.00 64.08           C
    ATOM      6  OG  SER A   1      -2.862 -10.345  83.045  1.00 69.46           O
    ATOM      7  H   SER A   1      -2.431 -11.538  80.917  1.00 40.00           H
    ATOM      8  HG  SER A   1      -2.887  -9.549  83.596  1.00 40.00           H
    ATOM      9  N   HIS A   2      -2.634 -13.393  84.701  1.00 59.45           N

SCOP

PDBsum

CATH

FSSP - Fold classification

Structural genomics

Bioinformatics WWW portals

    EBI:      http://www.ebi.ac.uk/Tools
    Expasy:   http://www.expasy.ch
    Pasteur:  http://bioweb.pasteur.fr
    Lyon:     http://pbil.univ-lyon1.fr
    NCBI:     http://ncbi.nlm.nih.gov

EBI

ExPASy

PBIL

Pasteur

Bioinformatic Links
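
The article and author tables from the SQL slides can be tried out without a database server. The following is only a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. It deviates from the slides in three labeled ways: it uses the portable INSERT ... VALUES form instead of INSERT ... SET, it stores year as an integer for simplicity, and it joins article to author (the two tables that share c_id). The function names build_demo_db and find_articles are hypothetical, chosen for illustration.

```python
import sqlite3


def build_demo_db():
    """Create the article/author tables in memory and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (c_id INTEGER, title TEXT, journal VARCHAR(30), year INTEGER);
        CREATE TABLE author (a_id INTEGER, c_id INTEGER, name VARCHAR(30));
        """
    )
    # Assumption: INSERT ... VALUES is used here because SQLite does not
    # accept the INSERT ... SET form shown on the slides.
    conn.execute(
        "INSERT INTO article (c_id, title, journal, year) VALUES (?, ?, ?, ?)",
        (1, "Something absolutely fantastic", "Bioinformatics", 2002),
    )
    conn.executemany(
        "INSERT INTO author (a_id, c_id, name) VALUES (?, ?, ?)",
        [(1, 1, "Paces, Jan"), (2, 1, "Vondrasek, Jiri")],
    )
    return conn


def find_articles(conn, name_prefix, after_year):
    """Return (title, journal, author name) rows, joining the tables on c_id."""
    query = """
        SELECT article.title, article.journal, author.name
        FROM article, author
        WHERE article.c_id = author.c_id
          AND article.year > ?
          AND author.name LIKE ?
    """
    return conn.execute(query, (after_year, name_prefix + "%")).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in find_articles(conn, "Paces", 2000):
        print(row)
```

The shared c_id column is what links each author row to its article, so one article with two authors yields two joined rows before the name filter is applied.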
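
The minimal fasta format and the IUB nucleotide codes shown earlier combine naturally in a small script. The sketch below is an illustration under stated assumptions, not a reference parser: parse_fasta and reverse_complement are hypothetical helper names, the complement table assumes that W (A or T) and S (C or G) are each their own complement, and the example record holds only the first 20 bases of the AF145233 sequence.

```python
# IUB nucleotide codes mapped to their complements.
# Assumption: W (A/T) and S (C/G) are self-complementary.
IUB_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R", "W": "W", "S": "S",
    "V": "B", "B": "V", "H": "D", "D": "H", "N": "N", "-": "-",
}


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping the case of each symbol.

    Raises KeyError for symbols outside the IUB table. U is complemented
    to A, so an RNA sequence does not round-trip back to U.
    """
    out = []
    for base in reversed(seq):
        comp = IUB_COMPLEMENT[base.upper()]
        out.append(comp.lower() if base.islower() else comp)
    return "".join(out)


if __name__ == "__main__":
    # First 20 bases of the record only, split over two lines on purpose.
    example = (
        ">gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription "
        "factor PAX4 (Pax4) mRNA, complete cds\n"
        "TGGCAGGACT\n"
        "GAAGCAGCTG\n"
    )
    for header, sequence in parse_fasta(example):
        print(header)
        print(sequence)
        print(reverse_complement(sequence))
```

Because the header line is the only annotation fasta carries, everything after ">" is kept as one string and left for the caller to split.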