The Theory behind Keyword Analysis
Transkript
The Theory behind Keyword Analysis
The Theory behind Keyword Analysis Václav Cvrček Workshop on Quantitative Text Analysis for SSH Brown University April 8, 2016 IntroductionIj Introduction Text interpretation ▶ at the core of the humanities’ mission Introduction Text interpretation ▶ at the core of the humanities’ mission ▶ our interpretation + other people’s interpretation Introduction Text interpretation ▶ at the core of the humanities’ mission ▶ our interpretation + other people’s interpretation ▶ interpretation with minimum amount of extra-textual information and intuition Introduction Text interpretation ▶ at the core of the humanities’ mission ▶ our interpretation + other people’s interpretation ▶ interpretation with minimum amount of extra-textual information and intuition ▶ frame of reference, scheme, expectations, communicative norms… Introduction Text interpretation ▶ at the core of the humanities’ mission ▶ our interpretation + other people’s interpretation ▶ interpretation with minimum amount of extra-textual information and intuition ▶ frame of reference, scheme, expectations, communicative norms… ▶ there is no objective interpretation – depends on point of view (recipient) Language corpusIj What is a corpus? ▶ sample of naturally occurring written texts or transcribed speeches ▶ stored electronically (searchable) ▶ basis for linguistic analysis and description CADS = corpus assisted discourse studies “A Needle in a Haystack” (collaborative research project of Brown and Charles University) ▶ how language reflects the changing nature of the society? CADS = corpus assisted discourse studies “A Needle in a Haystack” (collaborative research project of Brown and Charles University) ▶ how language reflects the changing nature of the society? ▶ how different is the interpretation of the contemporary and historical reader? CADS = corpus assisted discourse studies “A Needle in a Haystack” (collaborative research project of Brown and Charles University) ▶ how language reflects the changing nature of the society? ▶ how different is the interpretation of the contemporary and historical reader? ▶ how can we test the limits of the corpus-based quantitative analysis of text? http://brown.edu/research/projects/needle-in-haystack/ Prominent itemsIj Interpretation and prominence How do we start with interpretation? ▶ what is striking in a text? ▶ topics, motives, themes – expressed by words ▶ interaction between words, topics… ▶ function/meaning of words, topics… ▶ minimize researcher bias Why don’t you simply count them? Top 10 lemmas: the be of a and to he it have in the and to be of we that a our in the and be of to a he have in they Why don’t you simply count them? Top 10 lemmas: the be of a and to he it have in G. Orwell: 1984 the and to be of we that a our in SOTU 2009–2016 the and be of to a he have in they JRRT: Hobbit Thematic concentration ▶ content words with “abnormal” frequency Thematic concentration ▶ ▶ content words with “abnormal” frequency Zipf’s word-frequency distribution Thematic concentration ▶ ▶ content words with “abnormal” frequency Zipf’s word-frequency distribution ▶ h-point: rank = frequency Thematic concentration ▶ ▶ content words with “abnormal” frequency Zipf’s word-frequency distribution ▶ ▶ h-point: rank = frequency h-point – approximately separates autosemantic and synsemantic branch Thematic concentration ▶ ▶ content words with “abnormal” frequency Zipf’s word-frequency distribution ▶ ▶ ▶ h-point: rank = frequency h-point – approximately separates autosemantic and synsemantic branch content words above the h-point are TC words Thematic concentration ▶ ▶ content words with “abnormal” frequency Zipf’s word-frequency distribution ▶ ▶ ▶ ▶ h-point: rank = frequency h-point – approximately separates autosemantic and synsemantic branch content words above the h-point are TC words © Popescu & Altmann 2006 Thematic concetration TC words 1984: Winston, say, know, Party, face, word, O’Brien, seem, look, never, think, moment, always, hand, year, way, long, now, eye, day, possible, war… SOTU: year, job, work, America, new, people, american, know, need, help, country, business, time, world, economy, family, right, tax, Congress, nation… Hobbit: come, go, Bilbo, see, dwarves, time, long, make, think, great, good, know, far, still, goblin, find, way, look, little, light… TC: discussion Pros and cons of TC words + objective – based on frequency distribution TC: discussion Pros and cons of TC words + objective – based on frequency distribution + text analytical applications: comparing texts according to their thematic compactness TC: discussion Pros and cons of TC words + objective – based on frequency distribution + text analytical applications: comparing texts according to their thematic compactness + no reference corpus required TC: discussion Pros and cons of TC words + objective – based on frequency distribution + text analytical applications: comparing texts according to their thematic compactness + no reference corpus required - set of TC words is invariant TC: discussion Pros and cons of TC words + objective – based on frequency distribution + text analytical applications: comparing texts according to their thematic compactness + no reference corpus required - set of TC words is invariant - “interpretation without interpretor” – interpretation always depends on the point of view Keyword analysisIj Keywords and KWA Keywords ▶ 1 homonymous term1 For other meanings see e.g. Williams 1976 or Wierzbicka 1997. Keywords and KWA Keywords ▶ homonymous term1 ▶ words with higher relative frequency in a text 1 For other meanings see e.g. Williams 1976 or Wierzbicka 1997. Keywords and KWA Keywords ▶ homonymous term1 ▶ words with higher relative frequency in a text ▶ based on comparison with reference corpus 1 For other meanings see e.g. Williams 1976 or Wierzbicka 1997. Keywords and KWA Keywords ▶ homonymous term1 ▶ words with higher relative frequency in a text ▶ based on comparison with reference corpus ▶ significance testing: χ2 test, log-likelihood (G) test, Fisher test 1 For other meanings see e.g. Williams 1976 or Wierzbicka 1997. Keywords and KWA Keywords ▶ homonymous term1 ▶ words with higher relative frequency in a text ▶ based on comparison with reference corpus ▶ significance testing: χ2 test, log-likelihood (G) test, Fisher test A word-form which recurs within the text in question will be more likely to be key in it. (Scott–Tribble 2006) 1 For other meanings see e.g. Williams 1976 or Wierzbicka 1997. Obtaining keywords – algorithm Procedure ▶ count frequency of each word – most frequent words are the, of, was… Obtaining keywords – algorithm Procedure ▶ count frequency of each word – most frequent words are the, of, was… ▶ compare it with a frequency of the same word in a corpus Obtaining keywords – algorithm Procedure ▶ count frequency of each word – most frequent words are the, of, was… ▶ compare it with a frequency of the same word in a corpus ▶ use statistical tests: χ2 , log-likelihood or Fisher to find out if the difference is significant Obtaining keywords – algorithm Procedure ▶ count frequency of each word – most frequent words are the, of, was… ▶ compare it with a frequency of the same word in a corpus ▶ use statistical tests: χ2 , log-likelihood or Fisher to find out if the difference is significant ▶ interpret top X most prominent keywords Obtaining keywords – algorithm Procedure ▶ count frequency of each word – most frequent words are the, of, was… ▶ compare it with a frequency of the same word in a corpus ▶ use statistical tests: χ2 , log-likelihood or Fisher to find out if the difference is significant ▶ interpret top X most prominent keywords Keywords: Words which appear in a text or corpus that are statistically significantly more frequent than would be expected by chance when compared to a corpus which is larger or of equal size. Note on significance and effect size Gabrielatos, C. & Marchi, A. (2012): there is a difference between (statistical) significance and (linguistic) relevance (effect size) Note on significance and effect size Gabrielatos, C. & Marchi, A. (2012): there is a difference between (statistical) significance and (linguistic) relevance (effect size) Metrics used to calculate keyness ▶ significance – level of certainty we have that the difference exists (N.B. χ2 test is “asymptotically true”) Note on significance and effect size Gabrielatos, C. & Marchi, A. (2012): there is a difference between (statistical) significance and (linguistic) relevance (effect size) Metrics used to calculate keyness ▶ significance – level of certainty we have that the difference exists (N.B. χ2 test is “asymptotically true”) ▶ relevance – importance of the difference (for interpretation) Note on significance and effect size Gabrielatos, C. & Marchi, A. (2012): there is a difference between (statistical) significance and (linguistic) relevance (effect size) Metrics used to calculate keyness ▶ significance – level of certainty we have that the difference exists (N.B. χ2 test is “asymptotically true”) ▶ relevance – importance of the difference (for interpretation) crucial for the top X approach: ▶ Note on significance and effect size Gabrielatos, C. & Marchi, A. (2012): there is a difference between (statistical) significance and (linguistic) relevance (effect size) Metrics used to calculate keyness ▶ significance – level of certainty we have that the difference exists (N.B. χ2 test is “asymptotically true”) ▶ relevance – importance of the difference (for interpretation) crucial for the top X approach: ▶ 1. identification of KWs – statistical tests Note on significance and effect size Gabrielatos, C. & Marchi, A. (2012): there is a difference between (statistical) significance and (linguistic) relevance (effect size) Metrics used to calculate keyness ▶ significance – level of certainty we have that the difference exists (N.B. χ2 test is “asymptotically true”) ▶ relevance – importance of the difference (for interpretation) crucial for the top X approach: ▶ 1. identification of KWs – statistical tests 2. ranking of KWs – task for a different metric DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) values of DIN cf. Hofland–Johansson (1982). DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ values of DIN ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) -100 (= when a word is present only in the RefC) cf. Hofland–Johansson (1982). DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ values of DIN ▶ ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) -100 (= when a word is present only in the RefC) 0 (= when a word occurs equally in target and RefC) cf. Hofland–Johansson (1982). DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ values of DIN ▶ ▶ ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) -100 (= when a word is present only in the RefC) 0 (= when a word occurs equally in target and RefC) 100 (= when a word is present only in the target corpus) cf. Hofland–Johansson (1982). DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ values of DIN ▶ ▶ ▶ ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) -100 (= when a word is present only in the RefC) 0 (= when a word occurs equally in target and RefC) 100 (= when a word is present only in the target corpus) represents the proportion of the difference of relative frequencies to their mean (× 50) cf. Hofland–Johansson (1982). DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ values of DIN ▶ ▶ ▶ ▶ ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) -100 (= when a word is present only in the RefC) 0 (= when a word occurs equally in target and RefC) 100 (= when a word is present only in the target corpus) represents the proportion of the difference of relative frequencies to their mean (× 50) identical value of DIN for words appearing in a target text only (!) cf. Hofland–Johansson (1982). DIN coefficient Variation on the Sørensen–Dice’s coefficient2 : DIN = 100 × ▶ values of DIN ▶ ▶ ▶ ▶ ▶ ▶ 2 RelFq(Target) − RelFq(Reference) RelFq(Target) + RelFq(Reference) -100 (= when a word is present only in the RefC) 0 (= when a word occurs equally in target and RefC) 100 (= when a word is present only in the target corpus) represents the proportion of the difference of relative frequencies to their mean (× 50) identical value of DIN for words appearing in a target text only (!) useful for ranking of KWs (not for their identification!) cf. Hofland–Johansson (1982). Example 1: Grammatical words Keywords from all Husák’s New Year’s Addresses 400 300 200 100 0 Log−likelihood (rank) 500 600 All NYA (gram. words highlighted) vlastenectví zachování světem ozbrojených dalšímu jakývědeckýchpotřeb strany postupu vědomím odpovídá prospěchu rovnosti vztahy kterým občané potřebné lidstva hospodářského odkaz těžkosti solidaritu široká růstu aktivita hospodařit vstříc energetické odkazu s všestranného příští blaho všechny částech řešení zdravotnických všestranné našimi materiální milióny nedostatků vysokoumožností továrnách velký pracovní potvrzují přínos uspokojením zničení nezbytná starat pokrokovým tvůrčí zájmům krizových svobody našem obětavě prosinci krize spojeno pevnou orgánů si politiku zajistili duchovní správnou šesté mírovému let volby ekonomického společný kterém náročný zmařit láskou dobrá zbrojení stupních drahé díky metra podnětem mezinárodním stavbáchpozitivní důvěra bratrskými otevřela zásluhou zhodnotil zasedání složitých politiky pracovišti zdůraznit důkaz celého plnění sjezdu odhodlání přání sovětskouplní nedostatky problémů jaderných potřebám rozvíjela přispívat politice síly příznivý progresivních vývojem svou dařilo hovoříme naléhavě složek sil zasedáních konfrontace hrdosti náročných 1978 pracovištích vztazích postavení naléhavé čelit spravedlnost vysoce pozvednout národy svazu kupředu rychleji našemu angažovanost katastrofy každém spolu nemálo cen závěry vítěznou inteligence ekonomiku nadcházející vyspělost prohlubovat složitou děkuji užívání ve loňském stupňů rostoucí hladiny to dějinné společné výsledkům veškeré bratrský úspěšně kontinentě vnější školství slabá problémy nám společenské věnovat důstojně rovnováhu měny abych které potvrdily rozloučili organizací správě pracujícího současných obětavou pokrok osvobozenecký vyspělou smyslem soudružské bezpečnost dobrý zájmu vykonali odhodláni kultuře celém společenský otevřeně spokojeně celé historickými připomínat postup cíle lidstvo nezávislosti generace sborů prostá velkou sjezd lidí hmotné úspěšného hrdostí dynamiku školských vůle hrdí sovětského perspektivy nejvyšších všestranný pracujících vyžadovat zlepšování ovzduší službách evropě podílí úseků složitost zřízením upevnění spolupráci proto xiv příštích rozdílným záměry připomněli pevným přestavby republika návrhy socialistickásplnění vykonanou začneme obětavá podporuje činorodé příznivé spojenci překonávání přátelství celkově podíleli šťastného armádou abyste vyžadují zajištění musíme zápase udrželi osvobozování horečného bezpečnosti nových složitá mládeži stručně přestavbě mírovou efektivněji zdrojem politika vůlí minulém bratrskému pokračovat důsledně díváme pramení energičtěji udržet jednoty program tužby zlepšení povinnosti realisticky jaderné dobrých opravňuje novém krok oblastech vše přáteli plně sociální nimiž bratrské soužití usilujeme lid pokračoval samozřejmé osmé upřímné budoucnosti pozdravil cesta plodem víme rozvíjet úrovně přesvědčen 1981 znovu významné nejspolehlivější cestu opravňují bratrských pro obav Úspěšně překážek překonání vyvrcholení jednou československéholepší pracíúkolů práce plnou přáním uskutečňovat důrazmuselizdravotnictví tvořivé urychlení zabezpečit ženám generacím dialogu vzestupný rozkvětu usilovat kriticky letošním mezinárodníchzeměmi dělníkům dnešním radost socialistickými odhodláním ústavechzastupitelských společným slováků nadcházejícím životních poctivé dovolte Čechů dopravě dobrou podporu významných dosažené občany svým rolníkům varšavské důvěry příslušníkům řešit právem spolehlivou uplynulým dalším dopadynastávajícím upevnili odpovědně pozitivních československé výhodnou úspěchy nestraníků tvořivou udělejmevzpomínat uskutečňování států vážíme zlepšovat mírovýchuvolňování prošli 1982 tvořiváoceňujeme cestou mnoho reálné dalšího překonávat země konstruktivní výročí událostinárodně výsledků mírové odzbrojení srdce upevňování pracovat vašim podmínkách 1983 spolupráce vývoj náročně další kvalitněji všude přikládáme vzájemně prohlubování úroveň opíráme osvobozeneckého inteligenci Československá sociálních úspěšný hranicemi naši dalšími svědomitou žít chceme šťastný uplynulých vrstev rodinném uplynulého všestranná domovům hodnotíme nadále jistoty přesvědčeni dále jistot považujeme hodně aby uvědomujeme rozkvétala rozvíjelo dosáhli přátelům desetiletí národností máme vás jdeme věříme zápasu výboru pevné národů úsilí rozvoje socialistických vážené hrozby pozdravuji pokročili míru přejeme rozvoji dobrým mezinárodní napětí zřízení vzkvétala podporujeme úsecích zápas lépe klademe prožili odvrácení rozvoj pohodě náročné 1986 aktivně pohodu obětavé zamýšlíme vlast spojenectví Československo životní spokojený abychom pětiletky úkoly pokroku úspěchů sovětským prahu pozdravy 1987 vstupu minulého nás občanům mírový občanů střízlivým dalších hospodářství zdravíme přičiňme novoroční práci osobním svazem říci připomeneme našim spokojenost poctivou spokojenosti dařilauplynulý chci státu všech přátele socialistické přispěli uplynulém společnosti mírového důvěrou komunistické života optimismem xviixvi štěstí rozkvět všem poděkovat náš dobré státy vážení upřímně soudruzi posíláme můžeme život výsledky zdravím srdečně naše roce rokem zdraví společenství a národní ústředního vám fronty našich životě nového vstupujeme rok socialistického světě vlasti i jménem jsme Československa budeme soudružky roku přátelé přeji našeho spoluobčané našílidu drazí 0 100 200 300 Dice (rank) 400 500 600 Tools for KWAIj KWords http://kwords.korpus.cz KWords: analysed text KWords: list of keywords Two types of prominent units: keywords and thematic concentration Dispersion plot SOTU 2016: terrorism × economy Keyword links Keyword links according to the size of the window Distant KW Links = KWs appearing in distant context (-15;-5) and (5;15); these KW links indicate that the themes represented by these KWs may form a discourse-semantic network Keyword links Keyword links according to the size of the window Distant KW Links = KWs appearing in distant context (-15;-5) and (5;15); these KW links indicate that the themes represented by these KWs may form a discourse-semantic network Immediate KW Links (multi-word KWs) = co-occurrence of two or more KWs within immediate or near context (-2;2); these adjacent KW may signal multi-word KW unit (e.g. American people, better politics) Keyword links Keyword links according to the size of the window Distant KW Links = KWs appearing in distant context (-15;-5) and (5;15); these KW links indicate that the themes represented by these KWs may form a discourse-semantic network Immediate KW Links (multi-word KWs) = co-occurrence of two or more KWs within immediate or near context (-2;2); these adjacent KW may signal multi-word KW unit (e.g. American people, better politics) custom = arbitrary size of the span ame rica ns pla ne t sta all mp ies e ak m wo terr o voic rist stro es nge st am eri ca isil qaeda bipartisan hardwork ing weake n folks terror ists new people t jus rs yea r yea o wh ip rsh y de rac lea moc de ate m cli rk s nd tre my no eco nee d n rica ame harder nt retireme keep fellow want businesses world politics country lot care states our hing everyt let re futu ry eve elec ted us al er bett believe military we ge y securit chan basic job nearly dy te bo vo ery ev irit sp ee r ag kers r wo ilies ity fam tun r po op pen hap rgy ene cit ize ns na t job ion to s n co igh ng t re ss Comparison For comparing texts – time series (SOTU) State of the UnionIj Obama’s State of the Union Address Eight addresses (2009–2016) Tokens Types 2009 6346 1347 2010 8024 1531 2011 7611 1505 2012 7743 1555 2013 7427 1561 2014 7506 1598 2015 7479 1521 2016 6671 1422 source: http://www.whitehouse.gov Average length N = 7351 Average vocabulary V(N) = 1505 Reference corpus: British National Corpus (BNC) Permanent KWs (Key KWs) KWs appearing in all eight addresses: america, american, americans, businesses, congress, country, economy, jobs, nation, tonight KWs appearing in six or seven addresses: energy, every, let, more, new, people, democrats, families, make, millions, republicans, tax, why SOTU: Topics – economy/politics Reference corpus in KWAIj Reference corpus in KWA What does reference corpus affect? size: bigger reference corpus ⇒ more KWs Reference corpus in KWA What does reference corpus affect? size: bigger reference corpus ⇒ more KWs composition: different reference corpora represent different readers (conceptualized reader) Reference corpus in KWA What does reference corpus affect? size: bigger reference corpus ⇒ more KWs composition: different reference corpora represent different readers (conceptualized reader) ▶ balanced corpus ∼ general reader Reference corpus in KWA What does reference corpus affect? size: bigger reference corpus ⇒ more KWs composition: different reference corpora represent different readers (conceptualized reader) ▶ balanced corpus ∼ general reader ▶ specialized corpus ∼ specific reader (e.g. from the past, with specific background…) Different readers = different interpretations Contrastive KWA analysis ▶ different RefCs: interferences – time, style, topic differences Different readers = different interpretations Contrastive KWA analysis ▶ ▶ different RefCs: interferences – time, style, topic differences New Year’s Addresses of the last communist president of the Czechoslovakia Gustáv Husák (1975–1989) Different readers = different interpretations Contrastive KWA analysis ▶ ▶ different RefCs: interferences – time, style, topic differences New Year’s Addresses of the last communist president of the Czechoslovakia Gustáv Husák (1975–1989) ▶ contemporary reader (SYN2010) × reader from the past (Totalita) Different readers = different interpretations Contrastive KWA analysis ▶ ▶ different RefCs: interferences – time, style, topic differences New Year’s Addresses of the last communist president of the Czechoslovakia Gustáv Husák (1975–1989) ▶ ▶ contemporary reader (SYN2010) × reader from the past (Totalita) State of the Union addresses of Barack Obama (2009–2016) Different readers = different interpretations Contrastive KWA analysis ▶ ▶ different RefCs: interferences – time, style, topic differences New Year’s Addresses of the last communist president of the Czechoslovakia Gustáv Husák (1975–1989) ▶ ▶ contemporary reader (SYN2010) × reader from the past (Totalita) State of the Union addresses of Barack Obama (2009–2016) ▶ general reader (BNC) × politician/expert (rest of Obama’s speeches) Husák: Influence of the reference corpora What happens if we compare texts to different RefCs? ▶ the inventory of KWs does not differ substantially Husák: Influence of the reference corpora What happens if we compare texts to different RefCs? ▶ the inventory of KWs does not differ substantially ▶ the difference is in ranking (prominence of KWs – DIN) Husák: Influence of the reference corpora What happens if we compare texts to different RefCs? ▶ the inventory of KWs does not differ substantially ▶ the difference is in ranking (prominence of KWs – DIN) Historical reader (Totalita) → genre differences ▶ Modal verbs: want, can ▶ Verbs: 1. sg./pl. Husák: Influence of the reference corpora What happens if we compare texts to different RefCs? ▶ the inventory of KWs does not differ substantially ▶ the difference is in ranking (prominence of KWs – DIN) Historical reader (Totalita) → genre differences ▶ ▶ Contemporary reader (SYN2010) → connected with historical events Modal verbs: want, can ▶ ideology Verbs: 1. sg./pl. ▶ archaisms, historism Detailed comparison – 3 thematic groups Cold war: mír, míru, mírová, mírové, mírového, mírovému, mírovou, mírový, mírových, mírovými, mírumilovné, mírumilovných, mírumilovným; napětí; odzbrojení, výzbroje, zbrojení, zbrojením, ozbrojených Collective possession: náš, naše, našeho, našem, našemu, naši, naší, našich, našim, naším, našimi Ideo markers: socialismu, socialismus, socialistická, socialistické, socialistického, socialistickém, socialistickému, socialistickou, socialistický, socialistických, socialistickým, socialistickými; komunismu, komunisté, komunistů, ksč, komunistům, komunisty komunistická, komunistické, komunistickým Cold war 90 100 Cold War KWs in SYN−KWA and TOT−KWA 70 60 50 40 DIN 80 SYN−KWA TOT−KWA 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 Year Fidler–Cvrček (2015) Collective possession 80 85 90 95 KWs "our" in SYN−KWA and TOT−KWA 65 70 75 DIN SYN−KWA TOT−KWA 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 Year Fidler–Cvrček (2015) Ideological markers 70 80 90 100 Ideological markers KWs in SYN−KWA and TOT−KWA 30 40 50 60 DIN SYN−KWA TOT−KWA 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 Year Fidler–Cvrček (2015) Reader from the past × contemporary reader Totalita 1. style & genre: fellow citizens, friends 2. propaganda: blossom, succeed 3. 1st pers. sg. (greet, wish) SYN2010 1. ideology-related: comrade(s), socialist, five-year plan 2. period-specific: brotherly, liberating, feverish, dutiful, imperialistic Difference in sensitivity ▶ lines for both readers have similar tendencies Difference in sensitivity ▶ lines for both readers have similar tendencies ▶ contemporary reader has higher overall level of prominence (DIN) Difference in sensitivity ▶ lines for both readers have similar tendencies ▶ contemporary reader has higher overall level of prominence (DIN) ▶ tendencies are more visible for reader from the past Difference in sensitivity ▶ lines for both readers have similar tendencies ▶ contemporary reader has higher overall level of prominence (DIN) ▶ tendencies are more visible for reader from the past ▶ contemporary reader cannot distinguish subtle changes (overwhelmed by unusual lexicon) Difference in sensitivity ▶ lines for both readers have similar tendencies ▶ contemporary reader has higher overall level of prominence (DIN) ▶ tendencies are more visible for reader from the past ▶ contemporary reader cannot distinguish subtle changes (overwhelmed by unusual lexicon) ▶ astute reader from the past might notice slight shifts in the discourse 2015 SOTU: General vs. specialist’s view Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Keyword rebekah bipartisan hardworking loopholes childcare republicans folks veterans diplomacy striving terrorists americans infrastructure fastest democrats Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Keyword ben economics childcare rebekah west believed striving tools sick spread write paid surely issues scientists 2015 SOTU: Differences Specialist’s view: Rhetorics/style: believed, write, thing, tools Important topics: economics, leave, paid, childcare 2015 SOTU: Differences Specialist’s view: Rhetorics/style: believed, write, thing, tools Important topics: economics, leave, paid, childcare General view: Political speeches: bipartisan, republicans, folks, veterans, diplomacy, terrorists, americans, infrastructure, democrats, combat, sanctions Topical words: hardworking, loopholes, childcare, striving, fastest References ▶ Cvrček, V. – Vondřička, P. (2013): KWords. FF UK. Praha. Available at: <http://kwords.korpus.cz>. ▶ Fidler, M. - Cvrček, V. (2015): A Data-Driven Analysis of Reader Viewpoints: Reconstructing the Historical Reader Using Keyword Analysis. Journal of Slavic Linguistics 23(2), (p. 197–239). ▶ Gabrielatos, C. – Marchi, A. (2012): Keyness: appropriate metrics and practical issues. Paper presented at the CADS International Conference, Bologna, Italy, September 2012 (www.gabrielatos.com/Presentations.htm). ▶ Hofland, K. – Johansson, S. (1982): Word frequencies in British and American English. Bergen: The Norwegian computing centre for the Humanities. ▶ Popescu, I. – Altmann, G. (2006): Some Aspects of Word Frequencies. Glottometrics 13, (p. 23–46). ▶ Scott, M. – Tribble, C. (2006): Textual patterns: Keyword and corpus analysis in language education. Amsterdam: Benjamins. ▶ Wierzbicka, A. (1997): Understanding cultures through their keywords. English, Russian, Polish, German, and Japanese. Oxford, New York: Oxford UP. ▶ Williams, R. (1976/85): Keywords: A vocabulary of culture and society. New York: Oxford UP. Thank you for your attention!
Podobné dokumenty
On Some Methodological Issues of CADS
lidstva
hospodářského
odkaz
těžkosti
solidaritu
široká
růstu
aktivita
hospodařit
vstříc
energetické
odkazu
s
všestranného
příští
blaho
všechny
částech
řešení
zdravotnických
všestranné našimi
materi...