7.4 Inaugural Corpus USA
The readtext package ships with several example datasets. We specify the path where these datasets are stored and load them into R.
7.4.2 Create a corpus
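A minimal sketch of loading the data and building the corpus. It assumes that the inaugural speeches are the file csv/inaugCorpus.csv inside the extdata folder of readtext, and that the speech text is in a column named "texts". Both are assumptions, and the document names produced this way may differ from the ones printed in this section.

```r
library(readtext)
library(quanteda)

## Assumption: the example files live in the "extdata" folder of readtext
path_data <- system.file("extdata", package = "readtext")

## Assumption: the speeches are in csv/inaugCorpus.csv,
## with the speech text stored in a column named "texts"
dat_inaug <- readtext(file.path(path_data, "csv", "inaugCorpus.csv"),
                      text_field = "texts")

## turn the readtext data frame into a quanteda corpus;
## the remaining columns become document variables (docvars)
dat_inaug_corpus <- corpus(dat_inaug)
print(dat_inaug_corpus)
```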
## Corpus consisting of 5 documents and 3 docvars.
## text1 :
## "Fellow-Citizens of the Senate and of the House of Representa..."
##
## text2 :
## "Fellow citizens, I am again called upon by the voice of my c..."
##
## text3 :
## "When it was first perceived, in early times, that no middle ..."
##
## text4 :
## "Friends and Fellow Citizens: Called upon to undertake the du..."
##
## text5 :
## "Proceeding, fellow citizens, to that qualification which the..."
7.4.2.1 Summary
## Corpus consisting of 5 documents, showing 5 documents:
##
## Text Types Tokens Sentences Year President FirstName
## text1 625 1538 23 1789 Washington George
## text2 96 147 4 1793 Washington George
## text3 826 2578 37 1797 Adams John
## text4 717 1927 41 1801 Jefferson Thomas
## text5 804 2381 45 1805 Jefferson Thomas
7.4.2.2 Editing docnames
docid <- paste(dat_inaug$Year,
               dat_inaug$FirstName,
               dat_inaug$President, sep = " ")
docnames(dat_inaug_corpus) <- docid
print(dat_inaug_corpus)
## Corpus consisting of 5 documents and 3 docvars.
## 1789 George Washington :
## "Fellow-Citizens of the Senate and of the House of Representa..."
##
## 1793 George Washington :
## "Fellow citizens, I am again called upon by the voice of my c..."
##
## 1797 John Adams :
## "When it was first perceived, in early times, that no middle ..."
##
## 1801 Thomas Jefferson :
## "Friends and Fellow Citizens: Called upon to undertake the du..."
##
## 1805 Thomas Jefferson :
## "Proceeding, fellow citizens, to that qualification which the..."
7.4.2.3 Accessing parts of corpus
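A minimal sketch of accessing individual texts and document variables. It uses a two-document toy corpus; the document names, texts and years are made up for illustration.

```r
library(quanteda)

## toy corpus with one document variable (hypothetical data)
toy_corpus <- corpus(
  c(doc_a = "Fellow citizens of the Senate.",
    doc_b = "Friends and fellow citizens."),
  docvars = data.frame(Year = c(1789, 1801))
)

## full text of the first document
first_text <- as.character(toy_corpus)[1]
print(first_text)

## values of one document variable
docvars(toy_corpus, "Year")

## documents that satisfy a condition on a document variable
later <- corpus_subset(toy_corpus, Year > 1790)
docnames(later)
```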
## [1] "Fellow-Citizens of the Senate and of the House of Representatives:\n\nAmong the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order, and received on the 14th day of the present month. On the one hand, I was summoned by my Country, whose voice I can never hear but with veneration and love, from a retreat which I had chosen with the fondest predilection, and, in my flattering hopes, with an immutable decision, as the asylum of my declining years -- a retreat which was rendered every day more necessary as well as more dear to me by the addition of habit to inclination, and of frequent interruptions in my health to the gradual waste committed on it by time. On the other hand, the magnitude and difficulty of the trust to which the voice of my country called me, being sufficient to awaken in the wisest and most experienced of her citizens a distrustful scrutiny into his qualifications, could not but overwhelm with despondence one who (inheriting inferior endowments from nature and unpracticed in the duties of civil administration) ought to be peculiarly conscious of his own deficiencies. In this conflict of emotions all I dare aver is that it has been my faithful study to collect my duty from a just appreciation of every circumstance by which it might be affected. 
All I dare hope is that if, in executing this task, I have been too much swayed by a grateful remembrance of former instances, or by an affectionate sensibility to this transcendent proof of the confidence of my fellow citizens, and have thence too little consulted my incapacity as well as disinclination for the weighty and untried cares before me, my error will be palliated by the motives which mislead me, and its consequences be judged by my country with some share of the partiality in which they originated.\n\nSuch being the impressions under which I have, in obedience to the public summons, repaired to the present station, it would be peculiarly improper to omit in this first official act my fervent supplications to that Almighty Being who rules over the universe, who presides in the councils of nations, and whose providential aids can supply every human defect, that His benediction may consecrate to the liberties and happiness of the people of the United States a Government instituted by themselves for these essential purposes, and may enable every instrument employed in its administration to execute with success the functions allotted to his charge. In tendering this homage to the Great Author of every public and private good, I assure myself that it expresses your sentiments not less than my own, nor those of my fellow citizens at large less than either. No people can be bound to acknowledge and adore the Invisible Hand which conducts the affairs of men more than those of the United States. 
Every step by which they have advanced to the character of an independent nation seems to have been distinguished by some token of providential agency; and in the important revolution just accomplished in the system of their united government the tranquil deliberations and voluntary consent of so many distinct communities from which the event has resulted can not be compared with the means by which most governments have been established without some return of pious gratitude, along with an humble anticipation of the future blessings which the past seem to presage. These reflections, arising out of the present crisis, have forced themselves too strongly on my mind to be suppressed. You will join with me, I trust, in thinking that there are none under the influence of which the proceedings of a new and free government can more auspiciously commence.\n\nBy the article establishing the executive department it is made the duty of the President \"to recommend to your consideration such measures as he shall judge necessary and expedient.\" The circumstances under which I now meet you will acquit me from entering into that subject further than to refer to the great constitutional charter under which you are assembled, and which, in defining your powers, designates the objects to which your attention is to be given. It will be more consistent with those circumstances, and far more congenial with the feelings which actuate me, to substitute, in place of a recommendation of particular measures, the tribute that is due to the talents, the rectitude, and the patriotism which adorn the characters selected to devise and adopt them. 
In these honorable qualifications I behold the surest pledges that as on one side no local prejudices or attachments, no separate views nor party animosities, will misdirect the comprehensive and equal eye which ought to watch over this great assemblage of communities and interests, so, on another, that the foundation of our national policy will be laid in the pure and immutable principles of private morality, and the preeminence of free government be exemplified by all the attributes which can win the affections of its citizens and command the respect of the world. I dwell on this prospect with every satisfaction which an ardent love for my country can inspire, since there is no truth more thoroughly established than that there exists in the economy and course of nature an indissoluble union between virtue and happiness; between duty and advantage; between the genuine maxims of an honest and magnanimous policy and the solid rewards of public prosperity and felicity; since we ought to be no less persuaded that the propitious smiles of Heaven can never be expected on a nation that disregards the eternal rules of order and right which Heaven itself has ordained; and since the preservation of the sacred fire of liberty and the destiny of the republican model of government are justly considered, perhaps, as deeply, as finally, staked on the experiment entrusted to the hands of the American people.\n\nBesides the ordinary objects submitted to your care, it will remain with your judgment to decide how far an exercise of the occasional power delegated by the fifth article of the Constitution is rendered expedient at the present juncture by the nature of objections which have been urged against the system, or by the degree of inquietude which has given birth to them. 
Instead of undertaking particular recommendations on this subject, in which I could be guided by no lights derived from official opportunities, I shall again give way to my entire confidence in your discernment and pursuit of the public good; for I assure myself that whilst you carefully avoid every alteration which might endanger the benefits of an united and effective government, or which ought to await the future lessons of experience, a reverence for the characteristic rights of freemen and a regard for the public harmony will sufficiently influence your deliberations on the question how far the former can be impregnably fortified or the latter be safely and advantageously promoted.\n\nTo the foregoing observations I have one to add, which will be most properly addressed to the House of Representatives. It concerns myself, and will therefore be as brief as possible. When I was first honored with a call into the service of my country, then on the eve of an arduous struggle for its liberties, the light in which I contemplated my duty required that I should renounce every pecuniary compensation. 
From this resolution I have in no instance departed; and being still under the impressions which produced it, I must decline as inapplicable to myself any share in the personal emoluments which may be indispensably included in a permanent provision for the executive department, and must accordingly pray that the pecuniary estimates for the station in which I am placed may during my continuance in it be limited to such actual expenditures as the public good may be thought to require.\n\nHaving thus imparted to you my sentiments as they have been awakened by the occasion which brings us together, I shall take my present leave; but not without resorting once more to the benign Parent of the Human Race in humble supplication that, since He has been pleased to favor the American people with opportunities for deliberating in perfect tranquillity, and dispositions for deciding with unparalleled unanimity on a form of government for the security of their union and the advancement of their happiness, so His divine blessing may be equally conspicuous in the enlarged views, the temperate consultations, and the wise measures on which the success of this Government must depend. "
7.4.3 Advanced manipulations
7.4.3.1 Tokens
tokens() segments the texts in a corpus into tokens (words or sentences) at word boundaries.
Punctuation can either be kept or removed.
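A minimal sketch of the two options on a single toy sentence (the sentence and the name d1 are made up for illustration):

```r
library(quanteda)

## toy sentence, made up for illustration
txt <- c(d1 = "Fellow citizens, I am again called.")

toks_punct    <- tokens(txt)                       # punctuation kept
toks_no_punct <- tokens(txt, remove_punct = TRUE)  # punctuation dropped

print(toks_punct)
print(toks_no_punct)

## number of tokens per document in each version
ntoken(toks_punct)
ntoken(toks_no_punct)
```

The comma and the full stop count as tokens in the first version only, so the two token counts differ by two.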
7.4.3.1.1 With punctuation
## Tokens consisting of 5 documents and 3 docvars.
## 1789 George Washington :
## [1] "Fellow-Citizens" "of" "the" "Senate"
## [5] "and" "of" "the" "House"
## [9] "of" "Representatives" ":" "Among"
## [ ... and 1,526 more ]
##
## 1793 George Washington :
## [1] "Fellow" "citizens" "," "I" "am" "again"
## [7] "called" "upon" "by" "the" "voice" "of"
## [ ... and 135 more ]
##
## 1797 John Adams :
## [1] "When" "it" "was" "first" "perceived" ","
## [7] "in" "early" "times" "," "that" "no"
## [ ... and 2,566 more ]
##
## 1801 Thomas Jefferson :
## [1] "Friends" "and" "Fellow" "Citizens" ":" "Called"
## [7] "upon" "to" "undertake" "the" "duties" "of"
## [ ... and 1,915 more ]
##
## 1805 Thomas Jefferson :
## [1] "Proceeding" "," "fellow" "citizens"
## [5] "," "to" "that" "qualification"
## [9] "which" "the" "Constitution" "requires"
## [ ... and 2,369 more ]
7.4.3.1.2 Without punctuation
dat_inaug_corpus_tok_no_punct <- tokens(dat_inaug_corpus, remove_punct = TRUE)
dat_inaug_corpus_tok_no_punct
## Tokens consisting of 5 documents and 3 docvars.
## 1789 George Washington :
## [1] "Fellow-Citizens" "of" "the" "Senate"
## [5] "and" "of" "the" "House"
## [9] "of" "Representatives" "Among" "the"
## [ ... and 1,418 more ]
##
## 1793 George Washington :
## [1] "Fellow" "citizens" "I" "am" "again" "called"
## [7] "upon" "by" "the" "voice" "of" "my"
## [ ... and 123 more ]
##
## 1797 John Adams :
## [1] "When" "it" "was" "first" "perceived" "in"
## [7] "early" "times" "that" "no" "middle" "course"
## [ ... and 2,306 more ]
##
## 1801 Thomas Jefferson :
## [1] "Friends" "and" "Fellow" "Citizens" "Called" "upon"
## [7] "to" "undertake" "the" "duties" "of" "the"
## [ ... and 1,714 more ]
##
## 1805 Thomas Jefferson :
## [1] "Proceeding" "fellow" "citizens" "to"
## [5] "that" "qualification" "which" "the"
## [9] "Constitution" "requires" "before" "my"
## [ ... and 2,154 more ]
7.4.3.2 Compound words
7.4.3.2.1 kwic
Phrase
dat_inaug_corpus_tok_no_punct_phrase <- kwic(dat_inaug_corpus_tok_no_punct, pattern = phrase("the Constitution"), window = 6)
head(dat_inaug_corpus_tok_no_punct_phrase, 10)
## Keyword-in-context with 10 matches.
## [1789 George Washington, 1023:1024]
## [1793 George Washington, 71:72]
## [1797 John Adams, 465:466]
## [1797 John Adams, 688:689]
## [1797 John Adams, 739:740]
## [1797 John Adams, 1537:1538]
## [1797 John Adams, 2225:2226]
## [1801 Thomas Jefferson, 331:332]
## [1805 Thomas Jefferson, 8:9]
## [1805 Thomas Jefferson, 482:483]
##
## delegated by the fifth article of | the Constitution |
## any official act of the President | the Constitution |
## of these transactions I first saw | the Constitution |
## and the State legislatures according to | the Constitution |
## the most serious obligations to support | the Constitution |
## after truth if an attachment to | the Constitution |
## same American people pledged to support | the Constitution |
## announced according to the rules of | the Constitution |
## fellow citizens to that qualification which | the Constitution |
## States and a corresponding amendment of | the Constitution |
##
## is rendered expedient at the present
## requires an oath of office This
## of the United States in a
## itself adopt and ordain Returning to
## The operation of it has equaled
## of the United States and a
## of the United States I entertain
## all will of course arrange themselves
## requires before my entrance on the
## be applied in time of peace
7.4.3.2.2 Compounds
dat_inaug_corpus_tok_no_punct_comp <- tokens_compound(dat_inaug_corpus_tok_no_punct, pattern = phrase("the Constitution"))
dat_inaug_corpus_tok_no_punct_comp_kwic <- kwic(dat_inaug_corpus_tok_no_punct_comp, pattern = phrase("the_Constitution"))
head(dat_inaug_corpus_tok_no_punct_comp_kwic, 10)
## Keyword-in-context with 10 matches.
## [1789 George Washington, 1023] by the fifth article of |
## [1793 George Washington, 71] official act of the President |
## [1797 John Adams, 465] these transactions I first saw |
## [1797 John Adams, 687] the State legislatures according to |
## [1797 John Adams, 737] most serious obligations to support |
## [1797 John Adams, 1534] truth if an attachment to |
## [1797 John Adams, 2221] American people pledged to support |
## [1801 Thomas Jefferson, 331] according to the rules of |
## [1805 Thomas Jefferson, 8] citizens to that qualification which |
## [1805 Thomas Jefferson, 481] and a corresponding amendment of |
##
## the_Constitution | is rendered expedient at the
## the_Constitution | requires an oath of office
## the_Constitution | of the United States in
## the_Constitution | itself adopt and ordain Returning
## the_Constitution | The operation of it has
## the_Constitution | of the United States and
## the_Constitution | of the United States I
## the_Constitution | all will of course arrange
## the_Constitution | requires before my entrance on
## the_Constitution | be applied in time of
7.4.3.3 N-grams
N-grams are a special case of compound words: sequences of n adjacent tokens, called “bi-grams” (n = 2), “tri-grams” (n = 3), etc. They are built from an already tokenised text object.
7.4.3.3.1 Multi-grams
The code below extracts all sequences of 2, 3 or 4 consecutive tokens and counts how often each sequence occurs.
dat_inaug_corpus_tok_no_punct_ngram <- tokens_ngrams(dat_inaug_corpus_tok_no_punct, n = 2:4) %>%
  unlist() %>%
  tolower() %>%
  table()
## First 10 entries (table() orders the n-grams alphabetically, not by frequency)
head(dat_inaug_corpus_tok_no_punct_ngram, 10)
## .
## 14th_day 14th_day_of
## 1 1
## 14th_day_of_the a_benevolent
## 1 1
## a_benevolent_human a_benevolent_human_mind
## 1 1
## a_benign a_benign_religion
## 1 1
## a_benign_religion_professed a_bulwark
## 1 1
## .
## zeal_and_purity zeal_and_purity_and zeal_and_wisdom zeal_and_wisdom_of
## 1 1 1 1
## zeal_on zeal_on_which zeal_on_which_to zeal_with
## 1 1 1 1
## zeal_with_which zeal_with_which_it
## 1 1
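To see what tokens_ngrams() produces on a small scale, here is a minimal sketch on a single toy phrase (made up for illustration). It also shows one way to order the frequency table by count instead of alphabetically.

```r
library(quanteda)

## toy phrase with 7 tokens
toks <- tokens("we the people of the united states")

## bigrams and trigrams; tokens are joined with "_" by default
ngr <- tokens_ngrams(toks, n = 2:3)
print(ngr)

## frequency table sorted by count rather than alphabetically
ngr_freq <- sort(table(unlist(as.list(ngr))), decreasing = TRUE)
head(ngr_freq, 5)
```

Seven tokens yield six bigrams and five trigrams, so the result holds eleven n-grams.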
7.4.3.3.2 Skip-grams
Skip-grams are n-grams built from non-consecutive tokens. With skip = 1:2, one or two tokens are skipped between the tokens that are joined.
dat_inaug_corpus_tok_no_punct_ngram_skip <- tokens_ngrams(dat_inaug_corpus_tok_no_punct, n = 2:4, skip = 1:2) %>%
  unlist() %>%
  tolower() %>%
  table()
## First 10 entries (table() orders the skip-grams alphabetically, not by frequency)
head(dat_inaug_corpus_tok_no_punct_ngram_skip, 10)
## .
## 14th_of 14th_of_month 14th_of_month_one 14th_of_month_the
## 1 1 1 1
## 14th_of_present 14th_of_present_on 14th_of_present_the 14th_the
## 1 1 1 1
## 14th_the_month 14th_the_month_one
## 1 1
## .
## zeal_which_under zeal_which_under_difficulties
## 1 1
## zeal_which_under_to zeal_wisdom
## 1 1
## zeal_wisdom_characters zeal_wisdom_characters_selected
## 1 1
## zeal_wisdom_characters_who zeal_wisdom_the
## 1 1
## zeal_wisdom_the_selected zeal_wisdom_the_thus
## 1 1
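The effect of the skip argument is easier to see on a toy sequence. A minimal sketch, using five made-up single-letter tokens:

```r
library(quanteda)

toks <- tokens("a b c d e")

## skip = 0 gives ordinary bigrams of adjacent tokens
skip0 <- tokens_ngrams(toks, n = 2, skip = 0)
print(skip0)

## skip = 1 joins each token with the token two positions ahead
skip1 <- tokens_ngrams(toks, n = 2, skip = 1)
print(skip1)
```

With skip = 1 the pairs are a_c, b_d and c_e; adjacent pairs such as a_b are only produced when 0 is included in skip.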
7.4.3.4 Dictionary
If you have a dictionary that groups several words under one generic key (e.g., pronunciation variants of a word), you can look these words up in the tokens. Here, we create a dictionary that we populate ourselves and show how to use it to search for items.
7.4.3.4.1 Create dictionary
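## lookup uses glob matching by default: "Citizens*" matches any token starting
## with "citizens", while "senat" (no wildcard) matches only that exact token
dict_dat_inaug <- dictionary(list(Population = c("Citizens*", "people"),
                                  upper_house = c("Representatives", "senat")))
print(dict_dat_inaug)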
## Dictionary object with 2 key entries.
## - [Population]:
## - citizens*, people
## - [upper_house]:
## - representatives, senat
7.4.3.4.2 Token lookup
dat_inaug_corpus_tok_no_punct_dict_toks <- tokens_lookup(dat_inaug_corpus_tok_no_punct, dictionary = dict_dat_inaug)
print(dat_inaug_corpus_tok_no_punct_dict_toks)
## Tokens consisting of 5 documents and 3 docvars.
## 1789 George Washington :
## [1] "upper_house" "Population" "Population" "Population" "Population"
## [6] "Population" "Population" "Population" "upper_house" "Population"
##
## 1793 George Washington :
## [1] "Population" "Population"
##
## 1797 John Adams :
## [1] "Population" "upper_house" "Population" "Population" "Population"
## [6] "Population" "upper_house" "Population" "Population" "Population"
## [11] "Population" "Population"
## [ ... and 12 more ]
##
## 1801 Thomas Jefferson :
## [1] "Population" "Population" "Population" "Population" "Population"
## [6] "Population" "Population"
##
## 1805 Thomas Jefferson :
## [1] "Population" "Population" "Population" "Population" "Population"
## [6] "Population" "Population" "Population" "Population" "Population"
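The role of the wildcard can be checked on a toy sentence. A minimal sketch; the sentence and the dictionary keys (chamber, population) are hypothetical and only serve the illustration:

```r
library(quanteda)

toks <- tokens("The Senate and the citizens heard the people",
               remove_punct = TRUE)

## hypothetical dictionaries for illustration
dict_exact <- dictionary(list(chamber = "senat"))    # no wildcard
dict_glob  <- dictionary(list(chamber = "senat*",    # with wildcard
                              population = c("citizen*", "people")))

## matched tokens are replaced by their key; unmatched tokens are dropped
res_exact <- tokens_lookup(toks, dictionary = dict_exact)
res_glob  <- tokens_lookup(toks, dictionary = dict_glob)

print(res_exact)
print(res_glob)
```

Without the wildcard, "senat" finds nothing because no token is exactly "senat"; with "senat*", the token "Senate" is matched (matching is case-insensitive by default).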
7.4.3.5 Part of Speech tagging
Part-of-Speech tagging (or PoS tagging) assigns a part of speech to each token. For example, the sentence “Jane likes the girl” can be tagged as “Jane/NNP likes/VBZ the/DT girl/NN”, where NNP = proper noun (singular), VBZ = 3rd person singular present tense verb, DT = determiner, and NN = noun (singular or mass). We will use the udpipe package.
7.4.3.5.1 Download and load language model
Before using the PoS tagger, we need to download a language model.
As the help page ?udpipe_download_model shows, models are available for 65 languages, trained on 101 treebanks.
## path of the English model file
file_to_check <- "models/english-ewt-ud-2.5-191206.udpipe"
## download the model only if it is not already on disk
if (!file.exists(file_to_check)) {
  udpipe_download_model(language = "english-ewt", model_dir = "models/")
}
## load the model from disk
m_english <- udpipe_load_model(file = file_to_check)
7.4.3.5.2 Tokenise, tag, dependency parsing
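We use the already tokenised text, without punctuation. Because the tokens are passed as a character vector, udpipe treats each token as a separate document (doc1, doc2, ...), as the doc_id column in the output shows.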
dat_inaug_anndf <- udpipe_annotate(m_english, x = dat_inaug_corpus_tok_no_punct[[1]]) %>%
as.data.frame()
## inspect
head(dat_inaug_anndf, 10)
## doc_id paragraph_id sentence_id sentence token_id token lemma
## 1 doc1 1 1 Fellow-Citizens 1 Fellow fellow
## 2 doc1 1 1 Fellow-Citizens 2 - -
## 3 doc1 1 1 Fellow-Citizens 3 Citizens citizen
## 4 doc2 1 1 of 1 of of
## 5 doc3 1 1 the 1 the the
## 6 doc4 1 1 Senate 1 Senate Senate
## 7 doc5 1 1 and 1 and and
## 8 doc6 1 1 of 1 of of
## 9 doc7 1 1 the 1 the the
## 10 doc8 1 1 House 1 House House
## upos xpos feats head_token_id dep_rel deps
## 1 ADJ JJ Degree=Pos 3 amod <NA>
## 2 PUNCT HYPH <NA> 3 punct <NA>
## 3 NOUN NNS Number=Plur 0 root <NA>
## 4 ADP IN <NA> 0 root <NA>
## 5 DET DT Definite=Def|PronType=Art 0 root <NA>
## 6 PROPN NNP Number=Sing 0 root <NA>
## 7 CCONJ CC <NA> 0 root <NA>
## 8 ADP IN <NA> 0 root <NA>
## 9 DET DT Definite=Def|PronType=Art 0 root <NA>
## 10 PROPN NNP Number=Sing 0 root <NA>
## misc
## 1 SpaceAfter=No
## 2 SpaceAfter=No
## 3 SpacesAfter=\\n
## 4 SpacesAfter=\\n
## 5 SpacesAfter=\\n
## 6 SpacesAfter=\\n
## 7 SpacesAfter=\\n
## 8 SpacesAfter=\\n
## 9 SpacesAfter=\\n
## 10 SpacesAfter=\\n
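Once the text is annotated, the resulting data frame can be summarised or filtered by tag. A minimal sketch in base R, using a hand-typed copy of the token and upos columns of the ten rows shown above (the object names are made up for illustration):

```r
## hand-typed copy of the token and upos columns of the first ten rows
anndf <- data.frame(
  token = c("Fellow", "-", "Citizens", "of", "the",
            "Senate", "and", "of", "the", "House"),
  upos  = c("ADJ", "PUNCT", "NOUN", "ADP", "DET",
            "PROPN", "CCONJ", "ADP", "DET", "PROPN"),
  stringsAsFactors = FALSE
)

## how often does each universal PoS tag occur?
upos_freq <- sort(table(anndf$upos), decreasing = TRUE)
print(upos_freq)

## keep only nouns and proper nouns
nouns <- anndf$token[anndf$upos %in% c("NOUN", "PROPN")]
print(nouns)
```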
7.4.3.5.3 Dependency parsing
## parse text
dat_inaug_corpus_sent <- udpipe_annotate(m_english, x = dat_inaug_corpus[[1]]) %>%
as.data.frame()
## inspect
head(dat_inaug_corpus_sent)
## doc_id paragraph_id sentence_id
## 1 doc1 1 1
## 2 doc1 1 1
## 3 doc1 1 1
## 4 doc1 1 1
## 5 doc1 1 1
## 6 doc1 1 1
## sentence token_id
## 1 Fellow-Citizens of the Senate and of the House of Representatives: 1
## 2 Fellow-Citizens of the Senate and of the House of Representatives: 2
## 3 Fellow-Citizens of the Senate and of the House of Representatives: 3
## 4 Fellow-Citizens of the Senate and of the House of Representatives: 4
## 5 Fellow-Citizens of the Senate and of the House of Representatives: 5
## 6 Fellow-Citizens of the Senate and of the House of Representatives: 6
## token lemma upos xpos feats head_token_id dep_rel
## 1 Fellow fellow ADJ JJ Degree=Pos 3 amod
## 2 - - PUNCT HYPH <NA> 3 punct
## 3 Citizens citizen NOUN NNS Number=Plur 0 root
## 4 of of ADP IN <NA> 6 case
## 5 the the DET DT Definite=Def|PronType=Art 6 det
## 6 Senate Senate PROPN NNP Number=Sing 3 nmod
## deps misc
## 1 <NA> SpaceAfter=No
## 2 <NA> SpaceAfter=No
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
dat_inaug_corpus_sent_dplot <- textplot_dependencyparser(dat_inaug_corpus_sent, size = 3)
## show plot
dat_inaug_corpus_sent_dplot
7.4.3.6 Feature co-occurrence matrix (FCM)
A feature co-occurrence matrix (FCM) records how often each pair of tokens occurs together.
7.4.3.6.1 Computing number of co-occurrences
dat_inaug_corpus_dfmat <- dfm(dat_inaug_corpus_tok_no_punct)
dat_inaug_corpus_dfmat_trim <- dfm_trim(dat_inaug_corpus_dfmat, min_termfreq = 50)
topfeatures_dat_inaug_corpus <- topfeatures(dat_inaug_corpus_dfmat_trim)
topfeatures_dat_inaug_corpus
## the of and to in a which that be by
## 565 427 354 269 140 106 105 102 93 90
## [1] 22
7.4.3.6.2 Features co-occurrences
## Feature co-occurrence matrix of: 22 by 22 features.
## features
## features of the and to have with that which by on
## of 22676 59162 39447 28270 5307 8355 10602 10290 9528 5314
## the 0 38489 50613 37168 7278 11184 14098 13988 12462 7283
## and 0 0 17032 24334 4530 7120 9111 8494 8206 4540
## to 0 0 0 8927 3687 5512 6988 6607 5932 3610
## have 0 0 0 0 408 1188 1499 1375 1140 824
## with 0 0 0 0 0 824 2174 2076 1756 1143
## that 0 0 0 0 0 0 1326 2540 2220 1435
## which 0 0 0 0 0 0 0 1373 2228 1398
## by 0 0 0 0 0 0 0 0 977 1174
## on 0 0 0 0 0 0 0 0 0 381
## [ reached max_nfeat ... 12 more features, reached max_nfeat ... 12 more features ]
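To see what the counts in such a matrix mean, here is a minimal sketch on two toy documents (made up for illustration). It assumes the default settings of fcm(), where co-occurrence is counted within whole documents.

```r
library(quanteda)

## two toy documents
toks <- tokens(c(d1 = "a b c", d2 = "a b"))
dfmat <- dfm(toks)

## co-occurrence counted within each document (default context)
fcmat <- fcm(dfmat)
print(fcmat)

## convert to a base matrix to read off single cells
m <- as.matrix(fcmat)
m["a", "b"]  # "a" and "b" appear together in d1 and in d2
m["a", "c"]  # "a" and "c" appear together in d1 only
```

As in the output above, only the upper triangle is filled by default, so each pair is stored once.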