| Title: | Interface to the 'Perspective' API |
|---|---|
| Description: | Interface to the 'Perspective' API, which can be found at the following URL: <https://github.com/conversationai/perspectiveapi#perspective-comment-analyzer-api>. The 'Perspective' API uses machine learning models to score the perceived impact a comment might have on a conversation (e.g. TOXICITY, INFLAMMATORY, etc.). 'peRspective' provides access to the API and returns tidy data frames with results of the specified machine learning model(s). |
| Authors: | Fabio Votta [aut, cre] |
| Maintainer: | Fabio Votta <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-06-07 09:29:06 UTC |
| Source: | https://github.com/favstats/peRspective |
Provides access to the Perspective API (http://www.perspectiveapi.com/). Perspective is an API that uses machine learning models to score the perceived impact a comment might have on a conversation.
peRspective provides access to the API using the R programming language.
For excellent documentation of the Perspective API, see here.
Follow these steps as outlined by the Perspective API to get an API key.
peRspective functions read the API key from the environment variable perspective_api_key. You can set it like this at the start of your script:
Sys.setenv(perspective_api_key = "**********")
To start an R session with the environment variable already set, create an .Renviron file in your R home directory with a line like this:
perspective_api_key = "**********"
To check where your R home is, try normalizePath("~").
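As a minimal sketch of the lookup described above, the following base R snippet sets the variable for the current session and reads it back. The helper name `get_perspective_key` is hypothetical and not part of peRspective, and the placeholder value stands in for a real key.

```r
# Hypothetical helper (not part of peRspective): read the key from the
# environment and fail early with a clear message if it is missing.
get_perspective_key <- function() {
  key <- Sys.getenv("perspective_api_key", unset = "")
  if (!nzchar(key)) {
    stop("perspective_api_key is not set; add it to .Renviron or call Sys.setenv().")
  }
  key
}

# Placeholder value for illustration only; use your own key here.
Sys.setenv(perspective_api_key = "my-placeholder-key")
key <- get_perspective_key()
```

Failing early like this gives a clearer message than an authentication error returned by the API later on.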
You can check your quota limits on your Google Cloud project's Perspective API page, and check your project's quota usage on the cloud console quota usage page.
The maximum text size per request is 3000 bytes.
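Because the limit is stated in bytes rather than characters, it can help to check text length before sending a request. The sketch below uses only base R; the names `max_request_bytes` and `fits_request_limit` are hypothetical and not part of peRspective.

```r
# Hypothetical helper (not part of peRspective): TRUE for each string that
# fits within the per-request limit stated above.
max_request_bytes <- 3000

fits_request_limit <- function(text, limit = max_request_bytes) {
  # type = "bytes" counts bytes rather than characters, which matters for
  # non-ASCII text where one character can take several bytes.
  nchar(text, type = "bytes") <= limit
}

comments <- c(
  "A short comment.",
  strrep("a", 3001)  # one byte over the limit
)
fits_request_limit(comments)
```

Texts that fail the check could be shortened or split before scoring; which of the two is appropriate depends on the analysis.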
The following production-ready models are recommended for use. They have been tested across multiple domains and trained on hundreds of thousands of comments tagged by thousands of human moderators. These are available in English (en), Spanish (es), French (fr), German (de), Portuguese (pt), Italian (it), Russian (ru).
TOXICITY: rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion. This model is a Convolutional Neural Network (CNN) trained with word-vector inputs.
SEVERE_TOXICITY: This model uses the same deep-CNN algorithm as the TOXICITY model, but is trained to recognize examples that were considered to be 'very toxic' by crowdworkers. This makes it much less sensitive to comments that include positive uses of curse-words for example. A labelled dataset and details of the methodology can be found in the same toxicity dataset that is available for the toxicity model.
The following experimental models give more fine-grained classifications than overall toxicity. They were trained on less data than the primary toxicity models above and have not been tested as thoroughly.
IDENTITY_ATTACK: negative or hateful comments targeting someone because of their identity.
INSULT: insulting, inflammatory, or negative comment towards a person or a group of people.
PROFANITY: swear words, curse words, or other obscene or profane language.
THREAT: describes an intention to inflict pain, injury, or violence against an individual or group.
SEXUALLY_EXPLICIT: contains references to sexual acts, body parts, or other lewd content.
FLIRTATION: pickup lines, complimenting appearance, subtle sexual innuendos, etc.
For more details on how these were trained, see the Toxicity and sub-attribute annotation guidelines.
The following experimental models were trained on New York Times data tagged by their moderation team.
ATTACK_ON_AUTHOR: Attack on the author of an article or post.
ATTACK_ON_COMMENTER: Attack on fellow commenter.
INCOHERENT: Difficult to understand, nonsensical.
INFLAMMATORY: Intending to provoke or inflame.
LIKELY_TO_REJECT: Overall measure of the likelihood for the comment to be rejected according to the NYT's moderation.
OBSCENE: Obscene or vulgar language such as cursing.
SPAM: Irrelevant and unsolicited commercial content.
UNSUBSTANTIAL: Trivial or short comments.
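Since model names must be spelled exactly, a quick check of the requested names against the lists above can catch typos before a request is sent. This is a sketch only: the vectors and the function `unknown_models` are illustrative names, not the package's own objects.

```r
# Model names copied from the lists above; the vector and function names are
# illustrative and not the package's own objects.
production_models <- c("TOXICITY", "SEVERE_TOXICITY")
experimental_models <- c(
  "IDENTITY_ATTACK", "INSULT", "PROFANITY", "THREAT",
  "SEXUALLY_EXPLICIT", "FLIRTATION",
  "ATTACK_ON_AUTHOR", "ATTACK_ON_COMMENTER", "INCOHERENT",
  "INFLAMMATORY", "LIKELY_TO_REJECT", "OBSCENE", "SPAM", "UNSUBSTANTIAL"
)

# Return the requested names that are not in either list.
unknown_models <- function(requested) {
  setdiff(requested, c(production_models, experimental_models))
}

unknown_models(c("TOXICITY", "INSULT", "TOXIC"))
```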
Analyzing toxic comments can be disheartening sometimes. Feel free to look at this picture of cute kittens whenever you need to:
This is a helper function that will write a dataframe to a SQL database
db_append(path, tbl, data)

| Argument | Description |
|---|---|
| path | path to SQL database |
| tbl | name of the table in SQL database |
| data | the dataframe object that goes into the SQL database |
This is a helper function that will retrieve a dataframe from a SQL database

db_get_data(tbl_dat, path = "sql_data/omdata.db")

| Argument | Description |
|---|---|
| tbl_dat | which table from the SQL database you want to retrieve |
| path | path to database |
This is a helper function that will remove a dataframe from a SQL database
db_remove(path, datasets = NULL, remove_cleaned_data = T)

| Argument | Description |
|---|---|
| path | path to database |
| datasets | which table from the SQL database you want to remove |
| remove_cleaned_data | boolean; remove all datasets that are created through the cleaning script |
For more details see ?peRspective or Perspective API documentation
form_request(score_model, text, score_sentences, languages, doNotStore = F)

| Argument | Description |
|---|---|
| score_model | Specify which model you want to use (for example |
| text | a character string. |
| score_sentences | A boolean value that indicates if the request should return spans that describe the scores for each part of the text (currently done at per sentence level). Defaults to |
| languages | A vector of ISO 639-1 two-letter language codes specifying the language(s) that the comment is in (for example, "en", "es", "fr", "de", etc). If unspecified, the API will autodetect the comment language. If language detection fails, the API returns an error. |
| doNotStore | Whether the API is permitted to store the comment from this request. Stored comments will be used for future research and community model building purposes to improve the API over time. Perspective API also plans to provide dashboards and automated analysis of the comments submitted, which will apply only to those stored. Defaults to FALSE |
a tibble
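To make the arguments above more concrete, here is a rough sketch of how they could map onto a request body. The field names (`comment`, `requestedAttributes`, `spanAnnotations`, `languages`, `doNotStore`) follow the Perspective API's AnalyzeComment request format; the function `build_request_body` is a hypothetical stand-in and is not claimed to match how form_request works internally.

```r
# Hypothetical stand-in for a request builder. Field names follow the
# Perspective API's AnalyzeComment request format.
build_request_body <- function(text, score_model, languages = NULL,
                               score_sentences = FALSE, doNotStore = FALSE) {
  # One empty configuration object per requested attribute, named by model.
  attributes <- stats::setNames(
    replicate(length(score_model), list(), simplify = FALSE),
    score_model
  )
  body <- list(
    comment = list(text = text),
    requestedAttributes = attributes,
    spanAnnotations = score_sentences,
    doNotStore = doNotStore
  )
  # Leave `languages` out entirely so the API can autodetect the language.
  if (!is.null(languages)) {
    body$languages <- as.list(languages)
  }
  body
}

body <- build_request_body("Hello, I am a test comment!",
                           score_model = c("TOXICITY", "SEVERE_TOXICITY"),
                           languages = "en")
str(body)
```

A list of this shape could then be serialized to JSON by whichever HTTP client is used to call the API.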
Print a beautiful message in the console
msg(type, type_style = crayon::make_style("red4"), msg)

| Argument | Description |
|---|---|
| type | what message should be displayed in the beginning |
| type_style | crayon color or style |
| msg | what message should be printed |

    ## Send a message to the world
    msg("MESSAGE", crayon::make_style('blue4'), "This is a message to the world")
Check if API key is present
perspective_api_key(test = F)

| Argument | Description |
|---|---|
| test | necessary when in a test environment. Defaults to FALSE |
Provide iterator number and total length of items to be iterated over
print_progress(x, total, print_prct = F)

| Argument | Description |
|---|---|
| x | iterator number. |
| total | length of items to be iterated over. |
| print_prct | only print percentage progress (defaults to FALSE) |

a chr

    ## Print progress (1 out of 100)
    print_progress(1, 100)

    ## Only print percentage
    print_progress(1, 100, print_prct = TRUE)
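For readers who want to see what such a progress string might look like without installing the package, here is a base R sketch. The function `progress_label` and its output format are assumptions made for illustration; the package's actual output may differ.

```r
# Hypothetical re-implementation; the package's actual output format may differ.
progress_label <- function(x, total, print_prct = FALSE) {
  prct <- paste0(round(100 * x / total, 2), "%")
  if (print_prct) {
    return(prct)
  }
  paste0(x, " out of ", total, " (", prct, ")")
}

progress_label(1, 100)
progress_label(1, 100, print_prct = TRUE)
```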
All valid experimental Perspective API models
All valid (non-experimental) Perspective API models
Provide a character string with your text, your API key and what scores you want to obtain.
prsp_score(text, text_id = NULL, languages = NULL, score_sentences = F, score_model, sleep = 1, doNotStore = F, key = NULL)

| Argument | Description |
|---|---|
| text | a character string. |
| text_id | a unique ID for the text that you supply (required). |
| languages | A vector of ISO 639-1 two-letter language codes specifying the language(s) that the comment is in (for example, "en", "es", "fr", "de", etc). If unspecified, the API will autodetect the comment language. If language detection fails, the API returns an error. |
| score_sentences | A boolean value that indicates if the request should return spans that describe the scores for each part of the text (currently done at per sentence level). Defaults to FALSE |
| score_model | Specify which model you want to use (for example |
| sleep | how long should |
| doNotStore | Whether the API is permitted to store the comment from this request. Stored comments will be used for future research and community model building purposes to improve the API over time. Perspective API also plans to provide dashboards and automated analysis of the comments submitted, which will apply only to those stored. Defaults to FALSE |
| key | Your API key (see here to set up an API key). |
For more details see ?peRspective or Perspective API documentation
a tibble
    ## Not run:
    ## GET TOXICITY SCORES for a comment
    prsp_score("Hello, I am a test comment!",
               score_model = "TOXICITY")

    ## GET TOXICITY and SEVERE_TOXICITY Scores for a comment
    prsp_score("Hello, I am a test comment!",
               score_model = c("TOXICITY", "SEVERE_TOXICITY"))

    ## GET TOXICITY and SEVERE_TOXICITY Scores for each sentence of a comment
    prsp_score("Hello, I am a test comment! I am a second sentence and I will (hopefully) be scored separately",
               score_model = c("TOXICITY", "SEVERE_TOXICITY"),
               score_sentences = TRUE)
    ## End(Not run)
This function wraps prsp_score and loops over your text input. Provide a character string with your text and which scores you want to obtain. Make sure to keep track of your rate limit on the cloud console quota usage page.
prsp_stream(.data, text = NULL, text_id = NULL, ..., safe_output = F, verbose = F, save = F, dt_name = "persp")

| Argument | Description |
|---|---|
| .data | a dataset with a text and text_id column. |
| text | a character vector with text you want to score. |
| text_id | a unique ID for the text that you supply (required) |
| ... | arguments passed to |
| safe_output | wraps the function into a |
| verbose | narrates the streaming procedure (defaults to FALSE) |
| save | NOT USABLE YET saves data into SQLite database (defaults to FALSE) |
| dt_name | NOT USABLE YET what is the name of the dataset? (defaults to "persp") |
For more details see ?peRspective or Perspective API documentation
a tibble
    ## Not run:
    library(tibble)
    library(magrittr)  # provides %>%

    ## Create a mock tibble
    text_sample <- tibble(
      ctext = c("You wrote this? Wow. This is dumb and childish, please go f**** yourself.",
                "I don't know what to say about this but it's not good. The commenter is just an idiot",
                "This goes even further!",
                "What the hell is going on?",
                "Please. I don't get it. Explain it again",
                "Annoying and irrelevant! I'd rather watch the paint drying on the wall!"),
      textid = c("#efdcxct", "#ehfcsct", "#ekacxwt", "#ewatxad", "#ekacswt", "#ewftxwd")
    )

    ## GET TOXICITY and SEVERE_TOXICITY Scores for a dataset with a text column
    text_sample %>%
      prsp_stream(text = ctext, text_id = textid,
                  score_model = c("TOXICITY", "SEVERE_TOXICITY"))

    ## Safe Output argument means will not stop on error
    text_sample %>%
      prsp_stream(text = ctext, text_id = textid,
                  score_model = c("TOXICITY", "SEVERE_TOXICITY"),
                  safe_output = TRUE)

    ## verbose = TRUE means you get pretty narration of your scoring procedure
    text_sample %>%
      prsp_stream(text = ctext, text_id = textid,
                  score_model = c("TOXICITY", "SEVERE_TOXICITY"),
                  safe_output = TRUE, verbose = TRUE)
    ## End(Not run)
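The idea behind looping over rows without stopping on errors can be sketched in base R without any network access. Everything here is an assumption made for illustration: `score_one` is a mock scorer that returns a placeholder value, and `score_all_safely` is a hypothetical loop, not the package's implementation of safe_output.

```r
# `score_one` is a mock scorer standing in for a real API call; it fails on
# empty text so the error path can be shown without network access.
score_one <- function(text) {
  if (!nzchar(text)) stop("empty comment")
  data.frame(TOXICITY = 0.5)  # placeholder score, not a real model output
}

# Score each row in turn, recording errors instead of stopping.
score_all_safely <- function(data) {
  rows <- lapply(seq_len(nrow(data)), function(i) {
    tryCatch({
      scores <- score_one(data$text[i])
      data.frame(text_id = data$text_id[i], TOXICITY = scores$TOXICITY,
                 error = NA_character_, stringsAsFactors = FALSE)
    }, error = function(e) {
      data.frame(text_id = data$text_id[i], TOXICITY = NA_real_,
                 error = conditionMessage(e), stringsAsFactors = FALSE)
    })
  })
  do.call(rbind, rows)
}

sample_data <- data.frame(
  text_id = c("a", "b", "c"),
  text = c("Fine comment", "", "Another one"),
  stringsAsFactors = FALSE
)
scored <- score_all_safely(sample_data)
scored
```

Keeping the error message next to the text ID makes it easy to re-run only the rows that failed.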
Specify a decimal
specify_decimal(x, k)

| Argument | Description |
|---|---|
| x | a number to be rounded |
| k | number of decimal places to round to |

    ## specify 2 decimals of a number
    specify_decimal(1.0434, 2)
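One common way to get a fixed number of decimals in base R is to combine round() with format(nsmall = k). The sketch below shows that approach under a hypothetical name, `format_decimal`; it is not claimed to be how specify_decimal is implemented.

```r
# Hypothetical equivalent; the package's own implementation may differ.
format_decimal <- function(x, k) {
  # round() fixes the value; format(nsmall = k) keeps trailing zeros.
  trimws(format(round(x, k), nsmall = k))
}

format_decimal(1.0434, 2)
format_decimal(2, 2)
```

Note that this approach returns a character string, which suits printing but not further arithmetic.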
For more details see ?peRspective or Perspective API documentation
unnest_scores(Output, score_model, score_sentences, text)

| Argument | Description |
|---|---|
| Output | comes out of the |
| score_model | Specify which model you want to use (for example |
| score_sentences | A boolean value that indicates if the request should return spans that describe the scores for each part of the text (currently done at per sentence level). Defaults to |
| text | a character string. |
a tibble