| Title: | Retrieve and Parse Meta Ad Targeting Data |
|---|---|
| Description: | The metatargetr package provides tools for retrieving, parsing, and analyzing advertising targeting data from Meta’s Ad Library and Audience Tab. |
| Authors: | Fabio Votta [aut, cre] (ORCID: YOUR-ORCID-ID), Philipp Mendoza [aut] (ORCID: YOUR-ORCID-ID) |
| Maintainer: | Fabio Votta <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.10 |
| Built: | 2026-05-28 07:04:33 UTC |
| Source: | https://github.com/favstats/metatargetr |
This function takes a single dataframe, assumed to be the result of
bind_rows() on multiple targeting datasets from different time periods.
It correctly aggregates the spending for each unique targeting criterion
and calculates the new totals and percentages based on the combined data.
aggr_targeting(combined_df)
combined_df |
A single dataframe that has already been combined from
multiple sources (e.g., via bind_rows()). |
filter_disclaimer |
An optional character vector of disclaimers or page names to filter the dataset before aggregation. If NULL (default), all data is used. |
A single, aggregated tibble where each row represents a unique targeting criterion for an advertiser across the combined period.
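The reason for re-aggregating is that percentages from separate periods cannot simply be averaged; they have to be recomputed from summed spending. A minimal base-R sketch of that idea follows. It is not the package's implementation, and the column names (`page_id`, `type`, `value`, `spend`) as well as the percentage definition are assumptions made for illustration.

```r
# Two hypothetical periods of targeting data, already row-bound together.
# Column names are illustrative assumptions, not the package's schema.
combined_df <- data.frame(
  page_id = c("1", "1", "1", "1"),
  type    = c("age", "age", "age", "age"),
  value   = c("18-24", "25-34", "18-24", "25-34"),
  spend   = c(100, 300, 50, 150),
  stringsAsFactors = FALSE
)

# Sum spend for each unique advertiser / criterion combination.
agg <- aggregate(spend ~ page_id + type + value, data = combined_df, FUN = sum)

# Recompute the advertiser-level total and each criterion's share of it.
agg$total_spend <- ave(agg$spend, agg$page_id, FUN = sum)
agg$perc <- agg$spend / agg$total_spend

print(agg)
```

In this toy input the "18-24" criterion ends up with a spend of 150 out of 600, so its recomputed share is 0.25.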
This function safely closes the browser instance associated with the provided browser object. It's designed to handle cases where the browser may have already been closed, preventing errors.
browser_close(browser_df)
browser_df |
A tibble returned by |
Invisibly returns NULL. This function is called for its side effect
of closing the browser.
## Not run:
# Launch a browser
browser <- browser_launch()
# ... perform actions with the browser ...
# Close the browser instance when done
browser_close(browser)
## End(Not run)
Verifies both the R-side flag and that Chrome is actually responsive by sending a lightweight evaluation to the browser. If Chrome has crashed or the websocket connection has been lost, the stale session is automatically cleaned up and FALSE is returned.
browser_session_active()
Logical, TRUE if a session is active and responsive.
## Not run:
browser_session_active()  # FALSE
browser_session_start()
browser_session_active()  # TRUE
browser_session_close()
browser_session_active()  # FALSE
## End(Not run)
Closes the browser session started by browser_session_start().
After calling this, each function call will start its own browser again.
browser_session_close()
Invisibly returns TRUE on success.
## Not run:
browser_session_start()
# ... do work ...
browser_session_close()
## End(Not run)
Closes the current session (if any) and starts a fresh one with warm-up. Useful when Chrome has become unresponsive or after errors mid-batch.
browser_session_restart(warm_up = TRUE, warm_up_wait = 8)
warm_up |
Logical. If TRUE (default), navigates to the Facebook Ad Library landing page on startup to pass the JS challenge and set cookies. |
warm_up_wait |
Seconds to wait during warm-up (default 8). |
Invisibly returns TRUE on success.
## Not run:
browser_session_start()
# ... Chrome becomes unresponsive ...
browser_session_restart()
# ... continue working ...
browser_session_close()
## End(Not run)
Starts a headless Chrome browser session that will be reused by
get_ad_snapshots(), get_deeplink(), and get_ad_html() until
browser_session_close() is called.
browser_session_start(warm_up = TRUE, warm_up_wait = 8)
warm_up |
Logical. If TRUE (default), navigates to the Facebook Ad Library landing page on startup to pass the JS challenge and set cookies. |
warm_up_wait |
Seconds to wait during warm-up (default 8). |
This significantly improves performance when processing multiple ads, as browser startup (~2–3 seconds) only happens once. By default the session is warmed up by navigating to the Facebook Ad Library landing page, which passes the JS challenge and sets cookies so that subsequent calls return data immediately.
Invisibly returns TRUE on success.
## Not run:
# Start session (includes warm-up)
browser_session_start()
# Process multiple ads (each reuses the session)
results <- map_dfr_progress(ad_ids, ~get_ad_snapshots(.x))
# Close when done
browser_session_close()
## End(Not run)
Check hash of a media file
check_hash(.x, hash_table, mediadir)
.x |
Path to the media file. |
hash_table |
A data frame of existing hashes. |
mediadir |
Directory to save media files. |
A tibble with file details.
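To illustrate how a hash table can flag duplicate media files, here is a small base-R sketch. The hashing algorithm used by the package is not stated in this documentation, so MD5 via `tools::md5sum()` is an assumption, and `is_duplicate()` and the `hash`/`path` columns are hypothetical names.

```r
# Write two files with identical content and one that differs.
dir <- tempfile("media_")
dir.create(dir)
f1 <- file.path(dir, "a.txt"); writeLines("same bytes", f1)
f2 <- file.path(dir, "b.txt"); writeLines("same bytes", f2)
f3 <- file.path(dir, "c.txt"); writeLines("other bytes", f3)

# Hypothetical hash table of files that are already known.
hash_table <- data.frame(
  hash = unname(tools::md5sum(f1)),
  path = f1,
  stringsAsFactors = FALSE
)

# TRUE when a file's MD5 digest already appears in the hash table.
is_duplicate <- function(path, hash_table) {
  unname(tools::md5sum(path)) %in% hash_table$hash
}

is_duplicate(f2, hash_table)  # same content as f1
is_duplicate(f3, hash_table)  # different content
```

Comparing digests rather than file names means a creative that is re-served under a new URL is still recognised as the same file.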
Create necessary directories
create_necessary_dirs(x)
x |
Directory path to check and create. |
None (used for side effects).
Facebook's SSR script tags contain a bundle of many ads' snapshot data.
This function locates the "snapshot": occurrence that belongs to the
requested ad_id by first finding the ad_id in the raw text and then
extracting the nearest "snapshot": JSON object after it.
detectmysnap(rawhtmlascharacter, ad_id = NULL)
rawhtmlascharacter |
Raw HTML content as character string. |
ad_id |
Character string. The Facebook ad ID to find in the bundle. When provided, the function extracts only the snapshot belonging to this ad. When NULL, extracts the first snapshot found (legacy behavior). |
Falls back to the original split-based approach when no ad_id is provided.
A parsed JSON object (list).
get_ad_snapshots() which calls this function internally.
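The "find the ad ID first, then take the next snapshot object" idea can be sketched in base R as follows. This is an illustration only: the helper names, the `ad_archive_id` key in the toy input, and the brace-counting extraction are assumptions, not the package's code. The counter also ignores braces that occur inside JSON string values.

```r
# Return the first balanced {...} object that follows `key`, searching from `from`.
extract_object_after <- function(txt, key, from = 1L) {
  tail_txt <- substr(txt, from, nchar(txt))
  key_pos <- as.integer(regexpr(key, tail_txt, fixed = TRUE))
  if (key_pos == -1L) return(NULL)
  rest <- substr(tail_txt, key_pos + nchar(key), nchar(tail_txt))
  chars <- strsplit(rest, "", fixed = TRUE)[[1]]
  start <- match("{", chars)
  if (is.na(start)) return(NULL)
  chars <- chars[start:length(chars)]
  # Track nesting depth: +1 for "{", -1 for "}"; the object ends at depth 0.
  depth <- cumsum((chars == "{") - (chars == "}"))
  end <- which(depth == 0)[1]
  if (is.na(end)) return(NULL)
  paste(chars[seq_len(end)], collapse = "")
}

# Locate the ad ID, then extract the nearest "snapshot": object after it.
find_snapshot_for_ad <- function(raw_html, ad_id) {
  id_pos <- as.integer(regexpr(ad_id, raw_html, fixed = TRUE))
  if (id_pos == -1L) return(NULL)
  extract_object_after(raw_html, "\"snapshot\":", from = id_pos)
}

# Toy bundle with two ads (made-up structure).
raw_html <- paste0(
  '{"ad_archive_id":"111","snapshot":{"title":"first","cards":{"n":1}}},',
  '{"ad_archive_id":"222","snapshot":{"title":"second","cards":{"n":2}}}'
)
find_snapshot_for_ad(raw_html, "222")
```

Starting the search at the position of the ad ID is what keeps the second ad from receiving the first ad's snapshot.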
Detects the JSON code on Facebook ad pages that contains the media URLs. This is basically str_extract, but with Perl-compatible regular expressions.
detectmysnap_dep(rawhtmlascharacter)
rawhtmlascharacter |
Raw HTML content as character string. |
A parsed JSON object.
Download media files and potentially hash them
download_media(media_dat, mediadir = "data/media", hashing = T)
media_dat |
Data containing media URLs. |
mediadir |
Directory to save media files. |
hashing |
Logical, whether to hash the files. |
None (used for side effects).
Download media files with specified IDs
download_media_int(id, x, n, mediadir = "data/media")
id |
Ad ID. |
x |
Media URLs to download. |
n |
Number of URLs. |
mediadir |
Directory to save media files. |
A character vector with file paths.
Extract media URLs from data
extract_media_urls(yo)
yo |
Data containing potential media URLs. |
A character vector of media URLs.
Find an object in a nested list by name
find_name(haystack, needle)
haystack |
A nested list. |
needle |
Name of the object to find. |
Object in the nested list with the given name or NULL if not found.
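A recursive lookup of this kind can be sketched in a few lines of base R. The function below is deliberately named `find_by_name()` to make clear that it is an illustrative stand-in, not the package's `find_name()`; the depth-first, first-match behaviour is an assumption.

```r
# Depth-first search for the first element called `needle`.
find_by_name <- function(haystack, needle) {
  if (!is.list(haystack)) return(NULL)
  if (needle %in% names(haystack)) return(haystack[[needle]])
  for (child in haystack) {
    hit <- find_by_name(child, needle)
    if (!is.null(hit)) return(hit)
  }
  NULL
}

nested <- list(a = 1, b = list(c = list(target = "found me"), d = 2))
find_by_name(nested, "target")   # element nested two levels down
find_by_name(nested, "missing")  # NULL when nothing matches
```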
Parse JSON-formatted strings into named character vectors
fix_json(include)
include |
A character vector of JSON-formatted strings |
A list of named character vectors, where each vector represents a parsed JSON object
## Not run:
# Parse an example character vector
example_json <- c("{\"city\":\"Berlin\",\"zip_code\":\"12345\"}",
                  "{\"city\":\"Munich\",\"zip_code\":\"67890\"}")
parsed_json <- fix_json(example_json)
# Check the resulting list of named character vectors
parsed_json
## End(Not run)
Retrieves HTML content for Facebook Ad Library pages using headless Chrome to bypass JavaScript-based bot detection. Results are cached to disk.
get_ad_html(
  ad_ids,
  country,
  cache_dir = NULL,
  overwrite = FALSE,
  strip_css = TRUE,
  wait_sec = 3,
  log_failed_ids = NULL,
  quiet = FALSE,
  return_type = c("paths", "list")
)
ad_ids |
Character vector of Ad-Library IDs. |
country |
Two-letter country code. |
cache_dir |
Directory where .html.gz files will be stored. Defaults to the value set during interactive setup, or "html_cache". |
overwrite |
If FALSE (default) keep already-cached files. |
strip_css |
Run fast regex-based CSS removal on downloaded pages. |
wait_sec |
Seconds to wait for each page to load (default 3). |
log_failed_ids |
If a character path is provided (e.g., "log.txt"), failed IDs will be appended to that file. |
quiet |
Suppress progress messages. |
return_type |
Either "paths" (return file paths) or "list" (return the HTML strings). |
Either a named character vector of file paths or a named list of
HTML strings, in the same order as ad_ids.
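The caching and CSS-stripping behaviour described above can be approximated with base R connections and a regular expression. This is a hedged sketch: the helper names, the `<ad_id>.html.gz` file naming, and the exact regex are assumptions for illustration, not the package's internals.

```r
# Remove <style>...</style> blocks; (?s) lets "." match newlines, (?i) ignores case.
strip_style_blocks <- function(html) {
  gsub("(?si)<style\\b[^>]*>.*?</style>", "", html, perl = TRUE)
}

# Write a page to <cache_dir>/<ad_id>.html.gz unless it is already cached.
cache_html <- function(ad_id, html, cache_dir, overwrite = FALSE) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(ad_id, ".html.gz"))
  if (overwrite || !file.exists(path)) {
    con <- gzfile(path, open = "wt")
    on.exit(close(con))
    writeLines(strip_style_blocks(html), con)
  }
  path
}

# Read a gzip-compressed page back into a single string.
read_cached_html <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  paste(readLines(con), collapse = "\n")
}

cache_dir <- tempfile("html_cache_")
page <- "<html><style>body{color:red}</style><p>Ad text</p></html>"
path <- cache_html("123", page, cache_dir)
read_cached_html(path)
```

With `overwrite = FALSE`, a second call for the same ID returns the existing path without rewriting the file, which is what makes repeated runs cheap.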
Translates Facebook page handles (or profile URLs) into the numeric
Ad Library page identifier used as view_all_page_id in
https://www.facebook.com/ads/library/ URLs.
get_ad_library_page_id(
  handles,
  wait_sec = 8,
  max_retries = 1,
  country = "ALL",
  quiet = FALSE
)
handles |
Character vector of Facebook handles, profile URLs, profile transparency URLs, Ad Library URLs, or numeric page IDs. |
wait_sec |
Numeric. Seconds to wait after each page navigation
before extracting content. Default is 8. |
max_retries |
Integer. Number of retries per handle when extraction
fails. Default is 1. |
country |
Character. Country code used when constructing
|
quiet |
Logical. If |
The function loads each page's Profile Transparency tab in a headless
browser and extracts the Page ID from rendered text.
A tibble with one row per input and columns:
Original input value.
Normalized handle or numeric ID extracted from input.
Resolved Ad Library page ID (same as view_all_page_id).
Alias of page_id for clarity.
Logical/NA: whether transparency text indicates
the page is currently running ads.
Profile transparency URL used for extraction.
Convenience Ad Library page URL for the resolved ID.
Logical success flag for each input.
Error message when resolution fails.
# Numeric IDs are returned directly (no browser needed)
get_ad_library_page_id(c("121264564551002", "106359662726593"))
## Not run:
# Resolve from handles / URLs
get_ad_library_page_id(c("VVD", "https://www.facebook.com/TeachGoldenApple/"))
## End(Not run)
Automates downloading Facebook Ad Library reports for vectors of countries, timeframes, and dates. It uses a robust tryCatch block for each request to ensure cleanup and prevent hanging processes.
get_ad_report(country, timeframe, date)
country |
A character vector of two-letter ISO country codes. |
timeframe |
A character vector of time windows (e.g., "last_7_days"). |
date |
A character vector of report dates in "YYYY-MM-DD" format. |
A single tibble containing the combined data for all successful requests.
Retrieves snapshot data (images, videos, cards, body text, etc.) for a Facebook ad from the Ad Library. Uses headless Chrome via chromote to bypass Facebook's JavaScript-based bot detection.
get_ad_snapshots(
  ad_id,
  download = FALSE,
  mediadir = "data/media",
  hashing = FALSE,
  wait_sec = 6,
  max_retries = 1
)
ad_id |
Character string. The Facebook ad ID. |
download |
Logical. If TRUE, download media files to mediadir. |
mediadir |
Character. Directory to save downloaded media files. |
hashing |
Logical. If TRUE, hash downloaded files for deduplication. Recommended for large-scale data collection. |
wait_sec |
Numeric. Seconds to wait for the page to load (default 6). Increase if you are getting empty results. |
max_retries |
Integer. Number of retry attempts if data is not found on the first try (default 1). Each retry waits progressively longer. |
For best results when processing multiple ads, call
browser_session_start() before and browser_session_close() after.
If no persistent session exists, a temporary one is created and warmed
up automatically (adds ~10 seconds on the first call).
Includes built-in retry logic: if the page loads but snapshot data is not yet available (e.g., JS challenge still completing), the function retries with a longer wait before giving up.
A tibble with one row containing ad snapshot fields (images, videos, body, title, display_format, page_name, etc.) and the ad ID. If retrieval fails, returns a single-column tibble with just the id.
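The retry behaviour can be pictured with a small base-R sketch. The documentation only says that each retry waits progressively longer, so the linear schedule below is an assumption, and `with_retries()` and `fake_fetch()` are hypothetical helpers that stand in for the real page fetch.

```r
# Call `fetch(wait)` until it returns non-NULL, lengthening the wait each attempt.
with_retries <- function(fetch, wait_sec = 6, max_retries = 1) {
  for (attempt in 0:max_retries) {
    wait <- wait_sec * (attempt + 1)  # assumed linear back-off
    result <- fetch(wait)
    if (!is.null(result)) return(result)
  }
  NULL
}

# Hypothetical fetcher: pretends data only appears once the wait exceeds 0.015 s.
fake_fetch <- function(wait) {
  Sys.sleep(wait)
  if (wait > 0.015) list(title = "ad loaded", waited = wait) else NULL
}

# First attempt (0.01 s) finds nothing; the retry (0.02 s) succeeds.
with_retries(fake_fetch, wait_sec = 0.01, max_retries = 1)
```

Returning `NULL` after the last attempt mirrors the documented fallback of giving up rather than raising an error.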
## Not run:
# Single ad
snap <- get_ad_snapshots("1536277920797773")
# Batch processing (recommended)
browser_session_start()
results <- map_dfr_progress(ad_ids, ~get_ad_snapshots(.x))
browser_session_close()
## End(Not run)
Downloads the historical Facebook or Instagram page info dataset for a given ISO2C country code.
The data is retrieved from a fixed GitHub release URL in .parquet format. It includes information on:
Page-level metadata (e.g., name, verification status, profile type)
Audience metrics (e.g., number of likes, Instagram followers)
Shared disclaimers (if applicable)
Page creation and name change events with timestamps
Contact and address information (if available)
Free-text descriptions ("about" section)
get_additional_page_info_db(iso2c, verbose = TRUE)
iso2c |
A string specifying the ISO-3166-1 alpha-2 country code (e.g., "DE", "FR", "US"). |
verbose |
Logical. If TRUE (default), prints a status message when downloading. |
A tibble containing Facebook page info for the specified country.
If the dataset is not available or cannot be retrieved, a tibble with no_data = TRUE
and the given iso2c code is returned.
## Not run:
de_info <- get_additional_page_info_db("DE")
fr_info <- get_additional_page_info_db("FR")
## End(Not run)
A wrapper function that downloads ad HTMLs for a given set of IDs and a country, parses the data, and returns a final, reordered dataframe.
get_ads_info(ad_ids, country, keep_html = TRUE, cache_dir = "html_cache", ...)
ad_ids |
A character vector of Ad Library IDs. |
country |
A two-letter country code. |
keep_html |
A logical flag. If |
cache_dir |
The directory to store downloaded HTML files. Defaults to "html_cache". |
... |
Additional arguments to be passed down to |
A single, reordered dataframe containing the parsed ad data.
This function programmatically retrieves the embedded JSON object labeled deeplinkAdCard
from the source code of a Facebook Ad Library ad page.
get_deeplink(ad_id, wait_sec = 4)
ad_id |
Character string specifying the Facebook ad ID (as shown in the Ad Library URL). |
wait_sec |
Seconds to wait for page to load (default 4). Increase if getting errors. |
The function performs the following steps internally:
Fetches the ad page HTML from Facebook's Ad Library using headless Chrome.
Locates the <script> tag containing the deeplinkAdCard object.
Uses a recursive regular expression to extract the full JSON object following deeplinkAdCard.
Parses the JSON string into a nested R list.
Flattens the JSON into a tidy tibble row, unnesting nested sub-objects such as fevInfo,
free_form_additional_info, learn_more_content, and optionally snapshot if present.
The output is designed for downstream analysis: each ad is represented as one row in a tibble,
with nested JSON fields expanded into their own columns via tidyr::unnest_wider().
This function complements get_ad_snapshots(), which extracts the snapshot JSON.
Use get_deeplink() when additional metadata embedded under deeplinkAdCard is required.
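The recursive-regex step can be illustrated with base R and `perl = TRUE`. The pattern and the helper `extract_json_after()` are assumptions chosen for illustration and are not taken from the package source. The pattern does not account for braces inside JSON string values.

```r
# PCRE recursion: (?R) re-enters the whole pattern, so nested braces stay balanced.
balanced_object <- "\\{(?:[^{}]|(?R))*\\}"

# Return the balanced {...} object that follows `key`, or NA if none is found.
extract_json_after <- function(txt, key) {
  key_pos <- as.integer(regexpr(key, txt, fixed = TRUE))
  if (key_pos == -1L) return(NA_character_)
  rest <- substr(txt, key_pos + nchar(key), nchar(txt))
  m <- regexpr(balanced_object, rest, perl = TRUE)
  if (m == -1L) return(NA_character_)
  regmatches(rest, m)
}

# Toy script content (made-up values).
script <- 'x={"deeplinkAdCard":{"fevInfo":{"learn_more_content":{"k":"v"}},"id":"1"},"other":{}}'
extract_json_after(script, '"deeplinkAdCard":')
```

The extracted string stops at the brace that closes the `deeplinkAdCard` object, so the trailing `"other"` entry is not included; the result can then be handed to a JSON parser.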
A tibble with one row, containing flattened columns extracted from the deeplink JSON object.
Columns depend on the structure of the JSON and may include fields like fevInfo_*,
fevInfo_free_form_additional_info_*, fevInfo_learn_more_content_*, and snapshot-related columns.
get_ad_snapshots() for extracting snapshot JSON; detectmysnap() for raw JSON detection.
## Not run:
df <- get_deeplink("1103135646905363")
glimpse(df)
## End(Not run)
This function automates the process of obtaining data from the Google Ads Transparency report. It targets the main data bundle, which contains several CSV files. The user can specify which file to process using either its full filename or a convenient shorthand. By default, all downloaded files are deleted after the data is read into memory.
get_ggl_ads(file_to_read = "creatives", keep_file_at = NULL, quiet = FALSE)
file_to_read |
A character string specifying which CSV file to read from
the bundle, using either the full filename or a shorthand alias (e.g., "creatives"). |
keep_file_at |
A character path to a directory where the selected CSV file
should be saved. If |
quiet |
A logical value. If |
Downloads the latest Google Political Ads transparency data bundle (a ZIP file), extracts a specific CSV report, reads it into a tibble, and then cleans up the downloaded and extracted files.
The data bundle contains several files. The user can specify which file to read using a shorthand alias.
Available Reports (Aliases):
"creatives" (Default): google-political-ads-creative-stats.csv. The primary and most detailed file. Contains statistics for each ad creative, including advertiser info, targeting details, and spend.
"advertisers": google-political-ads-advertiser-stats.csv. Aggregate statistics for each political advertiser.
"weekly_spend": google-political-ads-advertiser-weekly-spend.csv. Advertiser spending aggregated by week.
"geo_spend": google-political-ads-geo-spend.csv. Overall spending aggregated by geographic location.
"advertiser_geo_spend": google-political-ads-advertiser-geo-spend.csv. Advertiser-specific spending aggregated by US state.
"declarations": google-political-ads-advertiser-declared-stats.csv. Self-declared information from advertisers in certain regions (e.g., California, New Zealand).
"advertiser_mapping": advertiser_id_mapping.csv. A mapping file to reconcile different advertiser identifiers.
"creative_mapping": creative_id_mapping.csv. A mapping file to reconcile different ad creative identifiers.
"updated_date": google-political-ads-updated.csv. A single-entry file indicating the last time the report data was refreshed.
"campaigns" (Deprecated): google-political-ads-campaign-targeting.csv. Ad-level targeting is now in the "creatives" file.
"keywords" (Discontinued): google-political-ads-top-keywords-history.csv. Historical data on top keywords, terminated in Dec 2019.
For more details on the specific fields in each file, please refer to the Google Ads Transparency Report documentation.
A tibble (data frame) containing the data from the selected CSV file.
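Resolving an alias or a full filename to the CSV inside the bundle can be sketched with a named character vector. The alias/filename pairs below are copied from the list above (the deprecated and discontinued reports are left out); `resolve_report_file()` is a hypothetical helper, not a function exported by the package.

```r
# Alias-to-filename pairs as listed in the documentation above.
report_aliases <- c(
  creatives            = "google-political-ads-creative-stats.csv",
  advertisers          = "google-political-ads-advertiser-stats.csv",
  weekly_spend         = "google-political-ads-advertiser-weekly-spend.csv",
  geo_spend            = "google-political-ads-geo-spend.csv",
  advertiser_geo_spend = "google-political-ads-advertiser-geo-spend.csv",
  declarations         = "google-political-ads-advertiser-declared-stats.csv",
  advertiser_mapping   = "advertiser_id_mapping.csv",
  creative_mapping     = "creative_id_mapping.csv",
  updated_date         = "google-political-ads-updated.csv"
)

# Accept either an alias or a full filename; fail loudly on anything else.
resolve_report_file <- function(file_to_read = "creatives") {
  if (file_to_read %in% names(report_aliases)) {
    return(unname(report_aliases[file_to_read]))
  }
  if (file_to_read %in% report_aliases) return(file_to_read)
  stop("Unknown report: ", file_to_read)
}

resolve_report_file("geo_spend")
resolve_report_file("google-political-ads-geo-spend.csv")
```

Both calls resolve to the same file, which matches the documented choice between a shorthand and the full filename.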
## Not run:
# Fetch the main creative stats report using the default alias
creative_stats <- get_ggl_ads()
# Fetch the advertiser stats report using its alias
advertiser_stats <- get_ggl_ads(file_to_read = "advertisers")
# Fetch the advertiser ID mapping file
advertiser_map <- get_ggl_ads(file_to_read = "advertiser_mapping")
# Fetch the geo spend report using its full filename
geo_spend_report <- get_ggl_ads(
  file_to_read = "google-political-ads-geo-spend.csv"
)
# Fetch the main report and save the CSV file to a "data" folder
creative_stats_saved <- get_ggl_ads(
  file_to_read = "creatives",
  keep_file_at = "data/"
)
## End(Not run)
This function scrapes ad data from the LinkedIn Ad Library, handling pagination to retrieve all available results for a given search query. It first collects all ad detail links and then scrapes each detail page with a configurable timeout and retry mechanism.
get_linkedin_ads(
  keyword,
  countries,
  start_date,
  end_date,
  account_owner = NULL,
  max_pages = 100,
  max_retries = 5,
  timeout_seconds = 15
)
keyword |
A character string for the keyword to search for (e.g., "Habeck"). |
countries |
A character vector of two-letter country codes (e.g., "DE"). |
start_date |
The start date of the search range in "YYYY-MM-DD" format. |
end_date |
The end date of the search range in "YYYY-MM-DD" format. |
account_owner |
Optional. A character string for the ad account owner. |
max_pages |
The maximum number of pages to scrape. Defaults to 100. |
max_retries |
The maximum number of retries for each detail page request. Defaults to 5. |
timeout_seconds |
The timeout in seconds for each detail page request. Defaults to 15. |
A tibble containing the detailed scraped ad information from all pages.
## Not run:
ads_data <- get_linkedin_ads(
  keyword = "Habeck",
  countries = "DE",
  start_date = "2025-01-01",
  end_date = "2025-02-23",
  account_owner = "INSM",
  max_pages = 5,
  max_retries = 3,
  timeout_seconds = 20
)
print(ads_data)
## End(Not run)
Downloads the historical Facebook or Instagram page info dataset for a given ISO2C country code.
The data is retrieved from a fixed GitHub release URL in .parquet format. It includes information on:
Page-level metadata (e.g., name, verification status, profile type)
Audience metrics (e.g., number of likes, Instagram followers)
Shared disclaimers (if applicable)
Page creation and name change events with timestamps
Contact and address information (if available)
Free-text descriptions ("about" section)
get_page_info_db(iso2c, verbose = TRUE)
iso2c |
A string specifying the ISO-3166-1 alpha-2 country code (e.g., "DE", "FR", "US"). |
verbose |
Logical. If TRUE (default), prints a status message when downloading. |
A tibble containing Facebook page info for the specified country.
If the dataset is not available or cannot be retrieved, a tibble with no_data = TRUE
and the given iso2c code is returned.
## Not run:
de_info <- get_page_info_db("DE")
fr_info <- get_page_info_db("FR")
## End(Not run)
Retrieves insights for a given Facebook page within a specified timeframe, language, and country. It allows for fetching specific types of information and optionally joining page info with targeting info.
get_page_insights(
  pageid,
  timeframe = "LAST_30_DAYS",
  lang = "en-GB",
  iso2c = "US",
  include_info = c("page_info", "targeting_info"),
  join_info = T
)
pageid |
A string specifying the unique identifier of the Facebook page. |
timeframe |
A string indicating the timeframe for the insights. Valid options include predefined timeframes such as "LAST_30_DAYS". The default value is "LAST_30_DAYS". |
lang |
A string representing the language locale to use for the request, formatted as language code followed by country code (e.g., "en-GB" for English, United Kingdom). The default is "en-GB". |
iso2c |
A string specifying the ISO-3166-1 alpha-2 country code for which insights are requested. The default is "US". |
include_info |
A character vector specifying the types of information to include in the output. Possible values are "page_info" and "targeting_info". By default, both types of information are included. |
join_info |
A logical value indicating whether to join page info and targeting info into a single data frame (if TRUE) or return them as separate elements in a list (if FALSE). The default is TRUE. |
If join_info is TRUE, returns a data frame combining page and targeting information for
the specified Facebook page. If join_info is FALSE, returns a list with two elements:
page_info and targeting_info, each containing the respective data as a data frame.
In case of errors or no data available, the function may return a simplified data frame or list
indicating the absence of data.
insights <- get_page_insights(
  pageid = "123456789",
  timeframe = "LAST_30_DAYS",
  lang = "en-GB",
  iso2c = "US",
  include_info = c("page_info", "targeting_info"),
  join_info = TRUE
)
This function retrieves a report for a specific country and timeframe from a GitHub repository hosting RDS files. The file is downloaded to a temporary location, read into R, and then deleted.
get_report_db(the_cntry, timeframe, ds, verbose = FALSE)
the_cntry |
Character. The ISO country code (e.g., "DE", "US"). |
timeframe |
Character or Numeric. Timeframe in days (e.g., "30", "90") or "yesterday" / "lifelong". |
ds |
Character. A timestamp or identifier used to construct the file name (e.g., "2024-12-25"). |
verbose |
Logical. Whether to print messages about the process. Default is FALSE. |
A data frame or object read from the RDS file.
# Example usage
report_data <- get_report_db(
  the_cntry = "DE",
  timeframe = 30,
  ds = "2024-12-25",
  verbose = TRUE
)
print(head(report_data))
This function retrieves data for the targeting criteria of a Facebook page for a specified timeframe.
get_targeting(id, timeframe = "LAST_30_DAYS", lang = "en-GB", legacy = FALSE)
id |
A character string representing the Facebook page ID. |
timeframe |
A character string representing the desired timeframe. Can either be "LAST_30_DAYS" or "LAST_7_DAYS". Defaults to "LAST_30_DAYS". |
lang |
An ISO language code character string representing the desired language of the targeting criteria. Defaults to "en-GB" but can be "en-US" and many more. |
A tibble containing the audience data for the specified Facebook page and timeframe.
## Not run:
get_targeting("123456789")
get_targeting("987654321", "LAST_7_DAYS")
## End(Not run)
This function retrieves targeting data for a specific country and timeframe
from a GitHub repository hosting parquet files. The function uses the arrow
package to read the parquet file directly from the specified URL.
Note that archived data only becomes retrievable three days after
the specified date.
get_targeting_db(the_cntry, tf, ds, remove_nas = T, verbose = F)
the_cntry |
Character. The ISO country code (e.g., "DE", "US"). |
tf |
Numeric or character. The timeframe in days ("yesterday", "7", "30", "90", "lifelong"). Note, some data points for lifelong in the past may be missing for some countries. |
ds |
Character. A timestamp or identifier used to construct the file path (e.g., "2024-12-25"). |
A data frame containing the targeting data from the parquet file.
# Example usage
latest_data <- get_targeting_db(
  the_cntry = "DE",
  tf = 30,
  ds = "2024-10-25"
)
print(head(latest_data))
This function retrieves metadata for targeting data releases for a specific country and timeframe from a GitHub repository.
get_targeting_metadata(
  country_code,
  timeframe,
  base_url = "https://github.com/favstats/meta_ad_targeting/releases/expanded_assets/"
)
country_code |
Character. The ISO country code (e.g., "DE", "US"). |
timeframe |
Character. The timeframe to filter (e.g., "7", "30", or "90"). |
base_url |
Character. The base URL for the GitHub repository. Defaults to
"https://github.com/favstats/meta_ad_targeting/releases/expanded_assets/". |
A data frame containing metadata about available targeting data, including file names, sizes, timestamps, and tags.
# Retrieve metadata for Germany for the last 30 days
metadata <- get_targeting_metadata("DE", "30")
print(metadata)
This function queries the Google Ads Transparency Report to retrieve information about advertising spending for a specified advertiser. It supports a range of countries and can return either aggregated data or time-based spending data.
ggl_get_spending(
  advertiser_id,
  start_date = 20231029,
  end_date = 20231128,
  cntry = "NL",
  get_times = FALSE
)
advertiser_id |
A string representing the unique identifier of the advertiser. For example "AR14716708051084115969". |
start_date |
An integer representing the start date for data retrieval in YYYYMMDD format. For example 20231029. |
end_date |
An integer representing the end date for data retrieval in YYYYMMDD format. For example 20231128. |
cntry |
A string representing the country code for which the data is to be retrieved. For example "NL" (Netherlands). |
get_times |
A boolean indicating whether to return time-based spending data. If FALSE, returns aggregated data. Default is FALSE. |
A tibble containing advertising spending data. If get_times is TRUE,
the function returns a tibble with date-wise spending data. Otherwise, it returns
a tibble with aggregated spending data, including details like currency, number of ads,
ad type breakdown, advertiser details, and other metrics.
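Because `start_date` and `end_date` are integers in YYYYMMDD format rather than Date objects, a pair of small converters can help when building a date window. The helpers below are hypothetical conveniences written in base R, not part of the package.

```r
# Convert between Date objects and YYYYMMDD integers.
to_yyyymmdd <- function(d) as.integer(format(as.Date(d), "%Y%m%d"))
from_yyyymmdd <- function(x) as.Date(as.character(x), format = "%Y%m%d")

# Build a 30-day window ending on 2023-11-28.
end_date   <- to_yyyymmdd("2023-11-28")
start_date <- to_yyyymmdd(as.Date("2023-11-28") - 30)

start_date
end_date
from_yyyymmdd(end_date) - from_yyyymmdd(start_date)
```

The resulting window runs from 20231029 to 20231128, which matches the default values shown in the usage above.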
# Retrieve aggregated spending data for a specific advertiser in the Netherlands
spending_data <- ggl_get_spending(
  advertiser_id = "AR14716708051084115969",
  start_date = 20231029,
  end_date = 20231128,
  cntry = "NL"
)
# Retrieve time-based spending data for the same advertiser and country
time_based_data <- ggl_get_spending(
  advertiser_id = "AR14716708051084115969",
  start_date = 20231029,
  end_date = 20231128,
  cntry = "NL",
  get_times = TRUE
)
Adds a progress bar to the map_dfr function. See https://www.jamesatkins.net/posts/progress-bar-in-purrr-map-df/
map_dfr_progress(.x, .f, ...)
.x |
List to iterate over. |
.f |
Function to apply. |
... |
Other parameters passed to |
.id |
An identifier. |
An aggregated data frame.
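To illustrate the general idea of combining row-binding with a progress bar, here is a minimal base R sketch. It is not the package's implementation: the function name map_rows_with_progress is hypothetical, and it relies on utils::txtProgressBar and rbind instead of purrr.

```r
# Hypothetical illustration (not the metatargetr implementation):
# apply .f to each element of .x, show a text progress bar, and
# row-bind the resulting data frames.
map_rows_with_progress <- function(.x, .f, ...) {
  if (length(.x) == 0) {
    return(data.frame())
  }
  pb <- utils::txtProgressBar(min = 0, max = length(.x), style = 3)
  on.exit(close(pb), add = TRUE)

  pieces <- vector("list", length(.x))
  for (i in seq_along(.x)) {
    pieces[[i]] <- .f(.x[[i]], ...)
    utils::setTxtProgressBar(pb, i)  # advance the bar after each element
  }
  do.call(rbind, pieces)
}

squares <- map_rows_with_progress(1:3, function(i) data.frame(n = i, sq = i^2))
print(squares)
```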
This function takes the HTML content from a LinkedIn Ad Library detail page and extracts key information like advertiser, targeting, and impressions.
parse_linkedin_ads_details(html_content)
html_content |
An |
A list containing the extracted ad details.
    ## Not run:
    # Assuming you have saved the HTML of a detail page to "ad_detail.html"
    ad_html <- rvest::read_html("ad_detail.html")
    details <- parse_linkedin_ads_details(ad_html)
    print(details)
    ## End(Not run)
Parses the location strings in the Ad Targeting Dataset and splits them into separate columns for each level of detail.
parse_location(.x, loc_var, type = "include", verbose = T)
.x |
A data.frame containing the location string |
loc_var |
A character string specifying the name of the column in .x that contains the location string |
type |
A character string specifying the prefix to add to each column of split location details. Default is "include". Should be "include" or "exclude". |
verbose |
A logical flag specifying whether to display a progress bar during processing. Default is |
A data.frame with columns for each level of detail in the location.
    ## Not run:
    ### create a dataset with unique include_location values
    distinct_data <- targeting_data %>%
      distinct(include_location)
    #### parse the location data and join in original dataset
    distinct_data %>%
      parse_location(include_location, type = "include") %>%
      right_join(targeting_data)
    ###----####
    ### create a dataset with unique exclude_location values
    distinct_data <- targeting_data %>%
      distinct(exclude_location)
    #### parse the location data and join in original dataset
    distinct_data %>%
      parse_location(exclude_location, type = "exclude") %>%
      right_join(targeting_data)
    ## End(Not run)
Retrieves ads from the Facebook Ad Library by page ID or text query, without requiring an API key. Uses headless Chrome to load the Ad Library search page, which embeds ~30 ads with rich metadata in its server-side rendered HTML.
search_ad_library(page_id = NULL, query = NULL, country = "ALL", ad_type = "all", active_status = "all", date_min = NULL, date_max = NULL, media_type = "all", publisher_platforms = NULL, content_languages = NULL, search_type = NULL, sort_mode = "total_impressions", sort_direction = "desc", max_pages = 1, wait_sec = 12)
page_id |
Character. Facebook page ID to search for all ads from
that page (e.g., |
query |
Character. Text search query (e.g., |
country |
Character. ISO country code filter (default |
ad_type |
Character. Ad type filter (default |
active_status |
Character. Delivery status filter (default |
date_min |
Character or NULL. Minimum start date filter in
|
date_max |
Character or NULL. Maximum start date filter in
|
media_type |
Character. Media type filter (default |
publisher_platforms |
Character vector or NULL. Platform filters.
Example: |
content_languages |
Character vector or NULL. Content language filters.
Example: |
search_type |
Character or NULL. Search type override.
Common options: |
sort_mode |
Character or NULL. Sort mode for |
sort_direction |
Character or NULL. Sort direction for
|
max_pages |
Integer. Number of pages to fetch, each containing ~30 ads (default 1). Set higher to paginate (experimental). |
wait_sec |
Numeric. Seconds to wait for the page to load (default 12). Increase if getting empty results. |
Data returned. Each ad includes:
- Metadata: ad_archive_id, page_name, page_id, categories (e.g., "POLITICAL"), is_active, start_date, end_date, spend, currency, reach_estimate, publisher_platform, impressions_lower, impressions_upper, targeted_countries
- Creative (snapshot): body, title, display_format, link_url, cta_text, images, videos, cards, page_profile_picture_url
- Link: ad_library_url, the direct URL to the ad in Facebook's Ad Library

Notes:
- Uses active_status="all" by default (full history)
- Use active_status="active" to focus on ads currently running
- Each page load yields ~30 ads from the server-side rendered HTML
- Pagination beyond page 1 reuses Facebook's own pagination request shape captured from browser runtime (still experimental)
- The categories field can identify political ads ("POLITICAL")
A tibble with one row per ad. Returns an empty tibble if no ads are found. See Data returned section for column details.
    ## Not run:
    browser_session_start()
    # Search by page ID (all D66 ads)
    d66_ads <- search_ad_library(page_id = "52985377549")
    # Search by keyword in the Netherlands
    ads <- search_ad_library(query = "Rob Jetten", country = "NL")
    # Check which ads are political
    ads %>% dplyr::filter(categories == "POLITICAL")
    # Check ad dates and spend
    ads %>% dplyr::select(page_name, start_date, end_date, spend, categories)
    browser_session_close()
    ## End(Not run)
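As a rough illustration of working with the returned columns without a browser session, the sketch below builds a small mock data frame and derives an impressions midpoint for political ads. All values are made up for illustration, the "OTHER" category is an assumption, and the real output may use different column types (for example, list columns).

```r
# Mock data (fabricated for illustration only); column names follow the
# fields listed above, but real results may differ in type and content.
ads <- data.frame(
  ad_archive_id     = c("1001", "1002", "1003"),
  categories        = c("POLITICAL", "OTHER", "POLITICAL"),
  impressions_lower = c(1000, 5000, 10000),
  impressions_upper = c(1999, 5999, 14999),
  stringsAsFactors  = FALSE
)

# Keep only ads flagged as political
political <- ads[ads$categories == "POLITICAL", ]

# Midpoint of the reported impressions range as a rough point estimate
political$impressions_mid <-
  (political$impressions_lower + political$impressions_upper) / 2

print(political)
```

The midpoint is only one possible summary of the reported range; reporting the lower and upper bounds themselves avoids implying more precision than the data carry.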
Launches an interactive command-line interface to help users configure and save default package options, such as the cache directory and user-agent randomization preference.
set_metatargetr_options()
The function will guide the user through setting the following options:
- metatargetr.cache_dir: The default directory to save HTML files.
- metatargetr.randomize_ua: Whether to use random User-Agents by default.

The user will also be prompted to save these settings as environment variables in their personal .Renviron file for persistence across R sessions.
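In non-interactive scripts, the same option names could presumably be set and read with base R's options() and getOption(). The sketch below assumes the package honors options set this way, which the documentation above does not confirm; the cache path is a hypothetical example.

```r
# Assumption: metatargetr reads these R options when set via options().
# The directory below is a hypothetical example location.
old <- options(
  metatargetr.cache_dir    = file.path(tempdir(), "metatargetr_cache"),
  metatargetr.randomize_ua = TRUE
)

# Read the current values (with a fallback if an option is unset)
cache_dir    <- getOption("metatargetr.cache_dir")
randomize_ua <- getOption("metatargetr.randomize_ua", default = FALSE)
print(cache_dir)
print(randomize_ua)

# Restore the previous option values
options(old)
```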
Converts the entry for a single ad into one row of a tibble. Producing such a row directly proved awkward, so this helper handles the conversion.
stupid_conversion(x)
x |
An object containing data about the ad. |
A tibble row.
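The general technique of turning one nested entry into a single row can be sketched in base R as follows. This is not the package's implementation: entry_to_row is a hypothetical name, the sample entry is made up, and a plain data frame stands in for a tibble.

```r
# Hypothetical illustration (not the metatargetr implementation):
# turn one list-like ad entry into a one-row data frame. Scalar fields
# become ordinary columns; longer fields are wrapped in list columns.
entry_to_row <- function(x) {
  cols <- lapply(x, function(v) {
    if (length(v) == 0) {
      NA                      # empty or NULL field becomes NA
    } else if (length(v) == 1 && !is.list(v)) {
      v                       # scalar stays as-is
    } else {
      I(list(v))              # multi-valued field becomes a list column
    }
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Made-up entry for illustration
entry <- list(id = "1", spend = 10, platforms = c("fb", "ig"))
row <- entry_to_row(entry)
print(nrow(row))
```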
Unnest and fix duplicates in interest targeting data from the Ad Targeting Dataset
The function unnests the "include" and "exclude" columns in the Ad Targeting Dataset and removes duplicates.
unnest_and_fix_dups(dat, the_list, new_name)
dat |
a data frame |
the_list |
the column name to unnest |
new_name |
the name of the new column after unnesting and fixing duplicates |
a modified data frame with unnested and deduplicated values
    ### example usage:
    ## make sure you have the variable 'archive_id' in your data
    ad_targeting_data %>%
      rowwise() %>%
      mutate(include_list = fix_json(include)) %>%
      ungroup() %>%
      ## the_list: the parsed list of JSON, new_name: what the parsed column should be called
      unnest_and_fix_dups(the_list = include_list, new_name = "parsed_include")
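The unnest-then-deduplicate pattern can be sketched in base R with made-up data. This only illustrates the general idea; how the package itself defines a duplicate is not specified above, so treating identical (archive_id, value) pairs as duplicates is an assumption.

```r
# Made-up data for illustration: one list element per ad
ads <- data.frame(archive_id = c("a1", "a2"), stringsAsFactors = FALSE)
ads$include_list <- list(c("Politics", "Politics", "News"), c("Sports"))

# Unnest: repeat each archive_id once per element of its list
long <- data.frame(
  archive_id     = rep(ads$archive_id, lengths(ads$include_list)),
  parsed_include = unlist(ads$include_list),
  stringsAsFactors = FALSE
)

# Deduplicate: drop repeated (archive_id, parsed_include) pairs
long <- unique(long)
print(long)
```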
Walk through a list with a progress bar
walk_progress(.x, .f, ...)
.x |
List to iterate over. |
.f |
Function to apply. |
... |
Other parameters passed to |
None (used for side effects).