Package 'globaltrends'

Title: Download and Measure Global Trends Through 'Google' Search Volumes
Description: 'Google' offers public access to global search volumes from its search engine through the 'Google Trends' portal. The package downloads these search volumes provided by 'Google Trends' and uses them to measure and analyze the distribution of search scores across countries or within countries. The package allows researchers and analysts to use these search scores to investigate global trends based on patterns within these scores. This offers insights such as degree of internationalization of firms and organizations or dissemination of political, social, or technological trends across the globe or within single countries. An outline of the package's methodological foundations and potential applications is available as a working paper: <https://www.ssrn.com/abstract=3969013>.
Authors: Harald Puhr [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-3308-9553>), Jakob Muellner [ccp] (ORCID: <https://orcid.org/0000-0002-3443-0469>)
Maintainer: Harald Puhr <[email protected]>
License: MIT + file LICENSE
Version: 0.2.0
Built: 2026-06-29 10:56:34 UTC
Source: https://github.com/ha-pu/globaltrends

Help Index


Add batches of control or object keywords

Description

The function adds one or more batches of keywords with a time period for downloads to the database. The batches serve as input for all download and computation functions.

Usage

add_control_keyword(keyword, start_date = "2010-01", end_date = "2020-12")

add_object_keyword(keyword, start_date = "2010-01", end_date = "2020-12")

Arguments

keyword

Keywords that should be added as batch. Vector of type character or a list of character vectors. The function also allows the usage of codes for search topics instead of search terms.

start_date

Start of the time frame for which batch data should be downloaded. Character scalar in the format "YYYY-MM". Defaults to "2010-01".

end_date

End of the time frame for which batch data should be downloaded. Character scalar in the format "YYYY-MM". Defaults to "2020-12".

Details

Since Google Trends allows a maximum of five keywords per query, batches of control keywords can contain up to five keywords. Because one control keyword is added to each batch of object keywords for mapping, object batches are limited to four keywords. A character vector with more than five control keywords (or more than four object keywords) is split into batches of five (four) keywords. A list must contain character vectors of length five (four) or less. Each batch of keywords is combined with a time period for which data will be downloaded. To change the time period for an existing batch, all downloads and computations must be rerun.
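The splitting rule described above can be illustrated with a short base R sketch. split_batches() is a hypothetical helper written for this illustration, not a function of the package; it only assumes that keywords are chunked consecutively into groups of at most five (control) or four (object).

```r
# Hypothetical helper (not part of globaltrends): split a keyword vector
# into consecutive batches of at most `size` keywords.
split_batches <- function(keyword, size) {
  unname(split(keyword, ceiling(seq_along(keyword) / size)))
}

control <- c("gmail", "maps", "news", "translate", "weather", "wikipedia", "youtube")
object <- c("amazon", "apple", "facebook", "google", "microsoft", "netflix", "twitter")

# Control batches hold up to five keywords, object batches up to four.
control_batches <- split_batches(control, size = 5)
object_batches <- split_batches(object, size = 4)

lengths(control_batches)
lengths(object_batches)
```

With seven keywords each, the control vector yields batches of five and two keywords, and the object vector yields batches of four and three.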

Value

Integer vector of the newly added batch IDs (one element per batch created). Batch data is written to tables batch_keywords and batch_time. A message is printed for each batch.

Warning

If you use search topics for object keywords, make sure to use search topics for control keywords and vice versa. See Google's FAQ for additional information on search topics.

Note

Leading and trailing whitespace is automatically trimmed from all keywords via trimws(). Internal whitespace is left unchanged.

Examples

## Not run: 
add_control_keyword(
  keyword = c("gmail", "maps", "translate", "wikipedia", "youtube"),
  start_date = "2016-01", end_date = "2019-12"
)
add_object_keyword(
  keyword = c("apple", "facebook", "google", "microsoft"),
  start_date = "2016-01", end_date = "2019-12"
)

add_control_keyword(
  keyword = c("gmail", "maps", "news", "translate", "weather", "wikipedia", "youtube"),
  start_date = "2016-01", end_date = "2019-12"
)
add_object_keyword(
  keyword = c("amazon", "apple", "facebook", "google", "microsoft", "netflix", "twitter"),
  start_date = "2016-01", end_date = "2019-12"
)

add_control_keyword(
  keyword = list(
    c("gmail", "maps", "news"),
    c("translate", "weather", "wikipedia", "youtube")
  ),
  start_date = "2016-01", end_date = "2019-12"
)
add_object_keyword(
  keyword = list(
    c("amazon", "apple", "facebook", "google"),
    c("microsoft", "netflix", "twitter")
  ),
  start_date = "2016-01", end_date = "2019-12"
)

# search topics
add_control_keyword(
  keyword = c("%2Fm%2F02q_bk", "%2Fm%2F055t58", "%2Fm%2F025sndk", "%2Fm%2F0d07ph", "%2Fm%2F09jcvs"),
  start_date = "2016-01", end_date = "2019-12"
)
# This adds the following topics: Gmail, Google Maps, Google Translate, Wikipedia, YouTube

## End(Not run)

Add a location set

Description

Adds location codes to a named location set in the data_locations database table. A location set is a named group of codes (e.g., "countries", "DACH") that is passed as the locations argument to download and computation functions. When export = TRUE, the set is accessible as gt.env$<type> immediately after insertion.

Usage

add_locations(locations, type, export = TRUE)

Arguments

locations

Character vector of location codes to add. Each code must appear in gtrendsR::countries$country_code (country level) or gtrendsR::countries$sub_code (subnational level). Leading/trailing whitespace and duplicates are removed automatically.

type

Character scalar. Name of the location set to which locations should be added (e.g., "DACH", "EU"). After export the set is available as gt.env$<type>.

export

Logical scalar. If TRUE (default), gt.env is refreshed so the updated set is available immediately. Set to FALSE when calling add_locations() several times in sequence to avoid a redundant database read after each call; run add_locations(..., export = TRUE) on the final call, or restart the session, to make all sets available.

Details

The package ships with two default sets — "countries" and "us_states" — written to the database by start_db(). Use add_locations() to define additional sets such as "EU", "DACH", or subnational regions for a specific country.

The function is idempotent with respect to (type, location) pairs: codes that already exist in the named set are skipped, so repeated calls are safe. Leading and trailing whitespace is trimmed from all codes before validation and insertion.

Value

Invisibly returns a tibble of the rows appended to data_locations (columns: location, type). Returns a zero-row tibble when all supplied codes already exist in the set. A message is emitted in either case summarising how many codes were added and how many were skipped.

Known API limitation

The Google Trends API cannot handle the location code "NA" (Namibia). If "NA" is supplied it is dropped with a warning. If it is the only code supplied, the function errors.

Examples

## Not run: 
# Create a custom set for the DACH region
add_locations(locations = c("AT", "CH", "DE"), type = "DACH")

# Add subnational codes (US states from the built-in vector)
add_locations(locations = us_states, type = "us_states")

# Add several sets without redundant DB reads; refresh once at the end
add_locations(locations = c("AT", "CH", "DE"), type = "DACH", export = FALSE)
add_locations(locations = c("BE", "LU", "NL"), type = "benelux", export = TRUE)

## End(Not run)

Add synonyms for object keywords

Description

Registers one or more synonyms for a single object keyword. When aggregate_synonyms() is run after compute_score(), all synonyms are treated as equivalent to the canonical keyword and their scores are summed.

A common use-case is alternate names: e.g., "FC Bayern" and "Bayern Munich" refer to the same entity and should be aggregated.

Usage

add_synonym(keyword, synonym)

Arguments

keyword

Character scalar. The canonical object keyword for which the synonyms are registered. Must already exist as an object keyword in the database.

synonym

Character scalar or vector, or a list of character vectors. One or more synonyms to associate with keyword. Each element is inserted as a separate row in keyword_synonyms.

Value

Invisibly returns NULL. Synonym rows are written to table keyword_synonyms and the in-memory cache gt.env$keyword_synonyms is refreshed. A message is printed for each synonym added.

Note

trimws() is applied to both keyword and synonym to remove leading and trailing whitespace. Internal whitespace is left unchanged.

Examples

## Not run: 
# Single synonym
add_synonym(
  keyword = "fc bayern",
  synonym = "bayern munich"
)

# Multiple synonyms in one call
add_synonym(
  keyword = "fc barcelona",
  synonym = c("barcelona", "barca", "fcb")
)

## End(Not run)

Aggregate search scores across synonym terms

Description

Merges synonym keyword scores into their canonical keyword scores in data_score. Run this after compute_score(). Synonym relationships are defined with add_synonym().

Usage

aggregate_synonyms(control, vacuum = TRUE)

Arguments

control

Numeric/integer scalar. The control batch id (batch_c), identifying the reference search used for score normalisation. In most single-control setups this is 1.

vacuum

Logical scalar. If TRUE (default), calls vacuum_data() after aggregation to reclaim space freed by the row deletions.

Details

For a given control batch (batch_c), this function:

  1. Retrieves all canonical-synonym pairs and their associated object batches (batch_o) in a single database query.

  2. Pulls the relevant data_score rows, remaps synonym rows onto their canonical keyword, and sums scores across duplicates.

  3. Deletes the affected data_score rows for those object batches.

  4. Writes the aggregated rows back to data_score.

  5. Optionally calls vacuum_data() to reclaim disk space.

The delete-and-reinsert pattern can be slow for large datasets. Vacuuming adds the most overhead and can be deferred by setting vacuum = FALSE.
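The remap-and-sum logic of step 2 can be sketched in base R on small in-memory tables. The data frames below are hypothetical stand-ins for data_score and keyword_synonyms, and the code is an illustration of the idea rather than the package's internal implementation.

```r
# Hypothetical stand-ins for data_score rows and the keyword_synonyms table.
scores <- data.frame(
  keyword = c("fc bayern", "bayern munich", "fc bayern", "bayern munich"),
  location = c("DE", "DE", "AT", "AT"),
  date = c(1, 1, 1, 1),
  score = c(0.25, 0.5, 0.125, 0.125),
  stringsAsFactors = FALSE
)
synonyms <- data.frame(
  keyword = "fc bayern",
  synonym = "bayern munich",
  stringsAsFactors = FALSE
)

# Remap synonym rows onto their canonical keyword.
idx <- match(scores$keyword, synonyms$synonym)
scores$keyword[!is.na(idx)] <- synonyms$keyword[idx[!is.na(idx)]]

# Sum scores across the duplicates created by the remapping.
aggregated <- aggregate(score ~ keyword + location + date, data = scores, FUN = sum)
aggregated
```

After aggregation only the canonical keyword remains, with one summed score per location and date.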

Value

Invisibly returns a data frame of the rows written to data_score. Called primarily for its side effects (database modifications).

See Also

compute_score() to populate data_score before aggregating, add_synonym() to define synonym relationships, vacuum_data() for manual space reclamation.

Examples

## Not run: 
compute_score(object = 1:2, control = 1)
aggregate_synonyms(control = 1, vacuum = FALSE)

## End(Not run)

Example table: keyword batches (batch_keywords)

Description

Example data representing the database table batch_keywords. Each row assigns a single keyword to a batch and a type ("control" or "object").

The example contains one control batch (5 keywords: gmail, maps, translate, wikipedia, youtube) and four object batches (14 object keywords covering football clubs and technology firms), all covering the period 2010-01 to 2019-12.

In a live database, keyword batches are created via add_keyword() and are exported to the package environment gt.env by start_db() as gt.env$keywords_control and gt.env$keywords_object. Control batches hold up to five keywords; object batches hold up to four (one slot is reserved for the overlap keyword used in score mapping).

Usage

example_keywords

Format

A tibble with 3 variables:

type

Character. Batch type: "control" or "object".

batch

Integer. Batch identifier within type.

keyword

Character. Keyword assigned to the batch.

See Also

add_keyword(), start_db()


Example table: batch time windows (batch_time)

Description

Example data representing the database table batch_time. Each row assigns a time window (start_date, end_date) to a batch and a type ("control" or "object"). Each (type, batch) combination has exactly one row.

In a live database, batch time windows are generated when keywords are added (see add_keyword()) and are exported to the package environment gt.env by start_db() as gt.env$time_control and gt.env$time_object.

Dates are stored as "YYYY-MM" strings to represent monthly windows. To change the time window for an existing batch, all downloads and computations for that batch must be re-run.

Usage

example_time

Format

A tibble with 4 variables:

type

Character. Batch type: "control" or "object".

batch

Integer. Batch identifier within type.

start_date

Character. Window start in "YYYY-MM".

end_date

Character. Window end in "YYYY-MM".

See Also

add_keyword(), start_db()


Compute degree of internationalization (DOI)

Description

Computes degree of internationalization (DOI) for object keywords based on the cross-location distribution of search scores. DOI is computed per (keyword, date) combination for a given control batch (batch_c), object batch (batch_o), and a named location set (e.g., "countries"). Results are appended to the data_doi database table.

Usage

compute_doi(object, control = 1, locations = "countries")

## S3 method for class 'numeric'
compute_doi(object, control = 1, locations = "countries")

## S3 method for class 'list'
compute_doi(object, control = 1, locations = "countries")

Arguments

object

Numeric scalar, vector, or list of numerics. One or more object batch ids (batch_o) identifying keyword groups for which DOI should be computed. A numeric vector is processed element-by-element (equivalent to passing a list).

control

Numeric scalar. Control batch id (batch_c) identifying the baseline keyword group used for score normalisation. Defaults to 1.

locations

Character scalar. Name of a location set stored in data_locations$type (e.g., "countries", "us_states"). Only locations belonging to this set are included in the DOI computation. Defaults to "countries".

Details

DOI captures how evenly search interest is spread across a set of locations: a perfectly uniform score vector yields the maximum DOI, while one concentrated in a single location yields the minimum.

Three complementary dispersion measures are computed for each (keyword, date) series:

gini

1 - Gini(score). Uses the rank-weighted formula Gini = (2 * sum(score[i] * i) / sum(score) - (n + 1)) / n over the sorted score vector. Ranges from 0 (complete concentration) to 1 (perfect equality).

hhi

1 - HHI(score) where HHI = sum(p^2) and p = score / sum(score). Ranges from 0 (monopoly) to 1 - 1/n (perfect equality across n locations).

entropy

H(p) - log(n) where p = score / sum(score), H(p) = -sum(p * log(p)) is Shannon entropy, and n is the number of locations with non-zero scores. Always <= 0; equals 0 when scores are perfectly uniform and becomes more negative as concentration increases. Zero scores are excluded before computing logs.

If all scores for a (keyword, date) series are NA, all three measures are set to NA. If all non-NA scores are zero, gini, hhi, and entropy all return 0.
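The three measures can be reproduced for a single score vector with a few lines of base R. doi_measures() is a hypothetical helper that follows the formulas and edge cases stated above; it is a sketch for illustration and not the package's internal code. The example vectors contain only positive scores, so n equals the number of locations.

```r
# Hypothetical helper: dispersion measures for one (keyword, date) series.
doi_measures <- function(score) {
  score <- score[!is.na(score)]
  if (length(score) == 0) {
    return(c(gini = NA_real_, hhi = NA_real_, entropy = NA_real_))
  }
  if (sum(score) == 0) {
    return(c(gini = 0, hhi = 0, entropy = 0))
  }
  n <- length(score)
  x <- sort(score) # ascending order for the rank-weighted Gini formula
  gini <- (2 * sum(x * seq_len(n)) / sum(x) - (n + 1)) / n
  p <- score / sum(score)
  hhi <- sum(p^2)
  p_pos <- p[p > 0] # zero scores are excluded before taking logs
  entropy <- -sum(p_pos * log(p_pos)) - log(length(p_pos))
  c(gini = 1 - gini, hhi = 1 - hhi, entropy = entropy)
}

uniform <- doi_measures(c(1, 1, 1, 1))
skewed <- doi_measures(c(8, 4, 2, 2))
uniform
skewed
```

For the uniform vector, gini is 1, hhi is 1 - 1/4 = 0.75, and entropy is 0. The skewed vector gives lower values on all three measures.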

Score data must already exist in data_score, typically produced by compute_score(). Only locations whose type in data_locations matches the locations argument are included. The global aggregate (location == "world") is excluded unless the location set explicitly contains it.

If DOI for the requested (batch_c, batch_o, locations) combination already exists in data_doi, the function emits a message and returns early without recomputing.

Value

Invisibly returns the data frame appended to data_doi for the processed batch, with columns date, keyword, gini, hhi, entropy, batch_c, batch_o, and locations. Returns an empty data frame when DOI already exists or when no matching score data is found. Called primarily for its side effects (database writes) and emits a progress message per batch.

See Also

compute_score() to produce the score data consumed by this function; data_doi for the database table schema.

Examples

## Not run: 
compute_doi(object = 1, control = 1, locations = "countries")
compute_doi(object = as.list(1:5), control = 1, locations = "countries")

## End(Not run)

Compute search scores for object keywords

Description

Computes search scores for object keywords by mapping object and control search volumes onto a common scale and then normalizing object volumes by the mapped control total for each (location, date).

compute_voi() is a convenience wrapper around compute_score() for computing the volume of internationalization (VOI), a measure of worldwide search interest for a keyword relative to the control baseline. It is equivalent to compute_score(object, control, locations = "world"), which uses the worldwide aggregate rather than country-level breakdowns.

Use compute_voi() when you only need the global aggregate score, for example when locations = "world" was passed to download_object().

Usage

compute_score(object, control = 1, locations = NULL)

## S3 method for class 'numeric'
compute_score(object, control = 1, locations = NULL)

## S3 method for class 'list'
compute_score(object, control = 1, locations = NULL)

compute_voi(object, control = 1)

Arguments

object

Integer-like scalar, vector, or list. The object batch id(s) (batch_o) for which scores (or VOI) should be computed.

control

Integer-like scalar. The control batch id (batch_c). Defaults to 1.

locations

Character vector of location codes to compute scores for. The package exports countries (ISO 3166-1 alpha-2 codes for the default country set; see countries) and us_states (two-letter US state codes) as convenience vectors. Pass "world" to compute the global aggregate only (see also compute_voi()). If NULL, defaults to gt.env$countries when set via start_db(), otherwise falls back to globaltrends::countries.

Details

Conceptually, the score for an object keyword is computed as:

score_{o,loc,t} = \frac{hits_{o,loc,t}}{\sum_{k \in C} \tilde{hits}_{k,loc,t}}

where C is the set of control keywords and \tilde{hits} are control hits mapped to the object scale using an overlap-based benchmark, following the mapping logic described in Castelnuovo and Tran (2017, Appendix A).

Idempotency. Already-computed (batch_c, batch_o, location) combinations are detected and skipped automatically, so repeated calls safely fill in only missing locations.

Operationally, for each object batch (batch_o) and control batch (batch_c), the function:

  1. Identifies the subset of locations not yet present in data_score for this (batch_c, batch_o) pair.

  2. Computes a per-(location, date) benchmark as the mean ratio of object-to-control hits for the keywords that appear in both downloads.

  3. Maps control hits to the object scale: hits_mapped = hits * benchmark.

  4. Sums mapped control hits across keywords to obtain hits_c and computes score = hits_object / hits_c for each object keyword.

  5. Inserts the resulting rows into data_score.
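Steps 2 to 4 can be traced by hand for a single (location, date) cell. The hit values below are invented for the illustration, and the code is a minimal sketch of the mapping logic as described above, not the package's implementation.

```r
# Hypothetical hits for one (location, date) cell.
# Control download: five control keywords on the control scale.
control_hits <- c(gmail = 40, maps = 20, translate = 10, wikipedia = 20, youtube = 10)
# Object download: object keywords plus one overlapping control keyword,
# all on the scale of the object query.
object_hits <- c(apple = 30, facebook = 60, gmail = 20)

# Step 2: benchmark = mean object-to-control ratio over overlapping keywords.
overlap <- intersect(names(object_hits), names(control_hits))
benchmark <- mean(object_hits[overlap] / control_hits[overlap])

# Step 3: map control hits to the object scale.
hits_mapped <- control_hits * benchmark

# Step 4: sum mapped control hits and normalise the object hits.
hits_c <- sum(hits_mapped)
objects <- setdiff(names(object_hits), overlap)
score <- object_hits[objects] / hits_c
score
```

Here the benchmark is 20 / 40 = 0.5, the mapped control total is 50, and the scores are 30 / 50 and 60 / 50. The second score exceeds 1 because interest in that object keyword is larger than the combined mapped control interest.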

If synonym keywords were specified via add_synonym(), run aggregate_synonyms() after score computation to roll synonym scores into their canonical terms.

Value

Called primarily for its side effects (writing to data_score); the return value is rarely needed. When object is a scalar or vector, returns the number of rows inserted into data_score as an integer (0L if all requested locations were already computed). When object is a list, returns TRUE invisibly after processing all elements.

compute_voi() follows the return value semantics of compute_score() described above.

References

Castelnuovo, E. & Tran, T. D. (2017). Google It Up! A Google Trends-based Uncertainty index for the United States and Australia. Economics Letters, 161, 149–153. doi:10.1016/j.econlet.2017.09.032

See Also

download_control() and download_object() to populate the raw data tables before computing scores. aggregate_synonyms() to roll synonym keyword scores into their canonical terms after score computation. add_synonym() to define synonym relationships. compute_voi() for the global-aggregate shorthand.

compute_score() for country-level scores.

Examples

## Not run: 
# Compute scores for a single object batch across all countries
compute_score(object = 1, control = 1, locations = countries)

# Process multiple object batches in one call
compute_score(object = as.list(1:5), control = 1, locations = countries)

# Compute the global aggregate (VOI) only
compute_voi(object = 1, control = 1)

## End(Not run)

Default location set: countries

Description

Character vector of country location codes used by the package as a default location set for cross-country computations.

The vector contains ISO 3166-1 alpha-2 country codes selected from countries_wdi based on a GDP share threshold (>= 0.1% of world GDP in 2018) using World Bank World Development Indicators (WDI). This threshold retains the economically significant countries while keeping query volume manageable. Pass this vector as the locations argument to compute_score() or compute_doi() for standard cross-country analyses.

Note that "NA" (Namibia's ISO code) is excluded because the Google Trends API cannot handle it; see add_locations() for details.

Usage

countries

Format

A character vector of ISO 3166-1 alpha-2 country codes.

See Also

countries_wdi, add_locations(), start_db()

Examples

length(countries)
head(countries)

Country codes and names from WDI

Description

A data frame of country/location codes and names as provided by the World Bank World Development Indicators (WDI). This object is a bundled snapshot of WDI::WDI_data$country included to remove the runtime dependency on the WDI package. It is useful for mapping ISO-style codes to human-readable country names when inspecting or constructing custom location sets, and for understanding which countries are included in countries.

Usage

countries_wdi

Format

A data frame whose columns follow the conventions of WDI::WDI_data$country. Key columns include iso2c (ISO 3166-1 alpha-2 code, matching values in countries), country (English country name), and additional World Bank metadata fields.

Source

World Bank World Development Indicators (WDI), https://datatopics.worldbank.org/world-development-indicators/. Bundled as a static snapshot; for the latest data see the WDI R package.

See Also

countries, add_locations()


Example table: control downloads (data_control)

Description

Example data representing the database table data_control. Each row contains Google Trends hits for a control keyword in a given location on a given date, along with the control batch identifier.

In a live database, data are downloaded via download_control() and are queryable through gt.env$globaltrends_db after start_db(). Global aggregates use "world" as location.

The example dataset is simulated to resemble real Google Trends output. Simulated values are bounded to the empirical [min, max] range observed in actual downloads for each keyword–location pair.

Usage

example_control

Format

A tibble with 5 variables:

location

Character. Location code (ISO 3166-1 alpha-2 or other codes supported by Google Trends). Global data uses "world".

keyword

Character. Control keyword.

date

Integer. Date stored as days since 1970-01-01 (Unix epoch). Convert with as.Date(date, origin = "1970-01-01").

hits

Integer. Relative search interest in [0, 100]. Google Trends normalizes all values within a single query window so the peak observation equals 100.

batch

Integer. Control batch id.
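As a quick illustration of the integer date encoding used in the date column, the following base R sketch converts day counts to Date objects and back. The values are made up for the example and are not taken from example_control.

```r
# Hypothetical integer dates: days since 1970-01-01.
date_int <- c(14610L, 14641L)

# Convert to Date objects for plotting or filtering.
dates <- as.Date(date_int, origin = "1970-01-01")
format(dates)

# Convert a Date back to the integer representation stored in the table.
as.integer(as.Date("2010-01-01"))
```

The two integers correspond to 2010-01-01 and 2010-02-01.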

Source

Google Trends (https://trends.google.com). Simulated to match empirical distributional statistics from real downloads.

See Also

download_control(), start_db()


Example table: degree of internationalization (data_doi)

Description

Example data representing the database table data_doi. Each row contains degree-of-internationalization (DOI) metrics for an object keyword on a given date, computed from the distribution of data_score across a specified set of locations.

DOI captures how evenly search interest is spread across locations: a perfectly uniform score distribution yields the maximum value for each metric; concentration in a single location yields the minimum. Three complementary dispersion measures are provided — see compute_doi() for their exact formulae.

DOI is computed via compute_doi() and is queryable through gt.env$globaltrends_db after start_db(). The batch_c column indicates the control batch used as baseline, and batch_o indicates the object batch.

The example dataset is simulated to resemble outputs derived from real Google Trends data.

Usage

example_doi

Format

A tibble with 8 variables:

keyword

Character. Object keyword.

date

Integer. Date stored as days since 1970-01-01. Convert with as.Date(date, origin = "1970-01-01").

gini

Double. 1 - Gini(score) across locations. Range [0, 1]: 1 = perfectly equal distribution; 0 = all search interest in one location.

hhi

Double. 1 - HHI(score) across locations. Range [0, 1 - 1/n] where n is the number of locations: higher values indicate more equal distributions.

entropy

Double. H(p) - log(n) (Shannon entropy deficit). Range (-Inf, 0]: 0 = perfectly uniform distribution; more negative values indicate greater concentration.

batch_c

Integer. Control batch id used as baseline.

batch_o

Integer. Object batch id.

locations

Character. Name of the location set used (e.g., "countries", "us_states").

References

Castelnuovo, E. & Tran, T. D. (2017). Google It Up! A Google Trends-based Uncertainty index for the United States and Australia. Economics Letters, 161, 149–153. doi:10.1016/j.econlet.2017.09.032

Puhr, H. & Müllner, J. (2022). Let me Google that for you: Capturing internationalization using Google Trends. Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3969013

See Also

compute_doi(), start_db(), dplyr::tbl()


Example table: object downloads (data_object)

Description

Example data representing the database table data_object. Each row contains Google Trends hits for an object keyword in a given location on a given date. Each download pairs an object batch (batch_o) with a control batch (batch_c): one control keyword is included in every object query so that object and control hits can be mapped onto a common scale during score computation.

In a live database, data are downloaded via download_object() and are queryable through gt.env$globaltrends_db after start_db(). Global aggregates use "world" as location.

The example dataset is simulated to resemble real Google Trends output. Simulated values are bounded to the empirical [min, max] range observed in actual downloads for each keyword–location pair.

Usage

example_object

Format

A tibble with 6 variables:

location

Character. Location code. Global data uses "world".

keyword

Character. Object keyword.

date

Integer. Date stored as days since 1970-01-01. Convert with as.Date(date, origin = "1970-01-01").

hits

Integer. Relative search interest in [0, 100] within the query window. The peak value across all keywords in that query equals 100.

batch_c

Integer. Control batch id. Identifies which control batch was co-downloaded for scale mapping in compute_score().

batch_o

Integer. Object batch id.

Source

Google Trends (https://trends.google.com). Simulated to match empirical distributional statistics from real downloads.

See Also

download_object(), start_db()


Example table: computed scores (data_score)

Description

Example data representing the database table data_score. Each row contains a computed score for an object keyword in a given location on a given date, along with the associated control batch (batch_c) and object batch (batch_o).

Scores are computed by compute_score() as:

score = \frac{hits_o}{\sum_{k \in C} \tilde{hits}_k}

where hits_o are object search volumes and \tilde{hits}_k are control keyword hits mapped to the object scale via an overlap-based benchmark (see Castelnuovo & Tran, 2017). Scores are non-negative; values greater than 1 are possible when object interest exceeds control interest.

In a live database, scores are queryable through gt.env$globaltrends_db after start_db(). Global aggregates use "world" as location.

The example dataset is simulated to resemble outputs derived from real Google Trends data.

Usage

example_score

Format

A tibble with 6 variables:

location

Character. Location code. Global data uses "world".

keyword

Character. Object keyword.

date

Integer. Date stored as days since 1970-01-01. Convert with as.Date(date, origin = "1970-01-01").

score

Double. Normalised search interest (object hits divided by total mapped control hits). Non-negative; 0 when no control data are available.

batch_c

Integer. Control batch id used as baseline.

batch_o

Integer. Object batch id.

References

Castelnuovo, E. & Tran, T. D. (2017). Google It Up! A Google Trends-based Uncertainty index for the United States and Australia. Economics Letters, 161, 149–153. doi:10.1016/j.econlet.2017.09.032

See Also

compute_score(), start_db()


Disconnect from the database and persist changes

Description

Exports the current in-memory SQLite state to the Parquet store under db/ and closes the DBI connection.

Usage

disconnect_db()

Details

Call this function after all downloads and computations are complete. It overwrites the Parquet files under db/ with the current in-memory state and then closes the SQLite connection. gt.env$globaltrends_db is set to NULL afterwards; all lazy tbl_* handles become invalid.

Data written to the in-memory database during the session will be lost if this function is not called before the R session ends.

Value

Invisibly returns TRUE. Called for its side effects (writing files under db/ and closing the connection).

See Also

initialize_db() to create the store; start_db() to open a new session.

Examples

## Not run: 
start_db()
# ... downloads and computations ...
disconnect_db()

## End(Not run)

Download data for control keyword batches

Description

Downloads Google Trends search volumes for one or more control batches across a set of locations and appends the results to the database table data_control.

download_control_global() is a convenience wrapper around download_control() that downloads the worldwide aggregate instead of country-level data. It is equivalent to calling download_control(control, locations = "world").

Usage

download_control(control, locations = NULL)

## S3 method for class 'numeric'
download_control(control, locations = NULL)

## S3 method for class 'list'
download_control(control, locations = NULL)

download_control_global(control)

Arguments

control

Numeric scalar, numeric vector, or list of numeric scalars. Control batch id(s) to download.

locations

Character vector of ISO 3166-1 alpha-2 location codes. Defaults to gt.env$countries when set by start_db(); otherwise falls back to globaltrends::countries. Pass "world" (or use download_control_global()) to download the worldwide aggregate instead of country-level data.

Details

Prerequisites. start_db() must be called before download_control(). It connects to the database and populates gt.env$keywords_control and gt.env$time_control from the tables batch_keywords and batch_time (created via add_keyword()). These in-memory objects are used to look up the keywords and time window for each requested batch.

Dispatch. download_control() is an S3 generic that dispatches on the class of control. Passing a numeric scalar routes to the .numeric method, which performs the actual download. Passing a numeric vector of length > 1 coerces control to a list and delegates to the .list method, which iterates over batches sequentially. Passing a list directly also routes to the .list method.
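The dispatch pattern described above can be mimicked with a small stand-alone S3 generic. demo_download() and its methods are hypothetical names used only for this sketch; instead of downloading anything, the numeric method just reports which batch it would process.

```r
# Hypothetical generic mirroring the dispatch pattern; no data is downloaded.
demo_download <- function(control, ...) UseMethod("demo_download")

demo_download.numeric <- function(control, ...) {
  if (length(control) > 1) {
    # Vectors of length > 1 are coerced to a list and handed to the list method.
    return(demo_download(as.list(control), ...))
  }
  message("Processing control batch ", control)
  invisible(TRUE)
}

demo_download.list <- function(control, ...) {
  # Iterate over batches sequentially, dispatching on each element.
  for (batch in control) demo_download(batch, ...)
  invisible(TRUE)
}

demo_download(1)          # scalar: handled by the numeric method
demo_download(1:3)        # vector: coerced to a list, then iterated
demo_download(list(4, 5)) # list: handled by the list method directly
```

Integer input such as 1:3 reaches the numeric method because the implicit class of an integer vector includes "numeric".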

Download backend. Requests are issued through the internal .get_trend() helper, which uses either gtrendsR::gtrends() (default) or the Google Trends Research API when initialize_python() has been called.

Deduplication. Before downloading, the function queries data_control for locations already present for the requested batch. Only locations not yet in the database are downloaded. If all locations are already present, the function returns early with a message and no requests are made.
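The deduplication check amounts to a set difference between the requested locations and those already stored for the batch. The sketch below uses a hypothetical in-memory data frame in place of the data_control table and only illustrates the idea.

```r
# Hypothetical stand-in for rows already stored in data_control.
data_control <- data.frame(
  location = c("AT", "AT", "DE"),
  batch = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

requested <- c("AT", "CH", "DE")

# Locations already present for control batch 1.
present <- unique(data_control$location[data_control$batch == 1L])

# Only locations that are not yet stored still need to be downloaded.
to_download <- setdiff(requested, present)

if (length(to_download) == 0) {
  message("All locations already downloaded.")
} else {
  message("Downloading: ", paste(to_download, collapse = ", "))
}
```

In this example only "CH" remains to be downloaded.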

Missing data. If the API returns no data for a location (e.g. due to insufficient search volume), that location is skipped: nothing is written to data_control and a "No data returned" message is emitted.

Value

Invisibly returns TRUE. The function is called for its side effects: downloaded rows are appended to data_control in the active database, and one progress message is emitted per location indicating whether data was written or no data was returned.

download_control_global() also invisibly returns TRUE. See download_control() for details on side effects and emitted messages.

Category codes

Avoid category codes unless you are confident they apply uniformly to all keywords in the batch. Google Trends applies a category constraint to the entire request, which can unintentionally change the meaning of control and object keywords.

See Also

start_db() to connect to the database and populate gt.env. add_control_keyword() to register control batches before downloading. download_control_global() for a convenience wrapper for worldwide data. download_object() to download object keyword data using a control batch for scaling.

Examples

## Not run: 
# Download one control batch for all countries
download_control(control = 1, locations = countries)

# Download several batches sequentially
download_control(control = as.list(1:5), locations = countries)

# Download worldwide aggregate
download_control_global(control = 1)

## End(Not run)

Download data for object keyword batches

Description

Downloads Google Trends search volumes for one or more object batches across a set of locations and appends the results to the database table data_object. Each object batch is downloaded together with one control keyword so that object hits can be mapped to the control scale used elsewhere in the package.

download_object_global() is a convenience wrapper around download_object() that downloads the worldwide aggregate instead of country-level data. It is equivalent to calling download_object(object, control, locations = "world").

Usage

download_object(object, control = 1, locations = NULL)

## S3 method for class 'numeric'
download_object(object, control = 1, locations = NULL)

## S3 method for class 'list'
download_object(object, control = 1, locations = NULL)

download_object_global(object, control = 1)

Arguments

object

Numeric scalar, numeric vector, or list of numeric scalars. Object batch id(s) to download.

control

Numeric scalar. Control batch id used for mapping. Defaults to 1.

locations

Character vector of ISO 3166-1 alpha-2 location codes. Defaults to gt.env$countries when set by start_db(); otherwise falls back to globaltrends::countries. Pass "world" (or use download_object_global()) to download the worldwide aggregate instead of country-level data.

Details

Prerequisites. start_db() must be called before download_object(). It connects to the database and populates gt.env$keywords_object and gt.env$time_object from the tables batch_keywords and batch_time, whose entries are registered via add_object_keyword(). These in-memory objects are used to look up the keywords and time window for each requested batch. data_control for the chosen control batch must also be present, as it is used to select an appropriate control keyword per location.

Dispatch. download_object() is an S3 generic that dispatches on the class of object. Passing a numeric scalar routes to the .numeric method, which performs the actual download. Passing a numeric vector of length > 1 coerces object to a list and delegates to the .list method, which iterates over batches sequentially. Passing a list directly also routes to the .list method.

Control keyword selection. For each location the function queries data_control for the chosen control batch, ranks control keywords by their average hits in ascending order, and tries them one by one until one yields non-zero signal in the returned series. Trying lower-signal keywords first reduces saturation risk. If no control keyword produces usable signal, the function stops with an informative error.
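The selection rule can be sketched in base R as follows. This is an illustration only: select_control_keyword(), the toy control_hits table, and the fake_fetch() callback that stands in for a real download are all hypothetical names introduced here.

```r
# Rank control keywords by average hits (ascending) and return the first
# one whose fetched series contains a non-zero value.
select_control_keyword <- function(control_hits, fetch_series) {
  avg_hits <- vapply(
    split(control_hits$hits, control_hits$keyword),
    mean,
    numeric(1)
  )
  for (keyword in names(sort(avg_hits))) {
    series <- fetch_series(keyword)
    if (any(series > 0, na.rm = TRUE)) {
      return(keyword)
    }
  }
  stop("No control keyword produced a usable signal.")
}

# Toy control data for one location.
control_hits <- data.frame(
  keyword = rep(c("gmail", "maps", "news"), each = 2),
  hits = c(80, 90, 20, 30, 4, 6)
)

# Stand-in for a download: "news" returns only zeros, the others do not.
fake_fetch <- function(keyword) {
  if (keyword == "news") c(0, 0, 0) else c(12, 0, 7)
}

chosen <- select_control_keyword(control_hits, fake_fetch)
print(chosen)
```

In this toy data the lowest-ranked keyword yields no signal, so the next keyword in ascending order is used.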

Download backend. Requests are issued through the internal .get_trend() helper, which uses either gtrendsR::gtrends() (default) or the Google Trends Research API when initialize_python() has been called.

Deduplication. Before downloading, the function queries data_object for locations already present for the requested (batch_c, batch_o) pair. Only locations not yet in the database are downloaded. If all locations are already present, the function returns early with a message and no requests are made.

Missing control baseline. If data_control contains no rows for a given location, that location is skipped with a message (nothing is written to data_object).

Value

Invisibly returns TRUE. The function is called for its side effects: downloaded rows are appended to data_object in the active database, and one progress message is emitted per location. Locations with no control baseline in data_control are skipped with a message.

download_object_global() also invisibly returns TRUE. See download_object() for details on side effects and emitted messages.

Category codes

Avoid category codes unless you are confident they apply uniformly to all keywords in the batch. Google Trends applies a category constraint to the entire request, which can unintentionally change the meaning of control and object keywords.

See Also

start_db() to connect to the database and populate gt.env. add_object_keyword() to register object batches before downloading. download_object_global() for a convenience wrapper for worldwide data. download_control() to download control keyword data used for scaling.

Examples

## Not run: 
# Download one object batch for all countries
download_object(object = 1, control = 1, locations = countries)

# Download several batches sequentially
download_object(object = as.list(1:5), control = 1, locations = countries)

# Download worldwide aggregate
download_object_global(object = 1, control = 1)

## End(Not run)

Download regional interest data for object keywords

Description

Downloads regional interest data (sub-geo breakdown) for the keywords in one or more object batches (batch_o) and writes the results to the database table data_region.

download_region_global() is a convenience wrapper around download_region() that downloads the worldwide aggregate instead of country-level data. It is equivalent to calling download_region(object, locations = "world").

Usage

download_region(object, locations = NULL)

## S3 method for class 'numeric'
download_region(object, locations = NULL)

## S3 method for class 'list'
download_region(object, locations = NULL)

download_region_global(object)

Arguments

object

Numeric scalar, numeric vector, or list of numeric scalars. Object batch id(s) to download.

locations

Character vector of location codes. Defaults to gt.env$countries when set by start_db(); otherwise falls back to globaltrends::countries. Pass "world" (or use download_region_global()) to download the worldwide aggregate instead of country-level data.

Details

Prerequisites. initialize_python() must be called before download_region() to initialise the Research API backend. start_db() must also have been called to connect to the database and populate gt.env$keywords_object and gt.env$time_object.

Dispatch. download_region() is an S3 generic that dispatches on the class of object. Passing a numeric scalar routes to the .numeric method, which performs the actual download. Passing a numeric vector of length > 1 coerces object to a list and delegates to the .list method, which iterates over batches sequentially. Passing a list directly also routes to the .list method.

Download backend. Requests are issued through the internal .get_region() helper using the Google Trends Research API. This backend always requires Python to be set up via initialize_python(); unlike download_control(), no gtrendsR fallback is available.

Deduplication. Before downloading, the function queries data_region for locations already present for the requested object batch. Only locations not yet in the database are downloaded. If all requested locations are already present, the function returns early with a message and no requests are made.

Missing data. If the API returns no data for a location (e.g. due to insufficient search volume), that location is skipped: nothing is written to data_region and a "No region data returned" message is emitted.

Value

Invisibly returns TRUE. The function is called for its side effects: downloaded rows are appended to data_region in the active database, and one progress message is emitted per location indicating whether data was written or no data was returned.

download_region_global() also invisibly returns TRUE. See download_region() for details on side effects and emitted messages.

See Also

initialize_python() to set up the Python backend before downloading. start_db() to connect to the database and populate gt.env. add_object_keyword() to register object batches before downloading. download_region_global() for a convenience wrapper for worldwide data. download_control() to download control keyword data.

Examples

## Not run: 
# Set up the Research API backend and open a database session
initialize_python(api_key = "XXX", conda_env = "/path/to/env")
start_db()

# Download one object batch for all countries
download_region(object = 1, locations = countries)

# Download several batches sequentially
download_region(object = as.list(1:3), locations = countries)

# Download worldwide aggregate
download_region_global(object = 1)

## End(Not run)

Export data from database tables

Description

Seven functions for exporting filtered subsets of four data tables (data_control, data_object, data_score, and data_doi). Each function returns a data frame that can be passed directly to standard R I/O functions such as readr::write_csv() or writexl::write_xlsx().

Function                 Source table                      Location scope
export_control()         data_control (control hits)       country/region level
export_control_global()  data_control                      world aggregate only
export_object()          data_object (object hits)         country/region level
export_object_global()   data_object                       world aggregate only
export_score()           data_score (normalized scores)    country/region level
export_voi()             data_score                        world aggregate only (VOI)
export_doi()             data_doi (internationalization)   aggregated across locations

Usage

export_control(control = NULL, location = NULL)

export_control_global(control = NULL)

export_object(keyword = NULL, object = NULL, control = NULL, location = NULL)

export_object_global(keyword = NULL, object = NULL, control = NULL)

export_score(keyword = NULL, object = NULL, control = NULL, location = NULL)

export_voi(keyword = NULL, object = NULL, control = NULL)

export_doi(keyword = NULL, object = NULL, control = NULL, locations = NULL)

Arguments

control

Integer scalar batch id for control data (batch_c or batch).

location

Character vector (or list coercible via unlist()) of location codes to filter by (e.g., values from countries or us_states).

keyword

Character vector (or list coercible via unlist()) of object keywords to export. When provided, overrides object.

object

Integer scalar batch id for object data (batch_o). Ignored if keyword is supplied.

locations

Character scalar naming a location set (e.g., "countries", "us_states"). Applies to export_doi() only.

Details

All filter arguments default to NULL, which disables that filter and returns all rows for that dimension. When keyword is provided it takes precedence over object: the object argument is silently ignored.
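The filter semantics can be sketched on a plain data frame. The helper filter_rows() and the toy table are hypothetical; they only illustrate how NULL disables a filter and how keyword takes precedence over object.

```r
# Hypothetical filter helper mirroring the rules described above.
filter_rows <- function(data, keyword = NULL, object = NULL, control = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(keyword)) {
    # keyword wins: object is not consulted at all
    keep <- keep & data$keyword %in% unlist(keyword)
  } else if (!is.null(object)) {
    keep <- keep & data$batch_o == object
  }
  if (!is.null(control)) {
    keep <- keep & data$batch_c == control
  }
  data[keep, , drop = FALSE]
}

# Toy stand-in for a data table with batch columns.
toy <- data.frame(
  keyword = c("manchester united", "real madrid", "fc barcelona"),
  batch_o = c(1, 1, 2),
  batch_c = c(1, 1, 1)
)

filter_rows(toy)                                        # no filters: all rows
filter_rows(toy, object = 2)                            # filter by object batch
filter_rows(toy, keyword = "real madrid", object = 2)   # object is ignored
```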

The location-level functions (export_control(), export_object(), export_score()) exclude rows for the "world" aggregate. Their world-aggregate counterparts (export_control_global(), export_object_global(), export_voi()) return only the "world" rows.

Value

A data frame with the requested rows and a date column of class Date. Batch identifier columns are renamed for clarity:

  • export_control(), export_control_global(): location, keyword, date, hits, control (renamed from batch).

  • export_object(), export_object_global(): location, keyword, date, hits, object (from batch_o), control (from batch_c).

  • export_score(), export_voi(): location, keyword, date, score, control (from batch_c), object (from batch_o).

  • export_doi(): keyword, date, gini, hhi, entropy, control (from batch_c), object (from batch_o), locations.

Examples

## Not run: 
# Control hits for batch 2
export_control(control = 2)

# World-aggregate control hits
export_control_global(control = 1)

# Object hits for a keyword across all locations
export_object(keyword = "manchester united", location = countries)

# Object hits for multiple keywords
export_object(keyword = c("manchester united", "real madrid"))

# World-aggregate object hits
export_object_global(keyword = "manchester united", control = 1)

# Location-level scores, written to CSV
export_score(object = 3, control = 1, location = us_states) |>
  readr::write_csv("data_score.csv")

# Volume of interest (world-aggregate scores)
export_voi(keyword = "manchester united", control = 1)

# Degree of internationalization for a keyword, written to Excel
export_doi(keyword = "manchester united", control = 2, locations = "us_states") |>
  writexl::write_xlsx("data_doi.xlsx")

## End(Not run)

Report daily Research API usage

Description

Returns the number of Google Trends Research API calls made today, the number remaining before the daily limit is reached, and the limit itself. The counter is stored in gt.env and resets automatically when the calendar date changes.

Usage

get_api_usage()

Details

The counter is incremented once per successful call to the internal helpers .get_trend(), .get_region(), and .get_related() whenever the Research API backend is active (i.e., after initialize_python() has been called). Calls routed through the default gtrendsR scraping backend are not counted.

The daily limit of 10,000 calls is set by Google. The counter does not enforce this limit; it only tracks usage so that callers can monitor their consumption.
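A date-aware counter of this kind could look roughly like the sketch below. The environment usage_env and the helpers roll_over(), count_call(), and report_usage() are hypothetical names for illustration; the package keeps its own state in gt.env.

```r
# Hypothetical stand-in for the counter bindings.
usage_env <- new.env(parent = emptyenv())
usage_env$api_calls <- 0L
usage_env$api_calls_date <- Sys.Date()
daily_limit <- 10000L

# Reset the counter when the calendar date has changed.
roll_over <- function(today) {
  if (usage_env$api_calls_date != today) {
    usage_env$api_calls <- 0L
    usage_env$api_calls_date <- today
  }
}

# Record one successful API call.
count_call <- function(today = Sys.Date()) {
  roll_over(today)
  usage_env$api_calls <- usage_env$api_calls + 1L
  invisible(usage_env$api_calls)
}

# Report usage without enforcing the limit.
report_usage <- function(today = Sys.Date()) {
  roll_over(today)
  calls <- usage_env$api_calls
  c(calls = calls, remaining = daily_limit - calls, limit = daily_limit)
}

count_call()
count_call()
report_usage()
```

The today argument exists only so that the rollover can be exercised without waiting for the date to change.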

Value

A named integer vector with three elements:

calls

Number of Research API calls made today.

remaining

Calls remaining before the daily limit is reached.

limit

The daily limit (always 10000).

See Also

initialize_python() to enable the Research API backend.

Examples

get_api_usage()

Package environment for internal state

Description

gt.env is the internal package environment used to store runtime state and database handles. It centralizes objects that should be shared across functions (e.g., the DBI connection, lazy table references, cached keyword batches).

Usage

gt.env

Format

An environment with parent = emptyenv().

Details

The following bindings may be present in gt.env after package attach and/or after calling initialization functions such as start_db():

  • globaltrends_db: DBI connection/handle to the SQLite database.

  • tbl_locations: Lazy table reference for location sets stored in the DB.

  • tbl_keywords: Lazy table reference for keyword batches stored in the DB.

  • tbl_time: Lazy table reference for time windows stored in the DB.

  • tbl_synonyms: Lazy table reference for keyword synonyms stored in the DB.

  • tbl_doi: Lazy table reference for DOI data stored in the DB.

  • tbl_control: Lazy table reference for control search-volume data.

  • tbl_object: Lazy table reference for object search-volume data.

  • tbl_score: Lazy table reference for computed scores.

  • tbl_related: Lazy table reference for related search terms.

  • tbl_region: Lazy table reference for regional search-volume data.

  • keywords_control: Cached tibble of control keywords by batch (populated by start_db() / exports).

  • time_control: Cached tibble of control batch time windows.

  • keywords_object: Cached tibble of object keywords by batch.

  • time_object: Cached tibble of object batch time windows.

  • keyword_synonyms: Cached tibble of keyword/synonym mappings.

  • query_wait: Numeric scalar. Seconds to wait between API calls (default: 0.1).

  • py_setup: Logical scalar. TRUE if initialize_python() has been called successfully.

  • api_calls: Integer scalar. Number of Research API calls made today (reset automatically when the calendar date changes).

  • api_calls_date: Date scalar. The date for which api_calls is counted; used to detect day boundaries.

Implementation notes

The environment is created with parent = emptyenv() to avoid accidental variable capture. Bindings are initialized on package attach so downstream functions can rely on their existence; however, most bindings remain NULL until start_db() (or related setup routines) populates them.
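The pattern described in these notes can be reproduced in a few lines of base R. The environment name state is a hypothetical stand-in, and only a handful of the bindings listed above are pre-registered here.

```r
# An isolated environment: lookups never fall through to other scopes.
state <- new.env(parent = emptyenv())

# Pre-register bindings as NULL so later code can rely on their existence.
for (name in c("globaltrends_db", "keywords_control", "time_control")) {
  assign(name, NULL, envir = state)
}
assign("query_wait", 0.1, envir = state)
assign("py_setup", FALSE, envir = state)

exists("keywords_control", envir = state, inherits = FALSE)  # TRUE
is.null(get("keywords_control", envir = state))              # TRUE
exists("mean", envir = state)                                # FALSE
```

The last call returns FALSE because the search stops at emptyenv(), which is the point of choosing that parent.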

Initialize the local database store

Description

Creates the local database store used by globaltrends in the current working directory and initializes all required tables and indexes.

Usage

initialize_db()

Details

The package uses SQLite with a Parquet-backed persistence layout under the db/ folder. initialize_db() creates a transient in-memory SQLite database, builds the schema, populates default location sets, and exports the result as Parquet files (via arrow) to db/. The in-memory connection is closed before the function returns; call start_db() to open a working session.

If all required Parquet files already exist the function returns early without overwriting anything. If only some files are present (indicating a partial or corrupted store) the function stops with an error.
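The all-or-nothing check can be sketched with file.exists(). The helper store_status() and the two file names are assumptions for this example; the actual set of Parquet files is determined by the package.

```r
# Classify a store directory as "complete" or "empty"; stop on a partial store.
store_status <- function(store_dir, required) {
  present <- file.exists(file.path(store_dir, required))
  if (all(present)) {
    return("complete")
  }
  if (any(present)) {
    stop(
      "Partial store, missing: ",
      paste(required[!present], collapse = ", ")
    )
  }
  "empty"
}

# Hypothetical file names, used only for this illustration.
required <- c("batch_keywords.parquet", "batch_time.parquet")

store_dir <- tempfile("db_")
dir.create(store_dir)
store_status(store_dir, required)                        # nothing there yet

invisible(file.create(file.path(store_dir, required)))
store_status(store_dir, required)                        # all files present
```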

Default location sets written to data_locations:

countries

ISO 3166-1 alpha-2 codes for countries above the GDP share threshold (see countries).

us_states

ISO 3166-2 codes for US states and Washington DC (see us_states).

Value

Invisibly returns TRUE. Called for its side effects (creating files under db/).

Concurrency

SQLite allows concurrent readers but only one writer at a time. If you run parallel download workers, use one database directory per worker and merge results afterwards.

See Also

start_db() to open a working session after initialization; disconnect_db() to persist changes and close the session.

Examples

## Not run: 
initialize_db()
start_db()

## End(Not run)

Initialize Python backend for Google Trends Research API

Description

Initializes the Python session required to download data via the Google Trends Research API (not the public gtrendsR::gtrends() scraping route). The function configures the Python interpreter (Conda or virtualenv), stores the API key in gt.env, sources the package's Python helper code, and marks the session as ready for API-based downloads.

Usage

initialize_python(api_key, conda_env = NULL, python_env = NULL)

Arguments

api_key

Character scalar. API key obtained from Google.

conda_env

Optional character scalar. Name or path of a Conda environment (passed to reticulate::use_condaenv()). Supply either this or python_env, not both.

python_env

Optional character scalar. Path to a Python virtual environment (passed to reticulate::use_virtualenv()). Supply either this or conda_env, not both.

Details

Prerequisites. Before calling initialize_python():

  1. Apply for Research API access and obtain an API key via Google's request form.

  2. Create a Python environment (Conda or virtualenv) with google-api-python-client installed.

Environment specification. Exactly one of conda_env or python_env must be supplied; providing neither or both is an error.
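The "exactly one of" rule can be checked as in the following sketch. resolve_python_env() is a hypothetical helper that only validates the arguments; it does not call reticulate or configure an interpreter.

```r
# Validate that exactly one environment argument is supplied.
resolve_python_env <- function(conda_env = NULL, python_env = NULL) {
  # Both NULL or both non-NULL are errors.
  if (is.null(conda_env) == is.null(python_env)) {
    stop("Supply exactly one of 'conda_env' or 'python_env'.")
  }
  if (!is.null(conda_env)) {
    list(type = "conda", path = conda_env)
  } else {
    list(type = "virtualenv", path = python_env)
  }
}

resolve_python_env(conda_env = "/path/to/conda/env")
resolve_python_env(python_env = "/path/to/venv")
```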

Effect on the download backend. Once initialized, the download functions (download_control(), download_object(), download_region(), download_related()) use the Research API instead of the default gtrendsR::gtrends() scraping route. download_region() has no gtrendsR fallback, so it cannot run without this initialization.

Value

Invisibly returns TRUE. Called for its side effects: stores api_key in gt.env, sources python/query_gtrends.py, and sets gt.env$py_setup to TRUE to activate the Research API download backend.

See Also

download_control(), download_object(), download_region(), download_related() for the download functions that use the Research API once initialized. reticulate::use_condaenv() and reticulate::use_virtualenv() for Python environment configuration.

Examples

## Not run: 
# Conda environment
initialize_python(
  api_key   = "YOUR_API_KEY",
  conda_env = "/path/to/conda/env"
)

# Virtual environment
initialize_python(
  api_key    = "YOUR_API_KEY",
  python_env = "/path/to/venv"
)

## End(Not run)

Remove data from database tables

Description

remove_data() removes batches and derived data from the database. Deletions are greedy: all downstream tables that depend on the deleted entry are automatically cleaned up to keep the database consistent.

vacuum_data() reclaims unused disk space by running VACUUM on the underlying database. Call it after bulk deletions via remove_data() to compact the file and free storage.

Usage

remove_data(table, control = NULL, object = NULL)

vacuum_data()

Arguments

table

Character scalar. The table to delete from. One of "batch_keywords", "batch_time", "data_control", "data_object", "data_score", "data_doi", "data_related", "data_region". See the argument requirements table in Details for which of control and object are required, optional, or ignored for each table.

control

Optional integer-like scalar. Control batch id.

  • Required for table = "data_control".

  • Exactly one of control or object for "batch_keywords" and "batch_time".

  • At least one of control or object for "data_object", "data_score", and "data_doi".

  • Ignored (with a warning) for "data_related" and "data_region".

object

Optional integer-like scalar. Object batch id.

  • Required for table = "data_related" and "data_region".

  • Exactly one of control or object for "batch_keywords" and "batch_time".

  • At least one of control or object for "data_object", "data_score", and "data_doi".

  • Ignored (with a warning) for "data_control".

Details

Dependency chain

Deletions cascade through the following dependency graph:

batch_keywords / batch_time
       |
       v
  data_control
       |
       v
  data_object ---> data_related
       |      \--> data_region
       v
  data_score
       |
       v
   data_doi

For example:

  • Deleting a control batch from data_control removes all data_object rows for that control, then the associated data_score, data_doi, data_related, and data_region rows.

  • Deleting an object batch from batch_keywords removes the corresponding batch_time entry, all data_object rows for that object batch, and everything downstream.
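The cascade over the data tables can be pictured as a graph traversal. The sketch below encodes the data-table edges of the diagram above in a hypothetical list named downstream and collects every table affected by a deletion; it does not touch a database and is not the package's own code.

```r
# Edges of the dependency diagram, restricted to the data tables.
downstream <- list(
  data_control = "data_object",
  data_object  = c("data_score", "data_related", "data_region"),
  data_score   = "data_doi",
  data_doi     = character(0),
  data_related = character(0),
  data_region  = character(0)
)

# Collect all tables downstream of `table` (breadth-first).
cascade <- function(table) {
  affected <- character(0)
  queue <- downstream[[table]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% affected) {
      affected <- c(affected, current)
      queue <- c(queue, downstream[[current]])
    }
  }
  affected
}

cascade("data_control")
cascade("data_score")
```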

Argument requirements by table

table                                     control          object
"batch_keywords", "batch_time"            exactly one of   exactly one of
"data_control"                            required         ignored
"data_object", "data_score", "data_doi"   at least one of  at least one of
"data_related", "data_region"             ignored          required

After deletions, consider running vacuum_data() to reclaim disk space. For SQLite-based backends, VACUUM rewrites the entire database file in place and may take several minutes for large databases. No data is modified; only free pages are reclaimed.
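The argument requirements listed in Details can be expressed as a small validation routine. check_remove_args() is a hypothetical helper written for this illustration; the error and warning texts are assumptions and may differ from those of remove_data().

```r
# Validate the control/object combination for a given table.
check_remove_args <- function(table, control = NULL, object = NULL) {
  has_control <- !is.null(control)
  has_object <- !is.null(object)

  if (table %in% c("batch_keywords", "batch_time")) {
    if (has_control == has_object) {
      stop("Supply exactly one of 'control' or 'object'.")
    }
  } else if (table == "data_control") {
    if (!has_control) stop("'control' is required.")
    if (has_object) warning("'object' is ignored for this table.")
  } else if (table %in% c("data_object", "data_score", "data_doi")) {
    if (!has_control && !has_object) {
      stop("Supply at least one of 'control' or 'object'.")
    }
  } else if (table %in% c("data_related", "data_region")) {
    if (!has_object) stop("'object' is required.")
    if (has_control) warning("'control' is ignored for this table.")
  } else {
    stop("Unknown table: ", table)
  }
  invisible(TRUE)
}

check_remove_args("batch_keywords", control = 1)
check_remove_args("data_score", control = 1, object = 1)
check_remove_args("data_region", object = 1)
```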

Value

Both functions invisibly return TRUE on success. remove_data() is called for its side effects (deleting rows); vacuum_data() is called to compact the database file.

Examples

## Not run: 
# Remove a control keyword batch and all data derived from it
remove_data(table = "batch_keywords", control = 1)

# Remove an object keyword batch and all data derived from it
remove_data(table = "batch_keywords", object = 1)

# Remove all object data linked to a control batch
remove_data(table = "data_object", control = 1)

# Remove scores for one specific control-object combination
remove_data(table = "data_score", control = 1, object = 1)

# Remove related-query data for an object batch
remove_data(table = "data_related", object = 1)

# Remove regional breakdown data for an object batch
remove_data(table = "data_region", object = 1)

# Reclaim disk space after bulk deletions
vacuum_data()

## End(Not run)

Start a database session

Description

Loads the Parquet-backed store under db/ into an in-memory SQLite connection and registers lazy dplyr table handles and cached tibbles in gt.env.

Usage

start_db()

Details

Requires initialize_db() to have been run in the current working directory. All Parquet files are read into an in-memory SQLite instance; the following bindings are written to gt.env:

globaltrends_db

Active DBI connection to the in-memory SQLite instance.

keywords_control, keywords_object

Data frames of control and object keywords by batch (without the type column).

time_control, time_object

Data frames of batch time windows for control and object runs (without the type column).

keyword_synonyms

Data frame of all keyword/synonym pairs.

Location sets are exported as named character vectors via .export_locations().

Value

Invisibly returns TRUE. Called primarily for its side effects.

See Also

initialize_db() to create the store before the first session; disconnect_db() to persist changes and close the session.

Examples

## Not run: 
start_db()
# ... downloads and computations ...
disconnect_db()

## End(Not run)

Default location set: US states

Description

Character vector of US state-level location codes used by the package.

The vector contains the 51 ISO 3166-2 codes of the form "US-XX" for the 50 US states and "US-DC" for the District of Columbia. Pass this vector as the locations argument to compute_score() or compute_doi() for within-US analyses.

Usage

us_states

Format

A character vector of 51 ISO 3166-2 location codes.

See Also

add_locations(), start_db()

Examples

length(us_states)
head(us_states)
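
The code format can also be reproduced without the package, which may help when checking custom location sets. The sketch below builds "US-XX" codes from the state abbreviations shipped with base R; the vector name codes is hypothetical, and the ordering is an assumption that may differ from us_states.

```r
# Build ISO 3166-2 style codes for the 50 states plus the District of Columbia.
codes <- paste0("US-", sort(c(datasets::state.abb, "DC")))

length(codes)
head(codes)
```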