get_archive.Rd
Find all archives of a given type within a vector of HTML text.
get_archive(
    origin,
    extension = c("zip", "7z"),
    name = NULL,
    date = NULL,
    version = NULL,
    directory = FALSE
)
origin: character, the URL where archives are to be found.
extension: character, vector of acceptable types of archives to be downloaded.
name: character, vector of acceptable names of archives to be downloaded. See details.
date: character, something like a date that should be used as a filter. See details.
version: character, something like a version that should be used as a filter.
directory: logical, should directories be found instead of archives? See details.
A character vector of all archives or directories found in origin that match the given constraints.
First, a regex search is made to find, in the HTML text, names enclosed in href="name" or href='name'.
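The link extraction step can be sketched as follows. This is a minimal illustration in base R, not the package's actual implementation: the HTML fragment, the pattern, and the variable names are all assumptions made for the example.

```r
# Hypothetical HTML fragment standing in for a downloaded page.
html <- c(
    '<a href="RPG_2010_034.zip">RPG_2010_034.zip</a>',
    "<a href='RPG_2010_083.7z'>RPG_2010_083.7z</a>",
    '<a href="2011/">2011/</a>'
)

# Match href="..." or href='...' and keep the whole attribute.
pattern <- "href=(\"[^\"]*\"|'[^']*')"
hits <- unlist(regmatches(html, gregexpr(pattern, html)))

# Strip the leading href= and the surrounding quotes.
links <- gsub("^href=[\"']|[\"']$", "", hits)
print(links)
```

Every later filter (extension, name, date, version) would then operate on a vector such as links.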
extension may contain several alternatives. It is matched at
the end of archives' names. This may lead to an empty character vector as result.
name may contain several alternatives. It is matched at the
beginning of archives' names. This may lead to an empty character vector as result.
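As an illustration of these two filters, the sketch below anchors extension at the end and name at the beginning of each archive name. The file names are made up, and requiring a dot before the extension is an assumption; the package may build its patterns differently.

```r
# Hypothetical archive names, as they could be extracted from a page.
links <- c(
    "geo_siret_34.csv.gz",
    "geo_siret_83.csv.gz",
    "geo_sirene_34.csv.gz",
    "readme.txt"
)

# extension: any of the alternatives, anchored at the end of the name.
# Assumption: the extension is preceded by a dot.
extension <- c("gz", "zip")
ext_pattern <- sprintf("[.](%s)$", paste(extension, collapse = "|"))
kept <- grep(ext_pattern, links, value = TRUE)

# name: any of the alternatives, anchored at the beginning of the name.
name <- c("geo_siret_34", "geo_siret_83")
name_pattern <- sprintf("^(%s)", paste(name, collapse = "|"))
kept <- grep(name_pattern, kept, value = TRUE)
print(kept)

# A name that matches nothing gives an empty character vector.
none <- grep("^(unknown_prefix)", links, value = TRUE)
print(length(none))
```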
date may be "last", in which case anything that can be considered
as a date in archives' names ("\
against and the max is taken. If nothing matches, all archives' names are
kept. date may also contain anything admissible for create_date.
If so, anything that can be considered as a date in archives' names ("\
"\
date pertain to create_date(date)
are kept, possibly nothing.
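The behaviour for date = "last" can be sketched as below. The date pattern used here (ISO style, YYYY-MM-DD) is an assumption chosen for the example, as are the archive names; the patterns that get_archive really recognises are not shown.

```r
# Hypothetical archive names; the real ones depend on the remote page.
links <- c(
    "ADMIN-EXPRESS-COG_2021-05-19.7z",
    "ADMIN-EXPRESS-COG_2022-04-15.7z",
    "ADMIN-EXPRESS-COG_2022-04-15_FRA.7z"
)

# Assumed date pattern (ISO style); get_archive may recognise other forms.
date_pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"

# Extract the first date-like substring of each name, NA when there is none.
m <- regexpr(date_pattern, links)
found <- rep(NA_character_, length(links))
found[m != -1] <- regmatches(links, m)

# Keep every name if nothing looks like a date, else those with the latest date.
if (all(is.na(found))) {
    last <- links
} else {
    last <- links[!is.na(found) & found == max(found, na.rm = TRUE)]
}
print(last)
```

Comparing the extracted dates as strings is only safe here because ISO dates sort chronologically; other formats would need to be converted first.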
version may contain several alternatives. It is matched
anywhere in archives' names. This may lead to an empty character vector as result.
If directory is set to TRUE, extension is not used. Instead,
links ending with "/" are looked for.
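A minimal sketch of the directory case, assuming that directory links end with a forward slash as is usual in HTML directory listings. The link names are made up, and returning full URLs built from origin is an assumption suggested by the examples, not something this page states.

```r
# Hypothetical links extracted from a directory listing.
links <- c("2010/", "2011/", "README.txt", "RPG_2010_034.zip")

# Keep only the links that end with a slash, i.e. the directories.
directories <- grep("/$", links, value = TRUE)
print(directories)

# Assumption: results are returned as full URLs below origin.
origin <- "https://data.cquest.org/registre_parcellaire_graphique"
urls <- file.path(origin, sub("/$", "", directories))
print(urls)
```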
if (FALSE) {
# RPG archive for year 2010 in data.cquest.org
origin = "https://data.cquest.org/registre_parcellaire_graphique/2010"
file_list = get_archive(origin)
get_archive(origin)
get_archive(origin, version = "34")
get_archive(origin, version = 30:35)
# All RPG archives for any year for region "Occitanie" in data.cquest.org
origin = get_archive(
    "https://data.cquest.org/registre_parcellaire_graphique",
    directory = TRUE
)
get_archive(origin, version = "R76")
# "geo_siret" archives in data.cquest.org
origin = "https://data.cquest.org/geo_sirene/v2019/last/dep"
get_archive(origin, "gz", c("geo_siret_34", "geo_siret_83"))
get_archive(origin, "gz", c("geo_siret"), version = c("34", "83"))
# "ADMIN EXPRESS" archives in ign
origin = "https://geoservices.ign.fr/adminexpress"
get_archive(origin, "7z", "ADMIN-EXPRESS-COG", date = "last")
get_archive(origin, "7z", "ADMIN-EXPRESS-COG", version = "FRA", date = "last")
get_archive(origin, "7z", "ADMIN-EXPRESS", date = 2021:2022)
}