Find archives within an html-type vector
get_archive.Rd
Find all archives of the given types within a vector of html text.
Usage
get_archive(
origin,
extension = c("zip", "7z"),
name = NULL,
date = NULL,
version = NULL,
directory = FALSE,
local_origin = NULL
)
Arguments
- origin
character, the url address where to find archives.
- extension
character, vector of acceptable types of archives to be downloaded.
- name
character, vector of acceptable names of archives to be downloaded. See details.
- date
character, something like a date that should be used as a filter. See details
- version
character, something like a version that should be used as a filter.
- directory
logical, should directories be found instead of archives. See details.
- local_origin
character, the local url address to add to relative links. When set to `NULL` (default), origin is used.
Value
A character vector of all archives or directories found in origin that match the given constraints.
Details
First, all links are retrieved with get_link_from_html().
If directory is TRUE, only links ending with the pattern "\" are kept, and this pattern is erased. extension is not used in this case.
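The directory step can be sketched in base R as follows. This is only an illustration, not the package's code: the links are made up, and the sketch assumes that directories are recognised by a trailing "/", which is how directory listings usually expose them.

```r
# Hypothetical links, as they might be retrieved from a listing page.
links <- c("2023/", "2024/", "readme.txt", "data.zip")

# Assumption: a directory link ends with "/".
is_dir <- grepl("/$", links)

# Keep only the directory links, then erase the trailing separator.
directories <- sub("/$", "", links[is_dir])

print(directories)
```

Note that no extension is involved here, which mirrors the statement that extension is not used when directory is TRUE.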
When extension is given, it may contain several possibilities. All are matched at the end of archives' names.
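A minimal sketch of matching several extensions at the end of a name, using base R only. The link names are hypothetical, and requiring a dot before the extension is an assumption of this sketch rather than documented behaviour.

```r
# Hypothetical link names.
links <- c("data_v1.zip", "data_v2.7z", "notes.txt", "zip_manual.pdf")
extension <- c("zip", "7z")

# Build one alternation anchored at the end of the string.
# Assumption: the extension is preceded by a literal dot.
pattern <- paste0("\\.(", paste(extension, collapse = "|"), ")$")

# Keep the links whose name ends with one of the accepted extensions.
archives <- links[grepl(pattern, links)]

print(archives)
```

Anchoring with "$" is what keeps a name such as zip_manual.pdf out of the result even though it contains "zip".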
At this point, a check is made to see whether the remaining links are relative (not starting with "http" or "https") or absolute. All relative links are completed with local_origin. All links are named with base::basename(); those names are used for further selections.
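The completion and naming step might look like the sketch below. The addresses are placeholders, and joining with a single "/" is an assumption; the function may assemble the address differently.

```r
# Hypothetical local origin and links.
local_origin <- "https://example.org/downloads"
links <- c(
  "data_v1.zip",
  "https://mirror.example.org/files/data_v2.zip",
  "sub/data_v3.7z"
)

# A link is treated as absolute when it starts with "http" or "https".
is_absolute <- grepl("^https?", links)

# Complete relative links with the local origin (assumed "/" separator).
links[!is_absolute] <- paste(local_origin, links[!is_absolute], sep = "/")

# Name every link with its basename; later filters work on these names.
names(links) <- basename(links)

print(links)
```

Keeping the full address as the value and the short file name as the name lets the later name, version and date filters ignore the path.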
name may contain several possibilities. It is matched at the beginning of archives' names.
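As an illustration of matching several names at the beginning of archive names, here is a base R sketch with made-up archive names. Whether the function matches literally or with a regular expression is not stated, so literal prefix matching is an assumption here.

```r
# Hypothetical archive names.
archives <- c("census_2020.zip", "census_2021.zip",
              "survey_2021.zip", "old_census.zip")
name <- c("census", "survey")

# TRUE when an archive name starts with at least one accepted name.
keep <- Reduce(`|`, lapply(name, function(n) startsWith(archives, n)))

selected <- archives[keep]

print(selected)
```

The name old_census.zip is dropped because "census" does not appear at its beginning.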
version may contain several possibilities. It is matched anywhere in archives' names. This may lead to an empty character vector as a result.
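The sketch below shows a version filter that matches anywhere in the name, including the case where nothing matches. The archive names are hypothetical, and the use of literal matching (fixed = TRUE) is an assumption made so that the dots in a version are not read as regular expression wildcards.

```r
# Hypothetical archive names.
archives <- c("tool_1.2.0.zip", "tool_1.3.0.zip", "tool_2.0.0.zip")
version <- c("1.3.0", "2.0.0")

# TRUE when an archive name contains at least one accepted version.
keep <- Reduce(`|`, lapply(version, function(v) {
  grepl(v, archives, fixed = TRUE)
}))
selected <- archives[keep]

# A version that appears nowhere yields an empty character vector.
none <- archives[grepl("9.9.9", archives, fixed = TRUE)]

print(selected)
print(none)
```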
date may be "last", in which case anything that can be considered as a date in archives' names ("\
against and the max is taken. If nothing matches, all archives' names are kept.
date may also contain anything admissible for create_date(). If so, anything that can be considered as a date in archives' names ("\
"\
date pertain to create_date(date) are kept, possibly nothing.
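The two date modes can be sketched in base R as below. Everything specific here is an assumption: the archive names are made up, the YYYY-MM-DD pattern is a stand-in for whatever patterns the function actually recognises, and filtering on a year is a hypothetical substitute for the result of create_date().

```r
# Hypothetical archive names.
archives <- c("data_2023-01-15.zip", "data_2023-06-30.zip",
              "data_2024-02-01.zip", "data_undated.zip")

# Assumption: dates appear as YYYY-MM-DD in the names.
date_pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"
m <- regexpr(date_pattern, archives)
found <- rep(NA_character_, length(archives))
found[m > 0] <- regmatches(archives, m)
dates <- as.Date(found)  # NA where no date was found

# "last": keep the most recent archive, or everything if no date matches.
if (all(is.na(dates))) {
  last <- archives
} else {
  last <- archives[!is.na(dates) & dates == max(dates, na.rm = TRUE)]
}

# Specific date: hypothetical stand-in for create_date(), here a year.
target_year <- "2023"
in_year <- archives[!is.na(dates) & format(dates, "%Y") == target_year]

print(last)
print(in_year)
```

The asymmetry described above shows up in the sketch: the "last" branch falls back to all names when no date is found, while the specific-date branch can return nothing.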