List Data from OpenML
Source: R/list_oml_collections.R, R/list_oml_data.R, R/list_oml_evaluations.R, and 5 more
list_oml.Rd
These functions query collections, data sets, tasks, flows, setups, runs, and evaluation measures from https://www.openml.org/search?type=data&sort=runs&status=active using simple filter criteria.
To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task type.
Another heuristic for finding possible regression tasks is to search for data sets with
zero classes, i.e. by specifying number_classes = 0.
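The two search strategies described above can be sketched as follows. This is a minimal illustration, not an official example: find_regression_candidates() is a hypothetical helper that is not part of mlr3oml, and the calls need the mlr3oml package and network access to OpenML, so they only run in an interactive session.

```r
# Hypothetical helper (not part of mlr3oml) combining both strategies.
find_regression_candidates <- function(limit = 10L) {
  list(
    # Strategy 1: tasks explicitly declared as regression tasks
    tasks = mlr3oml::list_oml_tasks(type = "regr", limit = limit),
    # Strategy 2 (heuristic): data sets whose target has zero classes
    data = mlr3oml::list_oml_data(number_classes = 0L, limit = limit)
  )
}

# Only query the server when run interactively (requires network access).
if (interactive()) {
  candidates <- find_regression_candidates(limit = 5L)
  print(candidates$tasks)
  print(candidates$data)
}
```

A small limit keeps the query fast while exploring; raise it once the filters look right.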
Usage
list_oml_collections(
  uploader = NULL,
  status = "all",
  main_entity_type = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_data(
  data_id = NULL,
  data_name = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_evaluations(
  run_id = NULL,
  task_id = NULL,
  measures = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_flows(
  uploader = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_measures(test_server = test_server_default())
list_oml_runs(
  run_id = NULL,
  task_id = NULL,
  tag = NULL,
  flow_id = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_setups(
  flow_id = NULL,
  setup_id = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)
list_oml_tasks(
  task_id = NULL,
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  type = NULL,
  ...
)
Arguments
- uploader
  (integer(1))
  Filter for uploader.
- status
  (character(1))
  Should be one of "active", "in_preparation", "deactivated", "all". By default, "all" studies are returned.
- main_entity_type
  (character(1) | NULL)
  Filter for main entity type. Can be "run" or "task".
- limit
  (integer())
  Limit the results to limit records. Default is the value of option "mlr3oml.limit", defaulting to 5000.
- test_server
  (logical(1))
  Whether to use the OpenML test server or public server. Defaults to value of option "mlr3oml.test_server", or FALSE if not set.
- ...
  (any)
  Additional (unsupported) filters, as named arguments.
- data_id
  (integer())
  Vector of data ids to restrict to.
- data_name
  (character(1))
  Filter for name of data set.
- number_instances
  (integer())
  Filter for number of instances.
- number_features
  (integer())
  Filter for number of features.
- number_classes
  (integer())
  Filter for number of labels of the target (only classification tasks).
- number_missing_values
  (integer())
  Filter for number of missing values.
- tag
  (character())
  Filter for tags. You can provide multiple tags as character vector.
- run_id
  (integer())
  Vector of run ids to restrict to.
- task_id
  (integer())
  Vector of task ids to restrict to.
- measures
  (character())
  Vector of evaluation measures to restrict to.
- flow_id
  (integer(1))
  Filter for flow id.
- setup_id
  (integer())
  Vector of setup ids to restrict to.
- type
  (character(1))
  The task type, supported values are: "classif", "regr", "surv" and "clust".
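Because the defaults for limit and test_server are taken from the options "mlr3oml.limit" and "mlr3oml.test_server", they can presumably be changed once per session instead of in every call. The sketch below uses only base R and assumes the options are read each time a listing function is called; the value 100 is an arbitrary choice for illustration.

```r
# Set session-wide defaults; options() returns the previous values.
old <- options(
  mlr3oml.limit = 100L,        # return at most 100 records per listing call
  mlr3oml.test_server = FALSE  # query the public server, not the test server
)

# Inspect the values that are now in effect
getOption("mlr3oml.limit")
getOption("mlr3oml.test_server")

# Restore the previous settings when done
options(old)
```

Explicit limit and test_server arguments in a call still take precedence over these defaults.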
Value
(data.table()) of results, or a null data.table if no data set matches the filter criteria.
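Since an unmatched query yields a null data.table rather than an error, downstream code may want to check for an empty result before using it. The following sketch shows one way to do that without contacting the server; has_results() is a hypothetical helper, and the two tables are local stand-ins for what a listing function might return.

```r
library(data.table)

# Hypothetical helper (not part of mlr3oml): TRUE if a listing has any rows.
has_results <- function(res) {
  is.data.frame(res) && nrow(res) > 0L
}

empty <- data.table()        # a null data.table: zero rows, zero columns
found <- data.table(x = 1L)  # toy stand-in for a non-empty listing

has_results(empty)
has_results(found)
```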
Details
Filter values are usually provided as single atomic values (typically integer or character).
Provide a numeric vector of length 2 (c(l, u)) to find matches in the range \([l, u]\).
Note that only a subset of filters is exposed here.
For a more feature-complete package, see OpenML.
Alternatively, you can pass additional filters via ... using the names from the official API,
cf. the REST tab of https://www.openml.org/apis.
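As a rough illustration of range filters and of extra filters passed via ..., consider the sketch below. Both helper functions are hypothetical and not part of mlr3oml, the numeric bounds are arbitrary, and the second helper assumes that "status" is a filter name accepted by the OpenML REST API for data listings. The queries need the mlr3oml package and network access, so they only run in an interactive session.

```r
# Hypothetical helper: data sets with 1000 to 10000 instances
# and 5 to 50 features, using length-2 vectors as range filters.
query_medium_sized_data <- function(limit = 10L) {
  mlr3oml::list_oml_data(
    number_instances = c(1000L, 10000L),  # range [1000, 10000]
    number_features = c(5L, 50L),         # range [5, 50]
    limit = limit
  )
}

# Hypothetical helper: forwards an extra filter through `...`.
# Assumption: "status" is a valid filter name in the official API.
query_active_data <- function(limit = 10L) {
  mlr3oml::list_oml_data(status = "active", limit = limit)
}

# Only query the server when run interactively (requires network access).
if (interactive()) {
  print(query_medium_sized_data(limit = 5L))
  print(query_active_data(limit = 5L))
}
```

Filters passed via ... are documented as unsupported, so check the REST documentation for the exact names before relying on them.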
References
Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1–15. doi:10.1007/s00180-017-0742-2.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
Examples
# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html