List Data from OpenML
Source: R/list_oml_data.R, R/list_oml_evaluations.R, R/list_oml_flows.R, and 4 more (list_oml.Rd)
These functions allow querying data sets, tasks, flows, setups, runs, evaluations, and evaluation measures from OpenML (https://www.openml.org/search?type=data&sort=runs&status=active) using simple filter criteria.
To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task type.
Another heuristic for finding possible regression tasks is to search for data sets with zero classes, i.e., by specifying number_classes = 0.
Usage
list_oml_data(
data_id = NULL,
data_name = NULL,
number_instances = NULL,
number_features = NULL,
number_classes = NULL,
number_missing_values = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_evaluations(
run_id = NULL,
task_id = NULL,
measures = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_flows(
uploader = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_measures(test_server = test_server_default())
list_oml_runs(
run_id = NULL,
task_id = NULL,
tag = NULL,
flow_id = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_setups(
flow_id = NULL,
setup_id = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
...
)
list_oml_tasks(
task_id = NULL,
data_id = NULL,
number_instances = NULL,
number_features = NULL,
number_classes = NULL,
number_missing_values = NULL,
tag = NULL,
limit = limit_default(),
test_server = test_server_default(),
type = NULL,
...
)
Arguments
- data_id (integer()): Vector of data ids to restrict to.
- data_name (character(1)): Filter for the name of the data set.
- number_instances (integer()): Filter for the number of instances.
- number_features (integer()): Filter for the number of features.
- number_classes (integer()): Filter for the number of classes of the target (classification tasks only).
- number_missing_values (integer()): Filter for the number of missing values.
- tag (character()): Filter for tags. You can provide multiple tags as a character vector.
- limit (integer()): Limit the results to at most limit records. Defaults to the value of option "mlr3oml.limit", or 5000 if not set.
- test_server (logical(1)): Whether to use the OpenML test server instead of the public server. Defaults to the value of option "mlr3oml.test_server", or FALSE if not set.
- ... (any): Additional (unsupported) filters, as named arguments.
- run_id (integer()): Vector of run ids to restrict to.
- task_id (integer()): Vector of task ids to restrict to.
- measures (character()): Vector of evaluation measures to restrict to.
- uploader (integer(1)): Filter for uploader.
- flow_id (integer(1)): Filter for flow id.
- setup_id (integer()): Vector of setup ids to restrict to.
- type (character(1)): The task type. Supported values are "classif", "regr", "surv", and "clust".
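The defaults for limit and test_server described above are read from global options. The following is a minimal sketch of that lookup in base R, assuming only the option names and fallbacks stated above; resolve_limit() and resolve_test_server() are hypothetical helpers for illustration, not the package's own limit_default() and test_server_default().

```r
# Hypothetical helpers (not mlr3oml code): resolve the documented defaults
# from global options, falling back to 5000 and FALSE when unset.
resolve_limit <- function() {
  limit <- getOption("mlr3oml.limit", 5000L)
  stopifnot(is.numeric(limit), length(limit) == 1L, limit >= 1)
  as.integer(limit)
}

resolve_test_server <- function() {
  test_server <- getOption("mlr3oml.test_server", FALSE)
  stopifnot(is.logical(test_server), length(test_server) == 1L, !is.na(test_server))
  test_server
}

# Usage: temporarily override both options, then restore the previous state.
old <- options(mlr3oml.limit = 100L, mlr3oml.test_server = TRUE)
print(resolve_limit())        # [1] 100
print(resolve_test_server())  # [1] TRUE
options(old)
```

Restoring with options(old) is the usual base R idiom for scoped option changes; setting an option to NULL removes it, so previously unset options become unset again.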
Value
(data.table()) of results, or a null data.table if nothing matches the filter criteria.
Details
Filter values are usually provided as single atomic values (typically integer or character).
Provide a numeric vector of length 2 (c(l, u)) to find matches in the closed range [l, u].
Note that only a subset of filters is exposed here.
For a more feature-complete package, see the OpenML package.
Alternatively, you can pass additional filters via ... using the names of the official API;
cf. the REST tab of https://www.openml.org/apis.
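To make the range syntax and the pass-through of extra filters more concrete, here is a minimal sketch that encodes named filter values as an OpenML REST list path. It assumes the API's path-style syntax (name/value segments, comma-separated id lists, and l..u for ranges); build_list_path() and its range_filters argument are hypothetical and do not reproduce the package's internal code.

```r
# Hypothetical sketch (not mlr3oml code). Assumed REST syntax:
# "<entity>/list/<filter>/<value>/...", ids joined by ",", ranges as "l..u".
encode_value <- function(x) {
  # Format numbers one at a time so large ids are not written as "1e+05".
  if (is.numeric(x)) {
    vapply(x, format, character(1), scientific = FALSE, trim = TRUE)
  } else {
    as.character(x)
  }
}

build_list_path <- function(entity, ...,
                            range_filters = c("number_instances", "number_features",
                                              "number_classes", "number_missing_values")) {
  filters <- list(...)
  # A filter left at NULL means "no restriction", so drop it.
  filters <- filters[!vapply(filters, is.null, logical(1))]
  parts <- c(entity, "list")
  for (name in names(filters)) {
    value <- encode_value(filters[[name]])
    if (name %in% range_filters && length(value) == 2L) {
      encoded <- paste(value, collapse = "..")  # c(l, u) -> "l..u"
    } else {
      encoded <- paste(value, collapse = ",")   # ids, tags, names
    }
    parts <- c(parts, name, encoded)
  }
  paste(parts, collapse = "/")
}

# Usage: data sets with 100 to 1000 instances and zero classes.
build_list_path("data", number_instances = c(100, 1000), number_classes = 0, tag = NULL)
# [1] "data/list/number_instances/100..1000/number_classes/0"
```

With the package itself, the corresponding query is list_oml_data(number_instances = c(100, 1000), number_classes = 0), which combines a range filter with the regression heuristic from the description. Keeping range filters in an explicit allow-list avoids misreading a two-element id vector such as data_id = c(61, 1464) as a range.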
References
Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1–15. doi:10.1007/s00180-017-0742-2.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
Examples
# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html