These functions query data sets, tasks, flows, setups, runs, and evaluation measures from https://www.openml.org/search?type=data&sort=runs&status=active using simple filter criteria.

To find data sets for a specific task type, use list_oml_tasks(), which supports filtering by task type. Another heuristic for finding potential regression tasks is to search for data sets with zero classes, i.e. by specifying number_classes = 0.
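Both approaches can be sketched as follows. This is a minimal illustration, not an official example: it assumes the mlr3oml package is installed and the OpenML server is reachable, and the variable names and the small limit are chosen here only for illustration.

```r
library(mlr3oml)

# Filter tasks directly by task type ("regr" = regression).
regr_tasks = list_oml_tasks(type = "regr", limit = 10)

# Heuristic: data sets without target classes are candidates
# for regression problems.
regr_candidates = list_oml_data(number_classes = 0, limit = 10)

# Both calls return a data.table; check how many records came back.
nrow(regr_tasks)
nrow(regr_candidates)
```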

Usage

list_oml_data(
  data_id = NULL,
  data_name = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_evaluations(
  run_id = NULL,
  task_id = NULL,
  measures = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_flows(
  uploader = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_measures(test_server = test_server_default())

list_oml_runs(
  run_id = NULL,
  task_id = NULL,
  tag = NULL,
  flow_id = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_setups(
  flow_id = NULL,
  setup_id = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  ...
)

list_oml_tasks(
  task_id = NULL,
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = limit_default(),
  test_server = test_server_default(),
  type = NULL,
  ...
)

Arguments

data_id

(integer())
Vector of data ids to restrict to.

data_name

(character(1))
Filter for name of data set.

number_instances

(integer())
Filter for number of instances.

number_features

(integer())
Filter for number of features.

number_classes

(integer())
Filter for number of labels of the target (only classification tasks).

number_missing_values

(integer())
Filter for number of missing values.

tag

(character())
Filter for tags. You can provide multiple tags as character vector.

limit

(integer())
Limit the results to limit records. Default is the value of option "mlr3oml.limit", defaulting to 5000.

test_server

(logical(1))
Whether to use the OpenML test server or public server. Defaults to value of option "mlr3oml.test_server", or FALSE if not set.

...

(any)
Additional (unsupported) filters, as named arguments.

run_id

(integer())
Vector of run ids to restrict to.

task_id

(integer())
Vector of task ids to restrict to.

measures

(character())
Vector of evaluation measures to restrict to.

uploader

(integer(1))
Filter for uploader.

flow_id

(integer(1))
Filter for flow id.

setup_id

(integer())
Vector of setup ids to restrict to.

type

(character(1))
The task type. Supported values are "classif", "regr", "surv", and "clust".

Value

(data.table()) of results, or a null data.table if no data set matches the filter criteria.

Details

Filter values are usually provided as single atomic values (typically integer or character). Provide a numeric vector of length 2 (c(l, u)) to find matches in the range \([l, u]\).
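The difference between single-value and range filters can be illustrated with a short sketch. It assumes mlr3oml is installed and the OpenML server is reachable. The concrete bounds, the variable name mid_sized, and the small limit are illustrative choices, not recommendations.

```r
library(mlr3oml)

mid_sized = list_oml_data(
  number_instances = c(1000, 5000), # range: between 1000 and 5000 instances
  number_features = c(10, 20),      # range: between 10 and 20 features
  number_classes = 2,               # single value: exactly two classes
  limit = 10                        # keep the query small
)

# Inspect the result; a table without rows means nothing matched.
nrow(mid_sized)
head(mid_sized)
```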

Note that only a subset of filters is exposed here. For a more feature-complete package, see OpenML. Alternatively, you can pass additional filters via ... using the filter names of the official API; cf. the REST tab of https://www.openml.org/apis.
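As a hedged sketch of passing an extra filter through ...: the filter name status used below is an assumption taken from the OpenML REST API's data listing filters and is not documented on this page. Whether it is forwarded unchanged should be checked against the API documentation before relying on it.

```r
library(mlr3oml)

# `status` is NOT a documented argument of list_oml_data(); it is passed
# via `...` on the assumption that it matches the REST API filter name.
active_data = list_oml_data(
  number_classes = 2,
  status = "active", # assumed API filter name, forwarded through `...`
  limit = 5
)

nrow(active_data)
```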

References

Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1–15. doi:10.1007/s00180-017-0742-2.

Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.

Examples

# For technical reasons, examples cannot be included in this R package.
# Instead, these are some relevant resources:
#
# Large-Scale Benchmarking chapter in the mlr3book:
# https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html
#
# Package Article:
# https://mlr3oml.mlr-org.com/articles/tutorial.html