This is the class for collections (previously known as studies) served on
https://www.openml.org.
A collection can be either a task collection or a run collection.
This object can also be constructed using the sugar function ocl().
Run Collection
A run collection contains runs, flows, datasets and tasks.
The primary objects are the runs (main_entity_type is "run").
The flows, datasets and tasks are those used in the runs.
Task Collection
A task collection contains tasks and datasets.
The primary objects are the tasks (main_entity_type is "task").
The datasets are those used in the tasks.
Note: All Benchmark Suites on OpenML are also collections.
Caching
The OpenML collection itself cannot be cached, because it can be modified in place
on the server, e.g. by adding or removing tasks or runs.
The construction argument cache therefore only controls whether caching is applied to the
OpenML objects that are contained in the collection.
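The following is a minimal sketch of the two ways to enable caching described on this page: the "mlr3oml.cache" option and the cache construction argument. It assumes that mlr3oml is installed and that the OpenML server is reachable. The collection id 258 is the task collection from the Examples section.

```r
# Enable caching for the current session via the documented option.
# options() returns the previous values so they can be restored later.
old_opts = options(mlr3oml.cache = TRUE)

try({
  library("mlr3oml")
  # Per-object alternative: pass cache directly to the constructor.
  task_collection = OMLCollection$new(id = 258, cache = TRUE)
  # The collection itself is always fetched from the server;
  # only the contained tasks and datasets are cached once they are loaded.
  print(task_collection$task_ids)
}, silent = TRUE)

# Restore the previous option values.
options(old_opts)
```

Setting the option affects every object created afterwards, while the constructor argument only affects the one collection.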
mlr3 Integration
Obtain a list of mlr3::Tasks using mlr3::as_tasks.
Obtain a list of mlr3::Resamplings using mlr3::as_resamplings.
Obtain a list of mlr3::Learners using mlr3::as_learners (if main_entity_type is "run").
Obtain a mlr3::BenchmarkResult using mlr3::as_benchmark_result (if main_entity_type is "run").
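The converters listed above can be sketched as follows. The example assumes that mlr3 and mlr3oml are installed, that the OpenML server is reachable, and that the flows in run collection 232 (the id used in the Examples section) can be converted to mlr3 learners. It is wrapped in try() for that reason.

```r
try({
  library("mlr3")
  library("mlr3oml")

  # Run collection from the Examples section of this page.
  run_collection = ocl(id = 232)

  # Available for both task and run collections.
  tasks = as_tasks(run_collection)
  resamplings = as_resamplings(run_collection)

  # Only available when main_entity_type is "run".
  learners = as_learners(run_collection)
  bmr = as_benchmark_result(run_collection)
  print(bmr)
}, silent = TRUE)
```

For a task collection, only the first two conversions apply, since it contains no runs or flows.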
References
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49-60. doi:10.1145/2641190.2641198.
Super class
mlr3oml::OMLObject -> OMLCollection
Active bindings
desc
(list())
Collection description (meta information), downloaded and converted from the JSON API response.
parquet
(logical(1))
Whether to use parquet.
main_entity_type
(character(n))
The main entity type, either "run" or "task".
flow_ids
(integer(n))
A vector containing the flow ids of the collection.
data_ids
(integer(n))
A vector containing the data ids of the collection.
run_ids
(integer(n))
A vector containing the run ids of the collection.
task_ids
(integer(n))
A vector containing the task ids of the collection.
runs
(data.table())
A data.table summarizing the runs included in the collection. Returns NULL for task collections.
flows
(data.table())
A data.table summarizing the flows included in the collection. Returns NULL for task collections.
data
(data.table())
A data.table summarizing the datasets included in the collection.
tasks
(data.table())
A data.table summarizing the tasks included in the collection.
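As a brief sketch of how the active bindings above might be used, the following inspects the task collection from the Examples section (id 258). It assumes that mlr3oml is installed and that the OpenML server is reachable. No output is shown because the collection can change on the server.

```r
try({
  library("mlr3oml")

  task_collection = ocl(id = 258)

  # Entity type and the ids of the contained objects.
  print(task_collection$main_entity_type)
  print(task_collection$task_ids)
  print(task_collection$data_ids)

  # Summary tables of the contained tasks and datasets.
  print(head(task_collection$tasks))
  print(head(task_collection$data))

  # As documented above, runs and flows are NULL for task collections.
  print(is.null(task_collection$runs))
}, silent = TRUE)
```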
Methods
Method new()
Creates a new instance of this R6 class.
Usage
OMLCollection$new(
id,
cache = cache_default(),
parquet = parquet_default(),
test_server = test_server_default()
)
Arguments
id
(integer(1))
OpenML id for the object.
cache
(logical(1) | character(1))
See field cache for an explanation of possible values. Defaults to the value of option "mlr3oml.cache", or FALSE if not set. The collection itself is not cached, because it can be modified in place on OpenML, e.g. by adding or removing tasks or runs. This parameter therefore only controls whether the contained elements are cached when loaded, e.g. when accessing the included tasks.
parquet
(logical(1))
Whether to use parquet instead of arff. If parquet is not available, it will fall back to arff. Defaults to the value of option "mlr3oml.parquet", or FALSE if not set.
test_server
(character(1))
Whether to use the OpenML test server or public server. Defaults to the value of option "mlr3oml.test_server", or FALSE if not set.
Examples
try({
library("mlr3")
# OpenML Run collection:
run_collection = OMLCollection$new(id = 232)
# using sugar
run_collection = ocl(id = 232)
print(run_collection)
# OpenML task collection:
task_collection = OMLCollection$new(id = 258)
# using sugar
task_collection = ocl(id = 258)
print(task_collection)
}, silent = TRUE)
#> <OMLCollection: 232>
#> * data: 2
#> * tasks: 2
#> * flows: 2
#> * runs: 4
#> <OMLCollection: 258>
#> * data: 12
#> * tasks: 12