This function finds data sets on https://openml.org/d that match simple filter criteria.

Note that only a subset of the available filters is exposed here. For a more feature-complete interface, see the OpenML R package.

list_oml_data_sets(
  data_id = NULL,
  number_instances = NULL,
  number_features = NULL,
  number_classes = NULL,
  number_missing_values = NULL,
  tag = NULL,
  limit = 5000L,
  ...
)

Arguments

data_id

(integer())
Vector of data ids to restrict to.

number_instances

(integer())
Filter for number of instances.

number_features

(integer())
Filter for number of features.

number_classes

(integer())
Filter for the number of class labels of the target (only relevant for classification tasks).

number_missing_values

(integer())
Filter for number of missing values.

tag

(character())
Filter for a specific tag. You can provide multiple tags as a character vector.

limit

(integer())
Maximum number of records to return. Default is 5000.

...

(any)
Additional filters as named arguments.

Value

(data.table()) of results, or NULL if no data set matches the criteria.

Details

Filter values can be provided as single atomic values (typically integer or character). Provide a numeric vector of length 2 (c(l, u)) to find matches in the range \([l, u]\).
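The two filter forms described above can be combined in one call. The following is a minimal sketch, not a definitive usage pattern: it assumes that list_oml_data_sets() is exported by the mlr3oml package, that the OpenML server is reachable, and that the result has the column names shown in the Examples output. The tryCatch() wrapper is an addition for illustration only.

```r
# Sketch only: needs the mlr3oml package (assumed to export list_oml_data_sets())
# and a working connection to the OpenML server.
if (requireNamespace("mlr3oml", quietly = TRUE)) {
  res = tryCatch(
    mlr3oml::list_oml_data_sets(
      number_instances = 150,      # single value: exact match
      number_features = c(1, 10),  # length-2 vector: range [1, 10]
      limit = 10L                  # keep the response small
    ),
    error = function(e) {
      # Illustration only: report network or server problems instead of stopping.
      message("OpenML query failed: ", conditionMessage(e))
      NULL
    }
  )

  # NULL means that no data set matched (or, in this sketch, that the query failed).
  if (!is.null(res)) {
    print(res[, c("data_id", "name", "NumberOfInstances", "NumberOfFeatures")])
  }
}
```

Because the function returns NULL when nothing matches, checking the result with is.null() before further processing avoids errors on empty queries.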

References

Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1--15. doi: 10.1007/s00180-017-0742-2.

Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49--60. doi: 10.1145/2641190.2641198.

Examples

# \donttest{
list_oml_data_sets(number_instances = 150, number_features = c(1, 10))
#>     data_id                             name version status MajorityClassSize
#>  1:      61                             iris       1 active                50
#>  2:     969                             iris       3 active               100
#>  3:    1099                   EgyptianSkulls       1 active                NA
#>  4:    1413                           MyIris       1 active                50
#>  5:   41510                             iris       9 active                NA
#>  6:   41511                             iris      10 active                50
#>  7:   41567                             iris      11 active                NA
#>  8:   41568                             iris      12 active                50
#>  9:   41582                             iris      13 active                NA
#> 10:   41583                             iris      14 active                50
#> 11:   41950                 iris_test_upload       1 active                50
#> 12:   41952          TaskCreationTestDataset       1 active                NA
#> 13:   41953          TaskCreationTestDataset       2 active                NA
#> 14:   41961          TaskCreationTestDataset       3 active                NA
#> 15:   41962          TaskCreationTestDataset       4 active                NA
#> 16:   41996                             iris      15 active                NA
#> 17:   41997                             iris      16 active                50
#> 18:   42002                             iris      17 active                NA
#> 19:   42003                             iris      18 active                50
#> 20:   42010                             iris      19 active                NA
#> 21:   42011                             iris      20 active                50
#> 22:   42015                             iris      21 active                NA
#> 23:   42016                             iris      22 active                50
#> 24:   42020                             iris      23 active                NA
#> 25:   42021                             iris      24 active                50
#> 26:   42025                             iris      25 active                NA
#> 27:   42026                             iris      26 active                50
#> 28:   42030                             iris      27 active                NA
#> 29:   42031                             iris      28 active                50
#> 30:   42035                             iris      29 active                NA
#> 31:   42036                             iris      30 active                50
#> 32:   42040                             iris      31 active                NA
#> 33:   42041                             iris      32 active                50
#> 34:   42045                             iris      33 active                NA
#> 35:   42046                             iris      34 active                50
#> 36:   42050                             iris      35 active                NA
#> 37:   42051                             iris      36 active                50
#> 38:   42055                             iris      37 active                NA
#> 39:   42056                             iris      38 active                50
#> 40:   42065                             iris      39 active                NA
#> 41:   42066                             iris      40 active                50
#> 42:   42070                             iris      41 active                NA
#> 43:   42071                             iris      42 active                50
#> 44:   42091                             iris      43 active                NA
#> 45:   42097                             iris      44 active                NA
#> 46:   42098                             iris      45 active                50
#> 47:   42186                  JuanFeldmanIris       1 active                50
#> 48:   42261                     iris-example       1 active                50
#> 49:   42535 TEST10e627dcde-UploadTestWithURL       1 active                NA
#> 50:   42661                             iris      46 active                NA
#> 51:   42699                             iris      47 active                NA
#> 52:   42700                             iris      48 active                50
#>     data_id                             name version status MajorityClassSize
#>     MaxNominalAttDistinctValues MinorityClassSize NumberOfClasses
#>  1:                           3                50               3
#>  2:                           2                50               2
#>  3:                          NA                NA               0
#>  4:                           3                50               3
#>  5:                           3                NA              NA
#>  6:                           3                50               3
#>  7:                           3                NA              NA
#>  8:                           3                50               3
#>  9:                           3                NA              NA
#> 10:                           3                50               3
#> 11:                           3                50               3
#> 12:                           3                NA              NA
#> 13:                           3                NA              NA
#> 14:                           3                NA              NA
#> 15:                           3                NA              NA
#> 16:                           3                NA              NA
#> 17:                           3                50               3
#> 18:                           3                NA              NA
#> 19:                           3                50               3
#> 20:                           3                NA              NA
#> 21:                           3                50               3
#> 22:                           3                NA              NA
#> 23:                           3                50               3
#> 24:                           3                NA              NA
#> 25:                           3                50               3
#> 26:                           3                NA              NA
#> 27:                           3                50               3
#> 28:                           3                NA              NA
#> 29:                           3                50               3
#> 30:                           3                NA              NA
#> 31:                           3                50               3
#> 32:                           3                NA              NA
#> 33:                           3                50               3
#> 34:                           3                NA              NA
#> 35:                           3                50               3
#> 36:                           3                NA              NA
#> 37:                           3                50               3
#> 38:                           3                NA              NA
#> 39:                           3                50               3
#> 40:                           3                NA              NA
#> 41:                           3                50               3
#> 42:                           3                NA              NA
#> 43:                           3                50               3
#> 44:                           3                NA              NA
#> 45:                           3                NA              NA
#> 46:                           3                50               3
#> 47:                           3                50               3
#> 48:                          NA                50               3
#> 49:                          NA                NA              NA
#> 50:                          NA                NA              NA
#> 51:                          NA                NA              NA
#> 52:                          NA                50               3
#>     MaxNominalAttDistinctValues MinorityClassSize NumberOfClasses
#>     NumberOfFeatures NumberOfInstances NumberOfInstancesWithMissingValues
#>  1:                5               150                                  0
#>  2:                5               150                                  0
#>  3:                5               150                                  0
#>  4:                5               150                                  0
#>  5:                5               150                                  0
#>  6:                5               150                                  0
#>  7:                5               150                                  0
#>  8:                5               150                                  0
#>  9:                5               150                                  0
#> 10:                5               150                                  0
#> 11:                5               150                                  0
#> 12:                5               150                                  0
#> 13:                5               150                                  0
#> 14:                5               150                                  0
#> 15:                5               150                                  0
#> 16:                5               150                                  0
#> 17:                5               150                                  0
#> 18:                5               150                                  0
#> 19:                5               150                                  0
#> 20:                5               150                                  0
#> 21:                5               150                                  0
#> 22:                5               150                                  0
#> 23:                5               150                                  0
#> 24:                5               150                                  0
#> 25:                5               150                                  0
#> 26:                5               150                                  0
#> 27:                5               150                                  0
#> 28:                5               150                                  0
#> 29:                5               150                                  0
#> 30:                5               150                                  0
#> 31:                5               150                                  0
#> 32:                5               150                                  0
#> 33:                5               150                                  0
#> 34:                5               150                                  0
#> 35:                5               150                                  0
#> 36:                5               150                                  0
#> 37:                5               150                                  0
#> 38:                5               150                                  0
#> 39:                5               150                                  0
#> 40:                5               150                                  0
#> 41:                5               150                                  0
#> 42:                5               150                                  0
#> 43:                5               150                                  0
#> 44:                5               150                                  0
#> 45:                5               150                                  0
#> 46:                5               150                                  0
#> 47:                5               150                                  0
#> 48:                5               150                                  0
#> 49:                5               150                                  0
#> 50:                5               150                                  0
#> 51:                5               150                                  0
#> 52:                5               150                                  0
#>     NumberOfFeatures NumberOfInstances NumberOfInstancesWithMissingValues
#>     NumberOfMissingValues NumberOfNumericFeatures NumberOfSymbolicFeatures
#>  1:                     0                       4                        1
#>  2:                     0                       4                        1
#>  3:                     0                       5                        0
#>  4:                     0                       4                        1
#>  5:                     0                       4                        1
#>  6:                     0                       4                        1
#>  7:                     0                       4                        1
#>  8:                     0                       4                        1
#>  9:                     0                       4                        1
#> 10:                     0                       4                        1
#> 11:                     0                       4                        1
#> 12:                     0                       4                        1
#> 13:                     0                       4                        1
#> 14:                     0                       4                        1
#> 15:                     0                       4                        1
#> 16:                     0                       4                        1
#> 17:                     0                       4                        1
#> 18:                     0                       4                        1
#> 19:                     0                       4                        1
#> 20:                     0                       4                        1
#> 21:                     0                       4                        1
#> 22:                     0                       4                        1
#> 23:                     0                       4                        1
#> 24:                     0                       4                        1
#> 25:                     0                       4                        1
#> 26:                     0                       4                        1
#> 27:                     0                       4                        1
#> 28:                     0                       4                        1
#> 29:                     0                       4                        1
#> 30:                     0                       4                        1
#> 31:                     0                       4                        1
#> 32:                     0                       4                        1
#> 33:                     0                       4                        1
#> 34:                     0                       4                        1
#> 35:                     0                       4                        1
#> 36:                     0                       4                        1
#> 37:                     0                       4                        1
#> 38:                     0                       4                        1
#> 39:                     0                       4                        1
#> 40:                     0                       4                        1
#> 41:                     0                       4                        1
#> 42:                     0                       4                        1
#> 43:                     0                       4                        1
#> 44:                     0                       4                        1
#> 45:                     0                       4                        1
#> 46:                     0                       4                        1
#> 47:                     0                       4                        1
#> 48:                     0                       4                        1
#> 49:                     0                       4                        1
#> 50:                     0                       4                        0
#> 51:                     0                       4                        1
#> 52:                     0                       4                        1
#>     NumberOfMissingValues NumberOfNumericFeatures NumberOfSymbolicFeatures
# }