check_X_y#

sklearn.utils.check_X_y(X, y, accept_sparse=False, *, accept_large_sparse=True, dtype='numeric', order=None, copy=False, force_writeable=False, force_all_finite='deprecated', ensure_all_finite=None, ensure_2d=True, allow_nd=False, multi_output=False, ensure_min_samples=1, ensure_min_features=1, y_numeric=False, estimator=None)[source]#

Input validation for standard estimators.

Checks X and y for consistent length, enforces X to be 2D and y 1D. By default, X is checked to be non-empty and containing only finite values. Standard input checks are also applied to y, such as checking that y does not have np.nan or np.inf targets. For multi-label y, set multi_output=True to allow 2D and sparse y. If the dtype of X is object, attempt converting to float, raising on failure.

Parameters:
X{ndarray, list, sparse matrix}

Input data.

y{ndarray, list, sparse matrix}

Labels.

accept_sparsestr, bool or list of str, default=False

String[s] representing allowed sparse matrix formats, such as โ€˜cscโ€™, โ€˜csrโ€™, etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

accept_large_sparsebool, default=True

If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse will cause it to be accepted only if its indices are stored with a 32-bit dtype.

Added in version 0.20.

dtypeโ€˜numericโ€™, type, list of type or None, default=โ€™numericโ€™

Data type of result. If None, the dtype of the input is preserved. If โ€œnumericโ€, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

order{โ€˜Fโ€™, โ€˜Cโ€™}, default=None

Whether an array will be forced to be fortran or c-style. If None, then the input dataโ€™s order is preserved when possible.

copybool, default=False

Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

force_writeablebool, default=False

Whether to force the output array to be writeable. If True, the returned array is guaranteed to be writeable, which may require a copy. Otherwise the writeability of the input array is preserved.

Added in version 1.6.

force_all_finitebool or โ€˜allow-nanโ€™, default=True

Whether to raise an error on np.inf, np.nan, pd.NA in array. This parameter does not influence whether y can have np.inf, np.nan, pd.NA values. The possibilities are:

  • True: Force all values of X to be finite.

  • False: accepts np.inf, np.nan, pd.NA in X.

  • โ€˜allow-nanโ€™: accepts only np.nan or pd.NA values in X. Values cannot be infinite.

Added in version 0.20: force_all_finite accepts the string 'allow-nan'.

Changed in version 0.23: Accepts pd.NA and converts it into np.nan

Deprecated since version 1.6: force_all_finite was renamed to ensure_all_finite and will be removed in 1.8.

ensure_all_finitebool or โ€˜allow-nanโ€™, default=True

Whether to raise an error on np.inf, np.nan, pd.NA in array. This parameter does not influence whether y can have np.inf, np.nan, pd.NA values. The possibilities are:

  • True: Force all values of X to be finite.

  • False: accepts np.inf, np.nan, pd.NA in X.

  • โ€˜allow-nanโ€™: accepts only np.nan or pd.NA values in X. Values cannot be infinite.

Added in version 1.6: force_all_finite was renamed to ensure_all_finite.

ensure_2dbool, default=True

Whether to raise a value error if X is not 2D.

allow_ndbool, default=False

Whether to allow X.ndim > 2.

multi_outputbool, default=False

Whether to allow 2D y (array or sparse matrix). If false, y will be validated as a vector. y cannot have np.nan or np.inf values if multi_output=True.

ensure_min_samplesint, default=1

Make sure that X has a minimum number of samples in its first axis (rows for a 2D array).

ensure_min_featuresint, default=1

Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when X has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.

y_numericbool, default=False

Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms.

estimatorstr or estimator instance, default=None

If passed, include the name of the estimator in warning messages.

Returns:
X_convertedobject

The converted and validated X.

y_convertedobject

The converted and validated y.

Examples

>>> from sklearn.utils.validation import check_X_y
>>> X = [[1, 2], [3, 4], [5, 6]]
>>> y = [1, 2, 3]
>>> X, y = check_X_y(X, y)
>>> X
array([[1, 2],
      [3, 4],
      [5, 6]])
>>> y
array([1, 2, 3])