check_array

sklearn.utils.check_array(array, accept_sparse=False, *, accept_large_sparse=True, dtype='numeric', order=None, copy=False, force_writeable=False, force_all_finite='deprecated', ensure_all_finite=None, ensure_non_negative=False, ensure_2d=True, allow_nd=False, ensure_min_samples=1, ensure_min_features=1, estimator=None, input_name='')

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the array has object dtype, conversion to float is attempted and an error is raised if it fails.

Parameters:
array : object

Input object to check / convert.

accept_sparse : str, bool or list/tuple of str, default=False

String[s] representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.
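
The three accept_sparse behaviours described above can be sketched as follows. This is a minimal illustration, assuming SciPy is installed; the matrix values and variable names are made up for the example, and catching TypeError reflects the exception raised by recent releases.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import check_array

# A small COO matrix; the values are illustrative only.
X_coo = sparse.coo_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))

# "coo" is not in the allowed list, so the input is converted to the
# first listed format ("csr").
X_csr = check_array(X_coo, accept_sparse=["csr", "csc"])
converted_format = X_csr.format

# accept_sparse=True accepts any sparse format and leaves it unchanged.
X_any = check_array(X_coo, accept_sparse=True)
kept_format = X_any.format

# With the default accept_sparse=False, sparse input raises an error.
try:
    check_array(X_coo)
    sparse_rejected = False
except TypeError:
    sparse_rejected = True
```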

accept_large_sparse : bool, default=True

If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

Added in version 0.20.

dtype : 'numeric', type, list of type or None, default='numeric'

Data type of result. If None, the dtype of the input is preserved. If "numeric", dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion to the first type is performed only if the dtype of the input is not in the list.
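
As a rough illustration of the dtype rules above, the sketch below passes the same small matrix with different input dtypes. The arrays and variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.utils import check_array

X_int = np.array([[1, 2], [3, 4]], dtype=np.int32)
X_f32 = X_int.astype(np.float32)
X_obj = np.array([[1, 2], [3, 4]], dtype=object)

# dtype="numeric" (the default) preserves a numeric input dtype.
kept = check_array(X_int)

# int32 is not in the list, so the input is converted to the first type.
as_float = check_array(X_int, dtype=[np.float64, np.float32])

# float32 is in the list, so no conversion takes place.
untouched = check_array(X_f32, dtype=[np.float64, np.float32])

# An object array holding numbers is converted to float.
from_obj = check_array(X_obj)
```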

order : {'F', 'C'} or None, default=None

Whether the array will be forced to be Fortran-style (column-major) or C-style (row-major). When order is None (default) and copy=False, nothing is ensured about the memory layout of the output array; with copy=True, the memory layout of the returned array is kept as close as possible to that of the original array.

copy : bool, default=False

Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.
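
The order and copy parameters can be exercised with a sketch like the one below. The input array and variable names are illustrative assumptions, and the float32 conversion is just one example of a conversion that forces a copy.

```python
import numpy as np
from sklearn.utils import check_array

X = np.arange(6, dtype=np.float64).reshape(2, 3)  # C-contiguous input

# order="F" forces a Fortran (column-major) layout on the result.
X_f = check_array(X, order="F")
is_fortran = X_f.flags["F_CONTIGUOUS"]

# copy=True returns new memory, so writing to the result leaves X intact.
X_copy = check_array(X, copy=True)
X_copy[0, 0] = 99.0
original_untouched = X[0, 0] == 0.0

# With copy=False, a copy can still happen when a conversion is required,
# here because a different dtype is requested.
X_conv = check_array(X, dtype=np.float32, copy=False)
conversion_copied = not np.shares_memory(X, X_conv)
```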

force_writeable : bool, default=False

Whether to force the output array to be writeable. If True, the returned array is guaranteed to be writeable, which may require a copy. Otherwise the writeability of the input array is preserved.

Added in version 1.6.

force_all_finite : bool or 'allow-nan', default=True

Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

  • True: Force all values of array to be finite.

  • False: accepts np.inf, np.nan, pd.NA in array.

  • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

Added in version 0.20: force_all_finite accepts the string 'allow-nan'.

Changed in version 0.23: Accepts pd.NA and converts it into np.nan.

Deprecated since version 1.6: force_all_finite was renamed to ensure_all_finite and will be removed in 1.8.

ensure_all_finite : bool or 'allow-nan', default=True

Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

  • True: Force all values of array to be finite.

  • False: accepts np.inf, np.nan, pd.NA in array.

  • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

Added in version 1.6: force_all_finite was renamed to ensure_all_finite.

ensure_non_negative : bool, default=False

Make sure the array has only non-negative values. If True, an array that contains negative values will raise a ValueError.

Added in version 1.6.

ensure_2d : bool, default=True

Whether to raise a ValueError if array is not 2D.

allow_nd : bool, default=False

Whether to allow array.ndim > 2.
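
How ensure_2d and allow_nd interact with the dimensionality of the input can be sketched as follows. is_valid is a hypothetical helper introduced for the example, and the array shapes are arbitrary.

```python
import numpy as np
from sklearn.utils import check_array

x_1d = np.array([1.0, 2.0, 3.0])
X_3d = np.zeros((2, 3, 4))


def is_valid(array, **kwargs):
    """Return True if check_array accepts the input with these options."""
    try:
        check_array(array, **kwargs)
        return True
    except ValueError:
        return False


# 1D input is rejected by default and accepted once ensure_2d is disabled
# or the data is reshaped into a single column.
one_d_default = is_valid(x_1d)
one_d_relaxed = is_valid(x_1d, ensure_2d=False)
one_d_reshaped = is_valid(x_1d.reshape(-1, 1))

# Input with more than two dimensions needs allow_nd=True.
three_d_default = is_valid(X_3d)
three_d_allowed = is_valid(X_3d, allow_nd=True)
```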

ensure_min_samples : int, default=1

Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

ensure_min_features : int, default=1

Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
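
The minimum-size checks controlled by ensure_min_samples and ensure_min_features can be illustrated with empty and small arrays. is_valid is a hypothetical helper for the example, and the shapes are chosen only to trigger each check.

```python
import numpy as np
from sklearn.utils import check_array

no_rows = np.zeros((0, 3))   # zero samples, three features
no_cols = np.zeros((3, 0))   # three samples, zero features
two_rows = np.ones((2, 3))


def is_valid(array, **kwargs):
    """Return True if check_array accepts the input with these options."""
    try:
        check_array(array, **kwargs)
        return True
    except ValueError:
        return False


# Empty datasets are rejected by default; setting the minimum to 0
# disables the corresponding check.
empty_rows_default = is_valid(no_rows)
empty_rows_allowed = is_valid(no_rows, ensure_min_samples=0)
empty_cols_default = is_valid(no_cols)
empty_cols_allowed = is_valid(no_cols, ensure_min_features=0)

# A higher minimum can be requested, e.g. at least three samples.
too_few_samples = is_valid(two_rows, ensure_min_samples=3)
enough_samples = is_valid(two_rows, ensure_min_samples=2)
```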

estimator : str or estimator instance, default=None

If passed, include the name of the estimator in warning messages.

input_name : str, default=""

The data name used to construct the error message. In particular if input_name is "X" and the data has NaN values and allow_nan is False, the error message will link to the imputer documentation.

Added in version 1.1.0.

Returns:
array_converted : object

The converted and validated array.

Examples

>>> from sklearn.utils.validation import check_array
>>> X = [[1, 2, 3], [4, 5, 6]]
>>> X_checked = check_array(X)
>>> X_checked
array([[1, 2, 3],
       [4, 5, 6]])