UnstructuredExcelLoader

class langchain_community.document_loaders.excel.UnstructuredExcelLoader(
file_path: str | Path,
mode: str = 'single',
**unstructured_kwargs: Any,
)

Load Microsoft Excel files using Unstructured.

Like other Unstructured loaders, UnstructuredExcelLoader can be used in both "single" and "elements" mode. In "elements" mode, each sheet in the Excel file becomes an Unstructured Table element, and an HTML representation of the table is available under the "text_as_html" key in the document metadata.

Examples

from langchain_community.document_loaders.excel import UnstructuredExcelLoader

loader = UnstructuredExcelLoader("stanley-cups.xlsx", mode="elements")
docs = loader.load()
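
The following is a minimal sketch of pulling the per-sheet HTML out of the loaded documents, assuming the loader was created with mode="elements" as in the example above. The helper names `load_sheet_elements` and `html_tables` are hypothetical, not part of the library, and the loader import sits inside a function so the helper can be tried without `unstructured` installed.

```python
from typing import Any, Iterable


def load_sheet_elements(path: str) -> list:
    """Load an Excel file in "elements" mode (needs the unstructured package)."""
    # Imported lazily so the rest of this sketch works without the dependency.
    from langchain_community.document_loaders.excel import UnstructuredExcelLoader

    return UnstructuredExcelLoader(path, mode="elements").load()


def html_tables(docs: Iterable[Any]) -> list[str]:
    """Collect the "text_as_html" metadata value from each document that has one."""
    return [
        doc.metadata["text_as_html"]
        for doc in docs
        if "text_as_html" in doc.metadata
    ]


# Usage (assumes stanley-cups.xlsx exists in the working directory):
# tables = html_tables(load_sheet_elements("stanley-cups.xlsx"))
```

Documents without the key are skipped rather than raising, so the helper also tolerates documents loaded in another mode.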

Parameters:
  • file_path (str | Path) – The path to the Microsoft Excel file.

  • mode (str) – The mode to use when partitioning the file. See unstructured docs for more info. Optional. Defaults to "single".

  • **unstructured_kwargs (Any) – Keyword arguments to pass to unstructured.

Methods

__init__(file_path[, mode])

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

lazy_load()

Load file.

load()

Load data into Document objects.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(
file_path: str | Path,
mode: str = 'single',
**unstructured_kwargs: Any,
)
Parameters:
  • file_path (str | Path) – The path to the Microsoft Excel file.

  • mode (str) – The mode to use when partitioning the file. See unstructured docs for more info. Optional. Defaults to "single".

  • **unstructured_kwargs (Any) – Keyword arguments to pass to unstructured.

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Yields:

the documents.

Return type:

AsyncIterator[Document]
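
As a rough illustration of consuming the asynchronous iterator, the sketch below collects at most `n` documents with `async for`. The helper `first_n_async` is a hypothetical name, and it only assumes that the loader exposes `alazy_load()` as documented here.

```python
import asyncio
from typing import Any


async def first_n_async(loader: Any, n: int) -> list:
    """Collect at most n documents from loader.alazy_load()."""
    docs: list = []
    if n <= 0:
        return docs
    async for doc in loader.alazy_load():
        docs.append(doc)
        if len(docs) >= n:
            break  # stop iterating once enough documents are collected
    return docs


# Usage (assumes the unstructured package and the file are available):
# from langchain_community.document_loaders.excel import UnstructuredExcelLoader
# loader = UnstructuredExcelLoader("stanley-cups.xlsx", mode="elements")
# docs = asyncio.run(first_n_async(loader, 5))
```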

async aload() → list[Document]

Load data into Document objects.

Returns:

the documents.

Return type:

list[Document]

lazy_load() → Iterator[Document]

Load file.

Return type:

Iterator[Document]
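
Because `lazy_load()` returns an iterator, a caller can stop early instead of building the full list. The sketch below previews the first few documents; `preview` is a hypothetical helper, and whether stopping early avoids parsing the whole workbook depends on the loader's implementation.

```python
from itertools import islice
from typing import Any


def preview(loader: Any, n: int = 3) -> list:
    """Return the first n documents, pulling no more than n from the iterator."""
    return list(islice(loader.lazy_load(), n))


# Usage (assumes the unstructured package and the file are available):
# from langchain_community.document_loaders.excel import UnstructuredExcelLoader
# loader = UnstructuredExcelLoader("stanley-cups.xlsx", mode="elements")
# for doc in preview(loader):
#     print(doc.page_content[:80])
```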

load() → list[Document]

Load data into Document objects.

Returns:

the documents.

Return type:

list[Document]

load_and_split(
text_splitter: TextSplitter | None = None,
) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method; treat it as deprecated.

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Raises:

ImportError – If langchain-text-splitters is not installed and no text_splitter is provided.

Returns:

List of Documents.

Return type:

list[Document]
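
Since this method is described as deprecated, one alternative is to load first and then call a splitter explicitly. The sketch below assumes the splitter exposes `split_documents()`, as `RecursiveCharacterTextSplitter` from `langchain-text-splitters` does. The helper names and the chunk sizes are illustrative choices, not library defaults.

```python
from typing import Any


def load_then_split(loader: Any, splitter: Any) -> list:
    """Load all documents, then split them into chunks with the given splitter."""
    docs = loader.load()
    return splitter.split_documents(docs)


def make_splitter(chunk_size: int = 1000, chunk_overlap: int = 100) -> Any:
    """Build a RecursiveCharacterTextSplitter (needs langchain-text-splitters)."""
    # Imported lazily so load_then_split can be used with any compatible splitter.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )


# Usage (assumes both packages and the file are available):
# from langchain_community.document_loaders.excel import UnstructuredExcelLoader
# loader = UnstructuredExcelLoader("stanley-cups.xlsx")
# chunks = load_then_split(loader, make_splitter())
```

Passing the splitter in explicitly keeps the chunking parameters visible at the call site instead of relying on the default.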

Examples using UnstructuredExcelLoader