UnstructuredXMLLoader
This notebook provides a quick overview for getting started with the UnstructuredXMLLoader document loader. UnstructuredXMLLoader loads .xml files, and the resulting page content is the text extracted from the XML tags.
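To make "text extracted from the XML tags" concrete, here is a minimal sketch that uses only Python's standard-library `xml.etree.ElementTree`. It is not how the loader works internally, because UnstructuredXMLLoader delegates parsing to the `unstructured` package. The sample XML and the `extract_text` helper are hypothetical. They were chosen so the output resembles the blank-line-separated page content shown in the Load section.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a small factbook file (not the real example_data file).
SAMPLE_XML = """
<factbook>
  <country>
    <name>Canada</name>
    <capital>Ottawa</capital>
  </country>
  <country>
    <name>France</name>
    <capital>Paris</capital>
  </country>
</factbook>
"""


def extract_text(xml_string: str) -> str:
    """Collect the non-empty text of every element, in document order, joined by blank lines."""
    root = ET.fromstring(xml_string.strip())
    pieces = []
    for element in root.iter():  # iter() walks the root and all descendants in document order
        text = (element.text or "").strip()
        if text:  # skip elements whose text is only indentation whitespace
            pieces.append(text)
    return "\n\n".join(pieces)


print(extract_text(SAMPLE_XML))
```

This sketch ignores element tails and attributes. A real partitioner such as `unstructured` handles more cases.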
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
UnstructuredXMLLoader | langchain_community | โ | โ | โ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
UnstructuredXMLLoader | โ | โ |
Setup
To access the UnstructuredXMLLoader document loader, you'll need to install the langchain-community integration package.
Credentials
No credentials are needed to use the UnstructuredXMLLoader.
If you want automated, best-in-class tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
import getpass
import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
Install langchain_community, along with unstructured, which UnstructuredXMLLoader uses to parse the XML.
%pip install -qU langchain_community unstructured
Initialization
Now we can instantiate our loader object and load documents:
from langchain_community.document_loaders import UnstructuredXMLLoader
loader = UnstructuredXMLLoader(
"./example_data/factbook.xml",
)
Load
docs = loader.load()
docs[0]
Document(metadata={'source': './example_data/factbook.xml'}, page_content='United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\nCanada\n\nOttawa\n\nJustin Trudeau\n\nHockey\n\nFrance\n\nParis\n\nEmmanuel Macron\n\nSoccer\n\nTrinidad & Tobado\n\nPort of Spain\n\nKeith Rowley\n\nTrack & Field')
print(docs[0].metadata)
{'source': './example_data/factbook.xml'}
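The `page_content` above shows the extracted values separated by blank lines. If you want the individual values back, splitting on `"\n\n"` is a simple option. The sketch below copies the string from the output above so that it runs on its own; in a notebook you would use `docs[0].page_content` instead. Grouping the values into four-field records (country, capital, leader, sport) is an assumption based on this particular sample output, not something the loader provides. The `FIELDS` names are hypothetical. For per-element documents, the `mode="elements"` option accepted by the Unstructured-based loaders is likely the more robust route.

```python
# Copied verbatim from the docs[0].page_content output shown above.
page_content = (
    "United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\n"
    "Canada\n\nOttawa\n\nJustin Trudeau\n\nHockey\n\n"
    "France\n\nParis\n\nEmmanuel Macron\n\nSoccer\n\n"
    "Trinidad & Tobado\n\nPort of Spain\n\nKeith Rowley\n\nTrack & Field"
)

# Recover the individual extracted values.
values = page_content.split("\n\n")

# Assumption: each country contributes exactly four values in this fixed order.
FIELDS = ("country", "capital", "leader", "sport")
records = [
    dict(zip(FIELDS, values[i : i + len(FIELDS)]))
    for i in range(0, len(values), len(FIELDS))
]

for record in records:
    print(record)
```

The grouping only works as long as every record in the file has the same fields in the same order, which is why the element-level mode is preferable for irregular XML.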
Lazy Load
page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)
        page = []
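The loop above never processes a final page that holds fewer than 10 documents. Below is a minimal sketch of a paging helper that also yields that remainder. `paged` is a hypothetical name, and `range(23)` stands in for `loader.lazy_load()` so the example runs without the loader. On Python 3.12+, `itertools.batched` does the same job, except that it yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paged(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `size` items, including a final shorter page."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        page = list(islice(iterator, size))  # take at most `size` items without materializing the rest
        if not page:
            return
        yield page


# range(23) stands in for loader.lazy_load(); swap it in when using the real loader.
for page in paged(range(23), 10):
    print(len(page))  # stand-in for a paged operation such as index.upsert(page)
```

Because `islice` pulls items lazily, only one page of documents is held in memory at a time, which preserves the point of using `lazy_load()`.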
API reference
For detailed documentation of all UnstructuredXMLLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.xml.UnstructuredXMLLoader.html
Related
- Document loader conceptual guide
- Document loader how-to guides