
Expanding Document Support in Transformer Lab with Markitdown


We're excited to announce a significant enhancement to Transformer Lab: integration with Microsoft's open-source Markitdown library! Markitdown converts a wide range of file formats into Markdown, which dramatically expands the types of documents you can work with in Transformer Lab and makes it more versatile for your AI projects.

Enhanced Document Support

Previously, Transformer Lab supported a limited set of document formats. With the Markitdown integration, you can now upload and process:

  • Microsoft Word documents (docx)
  • Excel spreadsheets
  • PowerPoint presentations (ppt/pptx)
  • HTML files
  • ZIP archives containing multiple documents
  • Plus all previously supported formats (PDF, Markdown, etc.)

Let's explore some of these new capabilities!

Document Upload Demonstrations

Excel Files

Excel spreadsheets are now converted to Markdown, so their contents are readable within Transformer Lab:

[GIF: uploading an Excel spreadsheet]
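As a rough illustration of what "converted to Markdown" can mean for a spreadsheet, the Python sketch below renders each worksheet as a Markdown table under its own heading. It assumes the sheet has already been read into a list of rows (first row as header), and the helper names `rows_to_markdown_table` and `sheets_to_markdown` are hypothetical. This is a simplified stand-in, not Markitdown's actual code or exact output format.

```python
from typing import Dict, List, Optional, Sequence


def rows_to_markdown_table(rows: Sequence[Sequence[object]]) -> str:
    """Render one sheet (first row = header) as a Markdown table. Hypothetical helper."""
    if not rows:
        return ""

    def cell(value: Optional[object]) -> str:
        # Empty cells become blank; "|" is escaped so it cannot split a column.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|")

    header, *body = rows
    width = len(header)
    lines = [
        "| " + " | ".join(cell(v) for v in header) + " |",
        "| " + " | ".join("---" for _ in range(width)) + " |",
    ]
    for row in body:
        # Pad short rows (and trim long ones) so every line has `width` columns.
        padded = list(row)[:width] + [None] * (width - len(row))
        lines.append("| " + " | ".join(cell(v) for v in padded) + " |")
    return "\n".join(lines)


def sheets_to_markdown(sheets: Dict[str, List[List[object]]]) -> str:
    """Emit one '## <sheet name>' section per worksheet, in the order given."""
    sections = [f"## {name}\n{rows_to_markdown_table(rows)}" for name, rows in sheets.items()]
    return "\n\n".join(sections)
```

For a sheet named `Sales` with rows `[["Region", "Q1"], ["North", 120]]`, this yields a `## Sales` heading followed by a two-column table, which is text a language model or retriever can work with directly.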

Word Documents

Microsoft Word documents keep their structure, such as headings and lists, when imported:

[GIF: uploading a Word document]

Bulk Uploads with ZIP Archives

Need to process multiple documents at once? Simply zip them up and upload:

[GIF: uploading a ZIP archive of documents]
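A minimal sketch of how a ZIP upload can be fanned out into individual documents, using only Python's standard library. The `convert` callback and the `SUPPORTED_SUFFIXES` allow-list are hypothetical placeholders for whatever per-file conversion Transformer Lab performs; this is not its actual implementation.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Callable, Dict

# Hypothetical allow-list for illustration; not Transformer Lab's actual setting.
SUPPORTED_SUFFIXES = {".pdf", ".md", ".txt", ".docx", ".xlsx", ".pptx", ".html"}


def convert_zip(archive: bytes, convert: Callable[[str, bytes], str]) -> Dict[str, str]:
    """Convert every supported file inside a ZIP archive, keyed by its path in the archive."""
    results: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # folder entries have no content to convert
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix not in SUPPORTED_SUFFIXES:
                continue  # e.g. images or binaries the converter can't handle
            results[info.filename] = convert(info.filename, zf.read(info))
    return results


if __name__ == "__main__":
    # Build a small archive in memory and "convert" text files by decoding them.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("reports/summary.md", "# Summary")
        zf.writestr("notes.txt", "plain text")
        zf.writestr("logo.png", b"\x89PNG")
    docs = convert_zip(buf.getvalue(), lambda name, data: data.decode("utf-8"))
    print(sorted(docs))  # ['notes.txt', 'reports/summary.md']
```

Skipping unsupported members, rather than failing on them, keeps one stray image from blocking the rest of the archive.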

RAG with Various Document Types

One of the most powerful applications of this enhanced document support is the ability to perform Retrieval-Augmented Generation (RAG) on a wider variety of content types. Here's how you can use presentations in your RAG pipeline:

[GIFs: using a presentation in a RAG pipeline]
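Once a presentation has been converted to Markdown, a RAG pipeline typically splits it into chunks and retrieves the ones most relevant to a query. The sketch below assumes heading-based chunking and uses simple word overlap as a stand-in for the embedding similarity a real pipeline would use; `chunk_markdown` and `retrieve` are hypothetical names, and this is not Transformer Lab's retrieval code.

```python
import re
from typing import List, Tuple

Chunk = Tuple[str, str]  # (heading, text)

HEADING = re.compile(r"^#{1,6}\s+(.*)$")
WORD = re.compile(r"\w+")


def _append_chunk(chunks: List[Chunk], heading: str, lines: List[str]) -> None:
    text = "\n".join(lines).strip()
    if text:  # skip headings with no body text
        chunks.append((heading, text))


def chunk_markdown(markdown: str) -> List[Chunk]:
    """Split converted Markdown into one chunk per heading section."""
    chunks: List[Chunk] = []
    heading, lines = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            _append_chunk(chunks, heading, lines)
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    _append_chunk(chunks, heading, lines)
    return chunks


def retrieve(chunks: List[Chunk], query: str, k: int = 1) -> List[Chunk]:
    """Rank chunks by word overlap with the query (a stand-in for embedding similarity)."""
    terms = set(WORD.findall(query.lower()))

    def score(chunk: Chunk) -> int:
        heading, text = chunk
        return len(terms & set(WORD.findall(f"{heading} {text}".lower())))

    return sorted(chunks, key=score, reverse=True)[:k]
```

Keeping each heading with its chunk preserves some of the slide's context, which helps the retrieved passage make sense on its own when it is passed to the model.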

Introducing Web Content Import

Perhaps the most exciting new feature is our "Add Webpage" functionality. Click the "+" icon on the Documents page and you'll see this new option, which lets you:

  1. Import any webpage by URL (automatically converted to Markdown)
  2. Import YouTube videos (including metadata and transcript extraction)

Here's how it works:

[GIF: using the "Add Webpage" option]
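The two "Add Webpage" paths can be pictured as a simple URL check: YouTube video links go to transcript extraction, and everything else is fetched as a webpage. The following is a hypothetical sketch using only `urllib.parse`; `classify_url` and the host list are assumptions for illustration, not Transformer Lab's actual detection logic.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical host list; the real feature may accept other URL shapes.
YOUTUBE_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def classify_url(url: str) -> Tuple[str, Optional[str]]:
    """Return ("youtube", video_id) for YouTube video links, otherwise ("webpage", None)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in YOUTUBE_WATCH_HOSTS and parsed.path == "/watch":
        # Standard watch URLs carry the video ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
        if video_id:
            return "youtube", video_id
    elif host == "youtu.be":
        # Short links carry the video ID as the path.
        video_id = parsed.path.lstrip("/")
        if video_id:
            return "youtube", video_id
    return "webpage", None
```

A YouTube URL that does not identify a video (for example, the home page) falls through to the webpage path, which is the conservative choice.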

The YouTube import feature is particularly powerful: it automatically extracts the video transcript and saves it as a Markdown file within Transformer Lab, making video content immediately available for your AI applications.
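To make the idea of "a transcript as a Markdown file" concrete, here is one hypothetical layout: a title, a metadata list and timestamped transcript lines. The function name, field names and format are assumptions for illustration; the file Transformer Lab actually produces may look different.

```python
from typing import Dict, List, Tuple


def transcript_to_markdown(
    title: str, metadata: Dict[str, str], segments: List[Tuple[float, str]]
) -> str:
    """Render video metadata and (start_seconds, text) transcript segments as Markdown."""
    lines = [f"# {title}", ""]
    for key, value in metadata.items():
        lines.append(f"- **{key}:** {value}")
    lines += ["", "## Transcript", ""]
    for start, text in segments:
        # Convert the segment's start time to an mm:ss label.
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {text}")
    return "\n".join(lines) + "\n"
```

Because the result is ordinary Markdown, it can be chunked and indexed the same way as any uploaded document.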

Conclusion

With these enhancements, Transformer Lab becomes an even more versatile platform for working with diverse data sources. Whether you're building RAG applications, training models, or exploring document processing, the expanded format support means less time converting documents and more time focusing on your AI projects.

Give these new features a try and let us know what you think! We're excited to see the new use cases this enables for our community.