This article provides a step-by-step guide for users who want to split document pages using Power Automate. In large multi-part documents, such as mortgage applications or HR onboarding packets, it is common practice to group like documents together and mark the boundaries between them with blank pages, barcodes, or a change in page type.
The recent introduction of the Custom Classification Model within Microsoft Forms Recognizer aids in distinguishing various segments of a document. This facilitates the breakdown of a large document into manageable, smaller sections and pairs well with Microsoft Syntex for classification and metadata extraction.
This process starts with understanding the structure of the document and the portions on which the Custom Classification Model should be trained. For instance, the standard invoice example used in the post is split into a 2-page invoice, a separator sheet, and a 1-page invoice.
Your document training process should include at least five sample documents for each desired document section. The output will be individual documents corresponding to each section/place marker, all created within a target document library.
The Custom Classification Model can be created in Forms Recognizer Studio, whose step-by-step wizard guides you through the required Azure portal configuration. Once setup is complete, Azure provides the endpoint and key needed to build the Power Automate workflow.
Post model setup, the appropriate Document Types (Bar Code Separator and Invoice in the example) are created, and five documents for each Document Type are uploaded. Training the model involves labelling the documents and connecting them to the correct Document Type, then testing it with a sample document featuring the corresponding page sections/Document Types.
Once the model has been trained on the page sections in the document, it can be used in Power Automate to process documents stored in SharePoint. The article provides a complete solution, available for download from GitHub, and walks through its critical actions and configurations.
Highlighted areas include the name of the model and the key, which is found in Azure in the resource group created by Forms Recognizer. Others are the initial request to Forms Recognizer, whose response contains the URL needed to retrieve the results, and the delay activity, which gives Forms Recognizer time to process the document before the JSON response is retrieved.
Power Automate's logic evaluates each docType, and when a docType changes from "Invoice" to "Bar Code Separator", a document with a 2-page invoice starting on page 1 and ending on page 2 is created. This process repeats until all docType attributes have been evaluated. Documents are then created using the Split PDF action from Adobe.
The final stages involve saving the resultant file to the final SharePoint library, where a Microsoft Syntex model has been configured to classify and extract metadata from the file.
In an era of growing digitalization, effective document management is critical for organizations. The complexity and volume of documents have made traditional document handling methods insufficient. Power Automate is a robust tool for organizing, classifying, and managing large volumes of varying documents efficiently.
Read the full article Step-by-Step Guide: Splitting Document Pages with PowerAutomate
This guide provides step-by-step instructions for splitting document pages with Power Automate. Large documents often contain several sub-documents separated by a blank page, a barcode, or some other change in page format that signals a logical break. This is common when documents are combined to conserve storage space or to keep similar documents together in one batch.
The recently introduced Custom Classification Model in Microsoft Forms Recognizer allows you to efficiently train a model on your documents, enabling it to recognize various portions of the document. This makes it relatively easy to break down your document into smaller, logical sub-documents, and then use another automation tool to categorize and extract metadata.
This post will elaborate on how to process the document and segregate the pages into smaller documents for processing. First, you need to grasp the architecture of your document and what sections you need to train the Custom Classification Model with.
As an example, we'll use a conventional invoice file. The document consists of a 2-page invoice, a separator sheet, and a 1-page invoice. When training the Classification model, prepare at least 5 sample documents for each document section that should be split out into its own document.
The final result will be 3 separate documents: a 2-page invoice, a separator sheet, and a 1-page invoice. All of these will be produced in a target document library.
Upon creating a Custom Classification Model with Forms Recognizer Studio, the wizard will assist you in creating the right configuration in your Azure portal. Once Forms Recognizer has finalized setting up your project, you'll see the components in Azure. This will provide the endpoint and key required when we build the automation workflow.
Take care to label the documents and train the model correctly. After the project has been created, create the appropriate Document Types (such as Bar Code Separator and Invoice) and then upload 5 documents for each Document Type. Once you have associated each document with the correct Document Type, you can proceed to train the model.
Once training is complete, test the model with a sample document whose page sections correspond to the Document Types, and review the results.
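Before uploading, it can help to confirm that every Document Type has enough samples. The following is a minimal Python sketch of that check, not part of the original solution; the function name and the sample file names are hypothetical, and only the minimum of five comes from the article.

```python
MIN_SAMPLES = 5  # minimum per Document Type, as stated in the article


def undertrained_types(samples_by_type, minimum=MIN_SAMPLES):
    """Return the Document Types that have fewer than `minimum` samples."""
    return sorted(
        doc_type
        for doc_type, files in samples_by_type.items()
        if len(files) < minimum
    )


# Hypothetical sample inventory; the file names are placeholders.
samples = {
    "Invoice": ["inv1.pdf", "inv2.pdf", "inv3.pdf", "inv4.pdf", "inv5.pdf"],
    "Bar Code Separator": ["sep1.pdf", "sep2.pdf", "sep3.pdf"],
}

print(undertrained_types(samples))  # ['Bar Code Separator']
```

Any Document Type listed in the output needs more samples before training.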
Now that we have a trained model corresponding to the page sections in your document, it can be used in Power Automate against documents stored in SharePoint. The solution can be downloaded from GitHub. The key from Azure can be found in the resource group that was created by Forms Recognizer.
The response to the initial request to Forms Recognizer includes the URL required to retrieve the results once the document has been processed. This URL is located in the response header under the name "Operation-Location".
Afterward, wait for Forms Recognizer to finish processing the document, then retrieve the JSON response from that URL. This response contains the results we can use to split the document.
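Outside Power Automate, the same initial request can be sketched in Python. This is an illustration under assumptions rather than the article's workflow: the URL path and API version follow the publicly documented REST pattern for document classifiers and should be checked against your resource, and the endpoint, key, and model name below are placeholders.

```python
import urllib.request

API_VERSION = "2023-07-31"  # assumption: use the version your resource supports


def build_classify_request(endpoint, key, model_name, pdf_bytes):
    """Build the POST request that submits a PDF to the classification model."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentClassifiers/"
        f"{model_name}:analyze?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,  # key from the Azure resource
            "Content-Type": "application/octet-stream",
        },
    )


def operation_location(headers):
    """Find the Operation-Location header regardless of letter case."""
    for name, value in headers.items():
        if name.lower() == "operation-location":
            return value
    raise KeyError("Operation-Location header not found")


# Placeholder values; nothing is sent over the network here.
request = build_classify_request(
    "https://example-resource.cognitiveservices.azure.com",
    "<your-key>",
    "invoice-splitter",
    b"%PDF-",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and hand the response's `headers` to `operation_location`; the returned URL is the one to query for results.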
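The wait-then-retrieve step can be sketched as a polling loop. This Python sketch assumes the result body carries a `status` field that ends as `succeeded` or `failed`; the function names, delay, and attempt limit are illustrative choices, not values from the article.

```python
import json
import time
import urllib.request


def fetch_json(url, key):
    """GET the result URL and parse the JSON body."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, delay_seconds=5.0, max_attempts=20, sleep=time.sleep):
    """Call `fetch` until the analysis reports success, pausing between tries."""
    for _ in range(max_attempts):
        body = fetch()
        status = body.get("status")
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError("analysis failed")
        sleep(delay_seconds)  # plays the role of the delay activity
    raise TimeoutError("analysis did not finish in time")


# Example with canned responses instead of a live service.
canned = iter([{"status": "running"}, {"status": "succeeded", "analyzeResult": {}}])
result = wait_for_result(lambda: next(canned), sleep=lambda seconds: None)
print(result["status"])  # succeeded
```

Against a live resource, `fetch` would be something like `lambda: fetch_json(result_url, key)`, where `result_url` is the Operation-Location value.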
The logic behind the automation activities will evaluate each docType. On detecting a transition from "Invoice" to "Bar Code Separator", a single document containing a 2-page invoice beginning on page 1 and ending on page 2 will be created. This process repeats until all docType attributes have been evaluated.
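The transition logic described above can be sketched in Python as follows. The JSON shape used here (a `documents` list whose entries carry `docType` and `boundingRegions` with `pageNumber`) is an assumption about the classifier result and should be verified against a real response; the function names are hypothetical.

```python
def page_doc_types(analyze_result):
    """Flatten the result into (page_number, doc_type) pairs sorted by page."""
    pairs = []
    for document in analyze_result.get("documents", []):
        for region in document.get("boundingRegions", []):
            pairs.append((region["pageNumber"], document["docType"]))
    return sorted(pairs)


def split_ranges(pairs):
    """Group consecutive pages with the same docType into (doc_type, first, last)."""
    ranges = []
    for page, doc_type in pairs:
        if ranges and ranges[-1][0] == doc_type and ranges[-1][2] == page - 1:
            # Same docType on the next page: extend the current range.
            ranges[-1] = (doc_type, ranges[-1][1], page)
        else:
            # docType changed (or pages are not adjacent): start a new range.
            ranges.append((doc_type, page, page))
    return ranges


# Assumed result shape for the article's example file.
sample = {
    "documents": [
        {"docType": "Invoice", "boundingRegions": [{"pageNumber": 1}, {"pageNumber": 2}]},
        {"docType": "Bar Code Separator", "boundingRegions": [{"pageNumber": 3}]},
        {"docType": "Invoice", "boundingRegions": [{"pageNumber": 4}]},
    ]
}

print(split_ranges(page_doc_types(sample)))
# [('Invoice', 1, 2), ('Bar Code Separator', 3, 3), ('Invoice', 4, 4)]
```

Because a new range starts only when the docType changes, two invoices that follow each other without a separator sheet would be merged into one range; the separator is what makes the boundary detectable in this approach.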
Once the automation logic determines that a new document needs to be created from the pages specified in the JSON response, the Split PDF action from Adobe can be used to split the original file that was uploaded. The resulting file is then saved to the SharePoint library where a Microsoft Syntex model has been configured to classify the file and extract its metadata.