What is Power Automate?
Power Automate is a service that can be used to automate tasks and thus increase the efficiency of
any organization. You can create cloud flows, desktop flows, or business process flows.
Microsoft Power Automate (formerly Microsoft Flow) is cloud-based software that allows employees to create and
run tasks and tasks across multiple applications and services without manual effort.
Extensive data cleanup for CSV using Power Automate:
Requirement:
An email with a CSVCSV attachment is received in the Inbox of the shared mailbox, whose attachment needs data clean up, and after the cleanup, the file has to be uploaded to SharePoint automatically.
Details for the dynamic CSV attachment file and required functionality:
- Data is extensive, with 60K+ records.
- There are more than 15 columns in the CSV file.
- CSV file needs the following cleanup:
- Removal of line breaks for field values in every column.
- Removal of rows with empty Revision numbers where the Revision number is the column.
- Removal of duplicate records (if the entire row is the the same as the existing one).
- Need to drop the CSV file to SharePoint.
- While dropping the file to SharePoint, we the . Ifit otherwise need to check if the file with the same name already exists in the location and if exists, we need to drag the file to another location appending date to the file name and then drop the new file to the SharePoint location.
Observations as a Developer:
- Power automates free license when applied to each loop allowing only 5K records at a time to process, which is why huge data processes will take a lot of time for whole data processing.
- CSV files can be in UNIX and window formats.
- Windows format uses CRLF(\r\n) for row differentiation when a file read as a string.
- When the file is read as a string, LINUX uses LF(\n) for row differentiation, which is the identical for a line break or following line(\n) in between the column values if it exists.
- In power automate for large CSV files. When file content is read directly from an email attachment, it takes more time than temporarily storing it in OneDrive and deleting it once the process is completed.
Power Automate Solution Implementation:
Power Automate Trigger: Whenever an email is sent from a specific account to the account used in power automate connections by filtering the email subject and a CSVCSV file as an attachment, the flow runs.
Step 1
Storing the attachment temporarily in OneDrive by creating a new file and adding file content by taking the attachment content
NOTE: The time taken directly to read large significant CSV file content from an email attachment is greater than the time to save the same content in OneDrive and read from there.
Step 2:
Get file content from the OneDrive file created in the previous step. The file content can be base64 or string. If base64, We need to convert the base64 to string.
Step 3:
Converting base64 to a string and storing it as file content using a variable.
Compose formula:
base64ToString(body(‘Get_file_content’))
if compose success:
replace(outputs(‘Compose’), uriComponentToString(‘%EF%BB%BF’), ”)
replacing the compose output for BOM character
compose fails:
@{body(‘Get_file_content’)}
Step 4:
In the flow, there is no possibility to use Regular expressions to remove line breaks inside every field value in the CSV file
To achieve the above functionality, we use power automate connector Excel run script, pass the file content, remove the line breaks from the field values, and return the row result as JSON.
Create an empty Excel book in OneDrive and link an office script to the Excel workbook created. This way, we can inject JavaScript using office scripts in power to automate and achieve the desired output from the office script.
The following is the office script used to replace line breaks inside the field values and return JSON.
Step 5:
Parse the JSON output from the above step.
Step 6:
Filter the JSON for empty Revision Number field values and get the results has have the data for the Revision Number field value.
Formula used:
@and(equals(empty(item()?[‘Revision Number’]), false), not(equals(length(item()?[‘Revision Number’]), 0)))
Step 7:
Duplicates can be removed from the result using the Union function in power automate.
To get the same order for the columns, we can use the select function and select the value key-value pair from the filtered.
The formula used for the input selection:
union(body(‘Filter_data_that_does_not_have_revision_number’), body(‘Filter_data_that_does_not_have_revision_number’))
Step 8:
Covert data to CSV table
Step 9:
Check if the file already exists in the SharePoint output location.
Step 10:
If the file already exists, update the file name by appending the date to the existing name.
Step 11:
Get the file identifier from Get file properties and move the file to the desired folder using the file identifier.
Step 12:
Create a file in SharePoint using the output from step 8.
Step 10:
Deleting the temporary CSV file created in OneDrive in step 1.
Conclusion:
We can automate complex manual work using power automation and reduce manual effort.
Reference links:
https://learn.microsoft.com/en-us/power-automate/getting-started
https://learn.microsoft.com/en-us/office/dev/scripts/overview/excel