There are numerous threads on the Alteryx Community where people ask how to convert xls files to xlsx. The solutions generally suggest one of two approaches, either
- Read the file in with an Input Data tool then output again in the new format
- Use the Run Command tool to call an Excel component to do the conversion
In my (admittedly edge!) case the server doesn’t have Excel installed and I don’t want to pay for a license just for this. Plus, the file cannot be read natively by the Input Data tool as it requires pre-processing in Python first (it’s a really weird Excel file that looks like a PDF invoice but in Excel 😬)
LibreOffice Batch Converting
My solution is to use LibreOffice. It’s open source, free to use and includes batch conversions that can be run from the command line. By default the UI will not be shown when run in this way.
The basic command to do the conversion is shown below.
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\input_path\file.xls" -outdir "C:\output_path"
--convert-to xlsx should be followed by the name of the xls file to convert, and -outdir names the directory the xlsx files are written to. The converted files keep the same name as the original, just with the new extension.
Integrating this into an Alteryx workflow works like any other external program: use the Run Command tool to run it.
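As a small illustration of how the command and the resulting file name fit together, here is a minimal Python sketch. The function names build_command and expected_output are hypothetical helpers made up for this example, and the install path is assumed to be the default one used above. It uses ntpath so that the Windows-style paths are handled the same way on any operating system.

```python
import ntpath  # Windows path rules, so the sketch behaves the same on any OS

# Assumed default install location; adjust to your own machine.
SOFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def build_command(src, outdir, exe=SOFFICE):
    """Return the conversion command line for one xls file as a single string."""
    return f'"{exe}" --convert-to xlsx "{src}" -outdir "{outdir}"'


def expected_output(src, outdir):
    """Return the path the converted file should get: same base name, .xlsx extension."""
    stem, _ = ntpath.splitext(ntpath.basename(src))
    return ntpath.join(outdir, stem + ".xlsx")


print(build_command(r"C:\input_path\file.xls", r"C:\output_path"))
print(expected_output(r"C:\input_path\file.xls", r"C:\output_path"))
```

The first line printed is the command shown above, and the second is where the xlsx file should appear once LibreOffice has finished.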
- Use a Directory tool to read all the xls files
- A Formula tool to create the command line for each file
- A Run Command to first write the flow to a batch file and then run the file
The most difficult part is configuring the Run Command. It should be configured so that the incoming data is first written to a file ending with .bat, and the tool then runs this newly created batch file.
The xlsconvert.bat file that is created should look something like this with a line per conversion.
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\Customer projects\Project\Alteryx\Data\Invoice - SBRY0191928.XLS" -outdir "C:\Customer projects\Project\Alteryx\Data\"
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\Customer projects\Project\Alteryx\Data\Invoice - SBRY0192237.XLS" -outdir "C:\Customer projects\Project\Alteryx\Data\"
"C:\Program Files\LibreOffice\program\soffice.exe" --convert-to xlsx "C:\Customer projects\Project\Alteryx\Data\Invoice - SBRY0192914.XLS" -outdir "C:\Customer projects\Project\Alteryx\Data\"
...
Each file will be processed in turn and written to the -outdir you specified in the Formula tool. Voilà.
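For readers who want to see the Directory, Formula and batch-file steps outside Alteryx, the following is a rough Python sketch of the same idea, not the workflow itself. The names batch_lines and write_batch are hypothetical, and the executable path is assumed to be the default install. As a precaution, the sketch strips a trailing slash from the output directory, because Windows argument parsing can read a backslash directly before a closing quote as an escaped quote.

```python
import os

# Assumed default install location; adjust to your own machine.
EXE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def batch_lines(folder, exe=EXE):
    """Build one soffice command line per .xls file found in folder."""
    # Drop a trailing slash so the quoted -outdir does not end in \" .
    out = folder.rstrip("\\/")
    lines = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".xls"):
            src = os.path.join(folder, name)
            lines.append(f'"{exe}" --convert-to xlsx "{src}" -outdir "{out}"')
    return lines


def write_batch(folder, bat_path):
    """Write the command lines to a .bat file and return how many were written."""
    lines = batch_lines(folder)
    with open(bat_path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)
```

Calling write_batch with the data folder and a path such as xlsconvert.bat would produce a file with one line per conversion, in the same shape as the example above.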
Conversion Using the Python Tool
If you prefer to use Python instead of the Run Command tool, it’s very easy to run the same command. This code runs LibreOffice in headless mode again and silently converts the files from xls to xlsx format.
The key import is subprocess, which ships with Python and is therefore already available to Alteryx in the default install.
from ayx import Alteryx
import subprocess
import os
import pandas as pd

# prepare outbound data
data = {"Files": []}

# path with files to convert (assumed to be the Data subdirectory of the workflow)
PATH = os.path.join(Alteryx.getWorkflowConstant('Engine.WorkflowDirectory'), "Data")

# path to your LibreOffice executable (raw string so the backslashes are kept literally)
EXE = r'C:\Program Files\LibreOffice\program\soffice.exe'

# loop over the files, convert each xls and record the converted filename for output
for file in os.listdir(PATH):
    if file.lower().endswith(".xls"):
        subprocess.run([EXE, '--convert-to', 'xlsx', os.path.join(PATH, file), '-outdir', PATH])
        filename, _ = os.path.splitext(file)
        # the converted file keeps the base name and gets the .xlsx extension
        data['Files'].append(os.path.join(PATH, filename + ".xlsx"))

# send the list of converted files to output anchor 1
Alteryx.write(pd.DataFrame(data), 1)
I personally prefer the Python method over the Run Command purely because I find it more flexible and easier to set up, but it works the same either way.
You could of course also use this method with any other analytics platform such as KNIME. It could also be integrated into an RPA solution using UiPath or a similar tool.
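One example of that flexibility is error handling. The sketch below wraps the same command in a hypothetical convert_xls function that checks the exit code and also confirms the xlsx file actually exists afterwards, since the exit code alone may not tell the whole story. The function name, the timeout value and the run parameter (there only so the function can be exercised without LibreOffice installed) are assumptions made for this example.

```python
import os
import subprocess

# Assumed default install location; adjust to your own machine.
EXE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def convert_xls(src, outdir, exe=EXE, run=subprocess.run, timeout=120):
    """Convert one xls file and return the xlsx path, or raise if it failed."""
    result = run(
        [exe, "--convert-to", "xlsx", src, "-outdir", outdir],
        capture_output=True,  # keep stdout/stderr for the error message
        text=True,
        timeout=timeout,      # raises subprocess.TimeoutExpired if soffice hangs
    )
    if result.returncode != 0:
        raise RuntimeError(f"soffice exited with {result.returncode}: {result.stderr}")

    # The converted file keeps the base name and gets the .xlsx extension.
    stem, _ = os.path.splitext(os.path.basename(src))
    target = os.path.join(outdir, stem + ".xlsx")
    if not os.path.exists(target):
        raise FileNotFoundError(f"expected {target} after conversion")
    return target
```

Inside the loop above, the subprocess.run call and the filename handling could then be replaced by a single call such as data['Files'].append(convert_xls(os.path.join(PATH, file), PATH)).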