Why Upgrading XLS to XLSX Is Worth Your Time
Ask any seasoned Java developer who’s worked with Excel files long enough, and you’ll probably hear a similar refrain: the old XLS Excel format is clunky and annoying. It’s been around since the late ’80s, and while it’s still supported in a lot of systems, it’s not doing us many favors today. It was, after all, replaced with XLSX for a reason.
Unfortunately, there’s still a lot of important data packed in those old binary XLS containers, and some developers are tasked with making clean conversions to XLSX to improve the usability (and security) of that data for the long run.
In this article, we’re taking a close look at why converting legacy XLS files to the newer XLSX format is important. We’ll dig into what changes under the hood during that conversion, why XLSX is clearly much better suited to modern workflows, and what your best options are to efficiently build out programmatic XLS to XLSX conversions in a Java application.
Why People Still Use XLS — And Why It’s a Headache
The older XLS format uses a binary file structure. That alone sets it apart from almost every major document format we use today, which tend to favor XML or JSON-based standards. If you’ve stumbled across an old finance export or been forced to inherit reporting logic from 2006, there’s a good chance you’ve dealt with binary Excel data, and you probably haven’t been excited to do that again.
The XLS format has some hard limits baked in that don’t make much sense today. It’s capped at 65,536 rows and 256 columns per sheet (983,040 rows and 16,128 columns short of what modern XLSX allows), and clean interoperability between XLS and newer APIs or cloud services can be hit-or-miss.
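The size of that gap follows directly from the published per-sheet limits of the two formats (65,536 × 256 for XLS, 1,048,576 × 16,384 for XLSX). As a quick sanity check, the arithmetic can be sketched in a few lines of Java; the class and method names here are our own and purely illustrative:

```java
public class SheetLimits {
    // Per-sheet limits of each format.
    public static final int XLS_MAX_ROWS = 65_536;      // 2^16
    public static final int XLS_MAX_COLS = 256;         // 2^8
    public static final int XLSX_MAX_ROWS = 1_048_576;  // 2^20
    public static final int XLSX_MAX_COLS = 16_384;     // 2^14

    /** How many more rows an XLSX sheet can hold than an XLS sheet. */
    public static int rowGap() {
        return XLSX_MAX_ROWS - XLS_MAX_ROWS;
    }

    /** How many more columns an XLSX sheet can hold than an XLS sheet. */
    public static int colGap() {
        return XLSX_MAX_COLS - XLS_MAX_COLS;
    }

    /** True when a table of the given size fits on a single XLS sheet. */
    public static boolean fitsInXls(int rows, int cols) {
        return rows <= XLS_MAX_ROWS && cols <= XLS_MAX_COLS;
    }

    public static void main(String[] args) {
        System.out.println(rowGap());               // 983040
        System.out.println(colGap());               // 16128
        System.out.println(fitsInXls(100_000, 10)); // false
    }
}
```

A check like fitsInXls is also a cheap guard to run before exporting data back to the legacy format, since anything beyond those limits can't be stored on a single XLS sheet.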
Even more frustratingly, because XLS is a binary container, you can’t easily crack it open to see what’s wrong internally. You’re stuck relying on some library to parse XLS contents correctly — and good luck if you hit something nonstandard during that process. In comparison, looking for errors in the Open XML file structure that XLSX (and all modern MS Office) files use is like searching for typos in a kid’s book.
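One practical consequence of the two container types is that they can be told apart by their first few bytes, regardless of file extension. Binary XLS workbooks live inside an OLE2 compound file, which starts with the signature D0 CF 11 E0 A1 B1 1A E1, while XLSX files are ZIP archives and start with "PK" followed by the bytes 3 and 4. Here is a minimal sketch, assuming Java 11+ (for readNBytes); the class name and the returned labels are our own. Note that other OLE2 and ZIP-based formats share these signatures, so this only identifies the container, not the spreadsheet type itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SpreadsheetSniffer {
    // OLE2 compound file signature (the container used by binary .xls workbooks).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (the container used by .xlsx workbooks).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    /** Classifies a file header as "OLE2", "ZIP", or "UNKNOWN". */
    public static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    /** Reads the first 8 bytes of a file and classifies them. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A check like this is handy at the start of a conversion pipeline, where files with an .xls extension occasionally turn out to be something else entirely.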
And then there’s the tooling: popular open-source Java libraries like Apache POI can handle XLS files, but doing so requires a different code path, different classes, and generally more brittle behavior compared to working with XLSX and other Open XML files. We’ll cover this particular challenge in some more detail later on in this article.
What’s So Great About XLSX Anyway?
Modern Excel’s XLSX format is part of Microsoft’s Office Open XML (OOXML) standard. That means that, at its most basic level, it’s just a ZIP archive full of plain, neatly organized XML files. XML is both human-readable and machine-friendly: the best of both worlds.
Instead of being one giant binary blob, each part of the XLSX spreadsheet (the worksheets, the shared string table, the style definitions) is broken out neatly into a series of structured XML documents.
For example, if we built a simple spreadsheet with the following content:
We would find this exact data represented in the worksheet XML file like so:
The column display settings are defined in the <cols> tag, and the actual cell data (which carries shared string references in this case) is represented in the <sheetData> tag. You don’t need an advanced degree in any computational field to figure out what’s going on here, and that’s a great thing.
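Because an XLSX file is an ordinary ZIP archive, we can list its parts with nothing but the JDK. The sketch below assumes Java 11+ (for Path.of); the class name is our own. For a typical workbook saved by Excel, the output usually includes entries such as [Content_Types].xml, xl/workbook.xml, xl/worksheets/sheet1.xml, xl/sharedStrings.xml, and xl/styles.xml, though the exact set depends on the workbook.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class XlsxPartLister {

    /** Returns the name of every entry stored in the given ZIP-based file. */
    public static List<String> listParts(Path xlsx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to any .xlsx file on disk.
        listParts(Path.of(args[0])).forEach(System.out::println);
    }
}
```

Trying the same thing on an XLS file fails with a ZipException, which is a small but telling illustration of how much less approachable the binary container is.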
This structure really does matter. It makes debugging easier, version control more sensible, the format extensible, and the entire file more future-proof. And, of course, it plays far, far more nicely with open-source tools, cloud APIs, and Java libraries, which typically prefer a diet of well-defined portable formats.
So, if you’re working on anything that involves transforming spreadsheet data, exposing it via APIs, or piping it through cloud platforms, XLSX is by far the safer and more scalable choice. There are numerous reasons why binary containers were eliminated in favor of compressed XML to begin with, and everything we’ve talked about here is a contributing factor.
What Actually Happens During the XLS to XLSX Upgrade
Upgrading XLS to XLSX programmatically is a bit more complex than the simple “Save As” operation you can perform manually in the Excel desktop application.
Under the hood, binary to compressed XML conversion involves some heavy lifting. The old binary workbook must be unpacked and rewritten entirely into an XML-based structure. That means all the cells, rows, and sheets get redefined as the appropriate set of XML elements. Each style, font, and border from the XLS binary container gets converted into an XML equivalent, too, and the formulas get reserialized.
If XLS files carry legacy macros or embedded objects (which, by the way, you should never implicitly trust in ANY spreadsheet handler), the story gets a little messier. Old macros and objects don’t always translate cleanly into modern Excel, and you can easily lose fidelity depending on the conversion library you’re using. The XLSX format also can’t store macros the way XLS does; macros will either be stripped from the XLSX file automatically, or the Excel application will suggest saving the file as an XLSM (macro-enabled XLSX) document instead.
Thankfully, though, the vast majority of XLS spreadsheets store little more than tabular data, basic formatting, and simple formulas. Converting those files to XLSX tends to be much smoother.
Open-Source Libraries That Get the Job Done
Apache POI is still the best open-source default for Excel work in Java, and it supports both XLS and XLSX.
That said, there’s a significant catch in this instance: you’re working with two separate APIs to handle XLS and XLSX documents. For XLS files, you’ll be using the HSSF API (which literally stands for “Horrible Spreadsheet Format”), and for XLSX, you’ll be using the XSSF API (which simply stands for “XML Spreadsheet Format”).
In practice, building a conversion workflow via Apache POI means you’ll need to 1) load the XLS file with HSSFWorkbook, 2) build a new XSSFWorkbook, and 3) (tragically) manually copy each sheet, row, and cell from one to the other. It’s certainly doable — but it’s also extremely tedious. In this case, POI unfortunately doesn’t give you the magic method you probably want for file format conversions. You’ll need to write that translation logic yourself.
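To make those three steps concrete, here is a minimal sketch of the copy loop. It assumes Apache POI 4.0 or later with both the poi and poi-ooxml artifacts on the classpath; the class and method names are our own. It deliberately copies only cell values and formulas, so styles, column widths, merged regions, comments, and similar details are left out and would need their own translation logic. Because styles aren't copied, date cells arrive as raw numeric serial values.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class XlsToXlsxSketch {

    /** Copies cell values and formulas from an XLS file into a new XLSX file. */
    public static void convert(String xlsPath, String xlsxPath) throws IOException {
        try (InputStream in = new FileInputStream(xlsPath);
             Workbook source = new HSSFWorkbook(in);   // 1) load the XLS file
             Workbook target = new XSSFWorkbook()) {   // 2) build a new XLSX workbook

            // Create every sheet first so cross-sheet formulas can resolve their targets.
            for (int i = 0; i < source.getNumberOfSheets(); i++) {
                target.createSheet(source.getSheetName(i));
            }

            // 3) copy each row and cell, sheet by sheet
            for (int i = 0; i < source.getNumberOfSheets(); i++) {
                Sheet from = source.getSheetAt(i);
                Sheet to = target.getSheetAt(i);
                for (Row fromRow : from) {
                    Row toRow = to.createRow(fromRow.getRowNum());
                    for (Cell fromCell : fromRow) {
                        copyValue(fromCell, toRow.createCell(fromCell.getColumnIndex()));
                    }
                }
            }

            try (OutputStream out = new FileOutputStream(xlsxPath)) {
                target.write(out);
            }
        }
    }

    // Copies the value only; formatting is intentionally not handled in this sketch.
    private static void copyValue(Cell from, Cell to) {
        switch (from.getCellType()) {
            case STRING:
                to.setCellValue(from.getStringCellValue());
                break;
            case NUMERIC:
                to.setCellValue(from.getNumericCellValue());
                break;
            case BOOLEAN:
                to.setCellValue(from.getBooleanCellValue());
                break;
            case FORMULA:
                to.setCellFormula(from.getCellFormula());
                break;
            case ERROR:
                to.setCellErrorValue(from.getErrorCellValue());
                break;
            default:
                break; // blank cells stay blank
        }
    }
}
```

Even this stripped-down version shows where the tedium comes from: every feature we want to preserve beyond raw values means another block of hand-written mapping code between the HSSF and XSSF models.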
Still, if you’re already using POI in your project for another purpose, or if you just want maximum control over the workbook structure, it’s a solid option. Just don’t expect it to be elegant.
Handling XLS to XLSX With a Third-Party Web API
A simpler option for handling XLS to XLSX conversions involves using a fully realized web API solution. This abstracts the complexity away from your environment. The option we’ll demonstrate here isn’t open source, and it does require an API key, but it’ll plug straight into your Java project, and it’ll use very minimal code compared to patchwork open-source solutions. Below, we’ll walk through code examples you can use to structure your API call for XLS to XLSX conversions.
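If we’re working with Maven, we’ll first add the following repository reference to our pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>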
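And we’ll then add the following dependency reference to our pom.xml:
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>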
If we’re working with Gradle, we’ll need to add it in our root build.gradle (at the end of repositories):
allprojects {
    repositories {
        ...
        maven { url 'https://jitpack.io' }
    }
}
And then add the dependency in build.gradle:
dependencies {
    implementation 'com.github.Cloudmersive:Cloudmersive.APIClient.Java:v4.25'
}
After we’ve installed the SDK, we’ll add the import statements at the top of our file:
// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;
import java.io.File;
Finally, we’ll configure the API client, set our API key in the authorization snippet, and make our XLS to XLSX conversion:
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
try {
    // The response is the converted XLSX file content as raw bytes
    byte[] result = apiInstance.convertDocumentXlsToXlsx(inputFile);
    System.out.println("Received " + result.length + " bytes of XLSX content");
} catch (ApiException e) {
    System.err.println("Exception when calling ConvertDocumentApi#convertDocumentXlsToXlsx");
    e.printStackTrace();
}
We’ll get our XLSX file content as a byte array (byte[] result), and we can write that content to a new file with the .xlsx extension. This simplifies automated XLS to XLSX conversion workflows considerably.
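Writing the byte array to disk takes only a few lines with the standard java.nio.file API. The helper below is a minimal sketch; the class name, method name, and empty-content check are our own choices rather than part of the SDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class XlsxFileSaver {

    /**
     * Writes converted XLSX bytes to the target path, creating parent
     * directories if needed and overwriting any existing file.
     */
    public static Path save(byte[] content, Path target) throws IOException {
        if (content == null || content.length == 0) {
            throw new IllegalArgumentException("No XLSX content to write");
        }
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, content);
    }
}
```

Inside the try block of the API call, we could then pass the result array and a destination such as Paths.get("/path/to/output.xlsx") to XlsxFileSaver.save, handling the IOException alongside the ApiException.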
Conclusion
In this article, we learned about the differences between XLS and XLSX formats and discussed the reasons why XLSX is clearly the superior modern format. We suggested a popular open-source library as one option for building automated XLS to XLSX conversion logic in Java and a fully realized web API solution to abstract the entire process away from our environment.