The simplest definition of "Big Data" is "it doesn't fit in Excel".
― Stéphane Hamel
Spreadsheets have been a wonderful tool used by mankind ever since we became agricultural and had to keep records of how much wheat we had stored and how much we owed to the government.
The spreadsheet was an instant hit for users of PCs in the late 20th century. Excel has been such a popular tool for storing data, that I would venture to say that even in 2022, most office workers see Excel as their go-to way to store data with its nice columns and rows.
As our world has become more and more centered around data, however, the sheer volume of data can sometimes outgrow Excel's limitations of 1,048,576 rows and 16,384 columns. A single cell can contain 32,767 characters.
But I think Stéphane Hamel had something else in mind besides simply the quantity of data when he said the above quote. The rest of this article we fill focus on the quality or complexity of data.
Excel is great when your data can fit into neat columns and rows. But often data goes much deeper than that. Let me give two examples.
1. A one to many problem
Let's say you want to track two types of things in a single spreadsheet. Like on the left side you want to have a list of companies. But then toward the right side of this spreadsheet you want to show the locations where the companies are located. On the left side you have a single row per company. But then on the right side you would like to have multiple rows, one row per location.
You could resolve this by duplicating on the left side all the company details. But now you have duplicate rows when it would be better to have one per company. Any changes in your company data would have to be changed on every row for that company. There's a similar problem when adding a new location in the future, you would have to copy the Company data into the new row and make sure the Company data matches on all rows. This simply doesn't make sense if we want to efficiently maintain our data.
Another way you could do this is to have two different spreadsheet workbooks: One for the list of companies and one for the list of locations. This doesn't make sense from the user experience perspective, though, because we would like to be able to see both sets of data together.
2. An organizational tree structure
The next example is a question of how to incorporate an organizational tree structure into our data. Your organization might have different levels, different divisions and in different regions. You might want to have a list of companies that service your different organizational units. You might have a "Corporate" vendor, or a vendor that services all your "Manufactring" units, or maybe they only service your Manufacturing units in Argentina. You could try to show this tree structure in your spreadsheet by having columns for the first, second, and third level in your tree structure, but then you can't very easily see the underlying tree structure when viewing the spreadsheet.
The Solution

A "Spreadsheet" that shows an organizational structure and a one to many relationship
The problem is that this complexity of data simply doens't fit into a spreadsheet. To solve this you need to present your data using a different tool. I built the tool that you can see above. You can try a Demo of this tool at
https://automateofficework.com/tools/sheets/demo.
Note: Those of you who take the time to try out the demo will notice a few things. It's a beautiful user experience. The tool takes care of all the formatting so the user can focus simply on content. The second column if you try to edit it is filled by a drop-down menu of the risk area. Select one and the first column is automatically filled in with the parent category. You also cannot have a duplicate risk area in the same organizational unit. So if you try to select the same risk area that is already present in the spreadsheet under that organizational unit, you can't. It's grayed out. You'll also notice once you have selected a cell and clicked away from it it now has a red border. You can now use the arrow keys on your keyboard to navigate around. Press "Enter" to edit a cell and the "Esc" key to exit. You can also press Ctrl+C and Ctrl+V to copy and paste or press "Delete" to remove the contents of a cell. The "+" buttons are to add a risk area under an organizational unit or to add a mitigation plan under a risk area.
The problem is that a spreadsheet can only show data in two dimensions. Your data might need 3, 4, 5, or 6 dimensions.
You're used to thinking of data in terms of columns and rows like
+----------+---------+------------------------+----------------+
| Col1 | Col2 | Col3 | Numeric Column |
+----------+---------+------------------------+----------------+
| Value 1 | Value 2 | 123 | 10.0 |
| Separate | cols | with a tab or 4 spaces | -2,027.1 |
| One cell | | | |
+----------+---------+------------------------+----------------+
But a different way of representing data, one that allows for an infinite number of dimensions, is like this:
{
"tree": {
"Global": {
"problems": [{
"Risk Area": "Organizational",
"Risk Category": "Funding",
"Risk Level": "Low",
"Global Priority": "Data and Technology",
"plans": [{
"ID": "123",
"Mitigation Activity Type": "Avoidance",
"Title": "Project Red Shield",
"Description": "Put in place plan to align agreement process",
"Frequency": "Monthly",
"Start Date": "1 Jan 2022",
"End Date": "1 Jan 2024",
"Risk Owner Email": "john@acme.com",
"Budget": "$20,000",
"Status": "yellow",
"Status Description": ""
}, {
"ID": "135",
"Mitigation Activity Type": "Minimization",
"Title": "",
"Description": "",
"Frequency": "",
"Start Date": "",
"End Date": "",
"Risk Owner Email": "",
"Budget": "",
"Status": "green",
"Status Description": ""
}, {
"ID": "246",
"Mitigation Activity Type": "Compensatory Mitigation",
"Title": "",
"Description": "",
"Frequency": "jkljfdljafsd",
"Start Date": "",
"End Date": "",
"Risk Owner Email": "",
"Budget": "",
"Status": "red",
"Status Description": ""
}]
}, {
"Risk Area": "Technical risks",
"Risk Category": "Requirements",
"Risk Level": "",
"Global Priority": "",
"plans": [{
"ID": "",
"Mitigation Activity Type": "",
"Title": "",
"Description": "",
"Frequency": "",
"Start Date": "",
"End Date": "",
"Risk Owner Email": "",
"Budget": "",
"Status": "",
"Status Description": ""
}]
}],
"tree": {
"Divisions": {
"problems": [],
"tree": {
"Manufacturing": {
"problems": [{
"Risk Area": "External",
"Risk Category": "Subcontractors and Suppliers",
"Risk Level": "Significant",
"Global Priority": "Other",
"plans": [{
"ID": "ABCXYZ",
"Mitigation Activity Type": "Minimization",
"Title": "",
"Description": "",
"Frequency": "",
"Start Date": "",
"End Date": "",
"Risk Owner Email": "",
"Budget": "",
"Status": "yellow",
"Status Description": ""
}]
}],
"tree": {
"Procurement": {
"problems": [],
"tree": {
"Direct": {
"problems": []
},
"Indirect": {
"problems": []
}
}
}
}
},
"Commercial": {
"problems": [],
"tree": {
"Indirect": {
"problems": [{
"Risk Area": "Technical risks",
"Risk Category": "Requirements",
"Risk Level": "",
"Global Priority": "",
"plans": [{
"ID": "",
"Mitigation Activity Type": "",
"Title": "",
"Description": "",
"Frequency": "",
"Start Date": "",
"End Date": "",
"Risk Owner Email": "",
"Budget": "",
"Status": "",
"Status Description": ""
}]
}]
},
"Retail": {
"problems": []
}
}
}
}
},
"Regions": {
"problems": [],
"tree": {
"Asia": {
"problems": [],
"tree": {
"China": {
"problems": []
},
"India": {
"problems": []
}
}
},
"Europe": {
"problems": [],
"tree": {
"France": {
"problems": []
},
"Germany": {
"problems": []
},
"UK": {
"problems": []
}
}
}
}
}
}
}
}
}
This data is formatted in JSON, JavaScript Object Notation. This is all the data we need to store the data shown in the screenshot of the solution.
Conclusion
We need to stop limiting our data to whatever can fit into Excel. Everyday office workers will need to become more data literate and able to work with data. We need to be able to have conversations with the people who provide the tools for us to do our work.
I'm not saying that we need to start hand-coding all our data in JSON. I mean that people working with data should be able to recognize that their data is too complex for the tools they are using. And they should demand the right tool for the job. A simple tool like the one that I've demonstrated parses out the JSON data into a format that humans can much more easily work and engage with. Each time the "spreadsheet" is changed, the data is automatically compiled back into JSON and stored on the server.
Going Further
Wouldn't it be nice if we could delegate editing rights for certain rows to the people who need to be able to edit them? We are often the overseers of a "Master data" spreadsheet but we get changes from other people around the organization. Here we have a tree structure and we could easily indicate which poeple have editing rights for the data related to their organizational unit. Having them come to this large spreadsheet would probably be overwhelming. But we could turn the name of their organizational unit into a hyper link and link to a page dedicated to showing only the data for their unit in a simple and elegant way. We could then allow them to edit their data on this page and their changes would automatically be applied to the entire set. We could even keep a log of the changes and email the master spreadsheet aministrator so they are aware when changes are made. See how much better this is than taking the lazy way and creating yet another spreadsheet? When we think through our processes we can demand better tools.
Please let me know in the comments below or on LinkedIn if you would like to know how I built the Sheets demo. I can show how to build it inside of SharePoint or as a stand-alone web site. If there's demand, I could also make this a tool available on the Automate Office Work website where you can create, share, and manage your big data "spreadsheets".
—Justin