The Curse of Dimensionality: The Art AND Science of Data

Even if you don’t dig deep into data on a daily basis, you can appreciate the sheer complexity facing the AI industry. So much data goes into everyday living, from grocery shopping to the ads we’re served to healthcare recommendations and more. We all now understand how much data is employed on a moment-by-moment basis, so it’s easy to see how machine learning and AI can be plagued by the Curse of Dimensionality!

What IS the Curse of Dimensionality?

The Curse of Dimensionality describes how the volume of the data space, and with it the amount of data and computation needed to analyze it reliably, grows exponentially as dimensions (features) are added. In today’s world, we’re seeing this curse affect fields like machine learning, data mining, and data analysis. While more dimensions can add information and therefore, in theory, improve the quality of the data, they can also add redundancy, noise, and complexity to the analysis.
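One way to see the curse in action is to watch what happens to distances as dimensions are added. The sketch below is only an illustration under assumed settings (500 uniformly random points, a fixed seed, and a hypothetical helper named relative_contrast): it measures how much farther the farthest point is than the nearest one from a random query point.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """Return (farthest - nearest) / nearest for one random query point."""
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

rng = random.Random(0)  # fixed seed so the run is repeatable
contrasts = {d: relative_contrast(500, d, rng) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>5} dims: relative contrast = {c:.2f}")
```

As the dimension count grows, the contrast shrinks toward zero: every point ends up roughly the same distance away, so "nearest" stops meaning much. That is one reason distance-based methods struggle on very wide data.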

Machine learning and AI, specifically, can be affected tremendously by the Curse of Dimensionality: every added feature creates another layer of complexity that an algorithm has to sift through to succeed. With a fixed amount of training data, the risk of error (overfitting in particular) increases as features are added, so algorithms become harder to design and execute as the feature count grows.
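To make the link between extra features and error concrete, here is a small experiment, offered as a sketch rather than a benchmark. It assumes made-up data with one informative feature and a varying number of pure-noise features, and it uses a simple 1-nearest-neighbor classifier. The function names and sample sizes are illustrative choices, not anything prescribed above.

```python
import math
import random

def make_point(label, n_noise, rng):
    # One informative feature (its mean depends on the class) plus noise features.
    informative = rng.gauss(3.0 * label, 1.0)
    return [informative] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]

def nearest_neighbor_accuracy(n_noise, rng, n_train=200, n_test=100):
    # Labels alternate 0, 1, 0, 1, ... so both classes are equally represented.
    train = [(make_point(i % 2, n_noise, rng), i % 2) for i in range(n_train)]
    test = [(make_point(i % 2, n_noise, rng), i % 2) for i in range(n_test)]
    correct = 0
    for x, label in test:
        # Predict the label of the closest training point.
        _, predicted = min(train, key=lambda item: math.dist(x, item[0]))
        correct += predicted == label
    return correct / n_test

rng = random.Random(0)  # fixed seed so the run is repeatable
accuracy = {k: nearest_neighbor_accuracy(k, rng) for k in (0, 10, 200)}
for k, acc in accuracy.items():
    print(f"{k:>3} noise features: accuracy = {acc:.2f}")
```

The useful signal never changes between runs. Only the number of uninformative features does, and accuracy falls as they pile up and drown out the one feature that matters.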

What can be done? 

Addressing the Curse of Dimensionality

One solution is found in Dimensionality Reduction. Dimensionality Reduction is a strategy of converting high-dimensional data into a lower-dimensional representation while preserving as much of the meaningful information as possible. The reduced data drops redundant variables, which makes the analysis easier and lets an algorithm produce its output faster.

A widely used Dimensionality Reduction technique is PCA (Principal Component Analysis), a linear tool that transforms the data into a new set of axes ordered by how much of the data’s variance they capture. Keeping only the first few of these axes gives fewer dimensions that are often almost as informative as the original data.
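For readers who want to see the mechanics, here is a minimal sketch of PCA's first component using only the Python standard library. It assumes made-up data (three correlated features driven by one hidden factor) and finds the top direction by power iteration on the covariance matrix. The helper name top_principal_component is hypothetical, and in practice you would more likely reach for a library such as scikit-learn.

```python
import math
import random

def top_principal_component(rows, n_iter=200):
    """Return (mean, unit direction, explained-variance ratio) of the first PC."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue (the first principal axis).
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return mean, v, eigenvalue / total_variance

rng = random.Random(0)  # fixed seed so the run is repeatable
data = []
for _ in range(300):
    t = rng.gauss(0, 1)  # hidden factor behind all three features
    data.append([t + rng.gauss(0, 0.1),
                 2 * t + rng.gauss(0, 0.1),
                 -t + rng.gauss(0, 0.1)])

mean, direction, ratio = top_principal_component(data)
# Project each 3-feature row onto the first component: 3 numbers become 1.
reduced = [sum((row[j] - mean[j]) * direction[j] for j in range(3))
           for row in data]
print(f"variance kept by 1 of 3 dimensions: {ratio:.1%}")
```

Because the three features here are nearly redundant, a single component keeps almost all of the variance. Real data is rarely this tidy, so the share of variance kept is something to check rather than assume.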

Ultimately, the companies addressing the Curse of Dimensionality best are the ones that are both precise scientists and great artists: they understand when they have enough data to create great art with data and technology!

In art, one of the hardest things to learn is when to complete a piece. The tendency to overwork something is very real and it takes an expert eye to know when enough is enough. The same can apply here in machine learning and AI when it comes to knowing how much data will give you the optimal performance in your algorithm while leveraging the data in a comprehensive manner. It can be easy to suggest you use it ALL, but in many cases, you fall into the same trap as a fine artist and find yourself under the Curse of Dimensionality. If you’re 95% accurate, it may not be worth the cost of making that last 5% improvement, and the companies that succeed in this industry know when they have enough accuracy to run with for success.