Data Structures, Algorithms, and Big O Notation

As a programmer or computer scientist, it is almost impossible that you have never heard of or thought about data structures and algorithms. Even if you are not a programmer, you deal with them in everyday life without knowing it!

For example, let’s say you bought some groceries and want to put them in your refrigerator. You definitely think about: how can I store them so that they are easier to access when I need something?

This simple question is one of the most essential and challenging aspects of every computer system: efficiency in access and processing. If you are not efficient, you cost yourself, your wallet, and your company an arm and a leg.

So what exactly is a data structure? What is an algorithm? How many types do they have? And how can we decide which one to choose?

I will try to cover the basics of these questions here.

**Note**: These are really complex and big topics. I am just covering the basics here. If you want to go deep, I suggest taking a course or reading a book!

Let’s go!

Data Structure

Imagine you have some documents such as old bills, certifications, etc. You will access them from time to time for your paperwork. The very first question that comes to everybody’s mind is: How can I organize them?

This is basically the definition of a data structure: how do we organize our data? The goal of any data structure is to organize the data so that we can access it efficiently when we need it.

The definition of efficiency depends on the problem you want to solve and your goals. For instance, you might say, “I need my bills more often than my certifications, so I put them on top of the certifications.” Or, “I need my recent papers more, so I order my papers chronologically.”

In computer programming, we have different types of data structures to store the data. We can group them into two categories: primitive and non-primitive.

Primitive: These are the basic data structures that are built into a programming language. These types contain only a value and have a fixed size in memory, such as integers, characters, and booleans. We use these types to build more complex data structures. They are often directly supported by the hardware.

Non-primitive: These are complex data structures. They are not built into most programming languages; often you need to either use a library or implement them yourself. They can be linear, such as arrays and queues, or non-linear, such as hash maps and graphs. Also, they can be dynamic, without a fixed size, such as a stack, or static, with a fixed size, such as an array.
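
For example, here is a minimal sketch in Python of a stack, a dynamic, linear, non-primitive structure built on top of a built-in list (the class and method names here are just illustrative):

```python
# A minimal sketch of a non-primitive structure, a stack,
# built on top of a built-in list. Names are illustrative.

class Stack:
    """A dynamic, linear, last-in-first-out (LIFO) structure."""

    def __init__(self):
        self._items = []          # grows and shrinks at runtime: dynamic

    def push(self, value):
        self._items.append(value) # add to the top

    def pop(self):
        return self._items.pop()  # remove and return the top element

    def is_empty(self):
        return len(self._items) == 0


stack = Stack()
stack.push(1)
stack.push(2)
print(stack.pop())  # 2 -- the last element pushed comes out first
```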

Algorithm

Suppose you want to find the bill with the highest cost among your papers. What can you do?

In this situation, you need a procedure that helps you find the highest cost as efficiently as possible. We call this an algorithm.

Algorithms are the operations that you perform on your data structure to achieve a specific goal.

For example, finding the maximum in a list of values.
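
As a minimal sketch in Python (the function name is just illustrative), finding the maximum could look like this:

```python
# A minimal sketch of the "find the maximum" algorithm on a list.
# We look at every value once and remember the largest seen so far.

def find_maximum(values):
    maximum = values[0]           # assumes a non-empty list
    for value in values[1:]:      # check every remaining value once
        if value > maximum:
            maximum = value       # found a larger one: remember it
    return maximum


print(find_maximum([42, 7, 99, 15]))  # 99
```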

An algorithm must be:

  • Correct: Leads to the expected result
  • Efficient: It is not enough to just be correct. We want the answer as fast and as resource-efficient as possible.

How is this related to data structures?

The efficiency of your chosen algorithm depends directly on how you store your data. You can run the same algorithm on two different structures and end up with different efficiency measurements. For example, let’s say we have an algorithm that requires direct access to list members. Here, if we choose a linked list over an array, we ruin our performance, since each access to a linked list requires traversing it. (Read more here about the linked list if you like.) Also, some algorithms belong to a specific data structure, for example, finding the shortest path between two nodes in a graph.
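
Here is a minimal sketch in Python of that difference (the Node class and helper function are just illustrative): the array reaches any index in one step, while the linked list has to walk node by node.

```python
# A sketch contrasting direct access: an array (Python list) reaches
# index i in one step, while a linked list must traverse node by node.

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def linked_list_get(head, index):
    node = head
    for _ in range(index):   # follow `index` links to reach the element
        node = node.next
    return node.value


array = [10, 20, 30]
head = Node(10, Node(20, Node(30)))

print(array[2])                  # one step, regardless of position
print(linked_list_get(head, 2))  # walks two links: 10 -> 20 -> 30
```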

There are different ways to group algorithms. But one way of seeing them is based on the way they solve a problem:

Divide and Conquer: These algorithms divide a complex problem into smaller ones of the same nature. The idea is that by solving the small ones, we reach the solution for the main problem. An example would be the Merge Sort algorithm, which breaks a list into smaller ones and merges those to reach a sorted big list. The problem at hand has to be divisible (Optimal Substructure). This means the solutions for the small sub-problems should lead to the solution for the main problem; otherwise, it does not work.
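
A minimal Merge Sort sketch in Python (function names are just illustrative):

```python
# Merge Sort: divide the list, sort the halves recursively,
# then merge the two sorted halves into one sorted list.

def merge_sort(values):
    if len(values) <= 1:                 # a list of 0 or 1 items is sorted
        return values
    middle = len(values) // 2
    left = merge_sort(values[:middle])   # solve the smaller sub-problems...
    right = merge_sort(values[middle:])
    return merge(left, right)            # ...and combine their solutions


def merge(left, right):
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # always take the smaller front item
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # one side may have leftovers
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 8, 1, 9]))  # [1, 2, 5, 8, 9]
```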

Dynamic programming: It works like divide and conquer. However, here we optimize our performance by remembering repeated sub-problems to avoid redundant calculations (Overlapping Subproblems). For instance, in the Fibonacci problem, we can save the old answers and reuse them in the future. Let’s say we calculated Fibonacci(20). Then, if we want to calculate Fibonacci(22), we can reuse the answer for Fibonacci(20) without repeating the entire calculation from scratch.
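
A minimal memoized Fibonacci sketch in Python (the cache here is just an illustrative dictionary):

```python
# Dynamic programming: a cache stores answers we already computed,
# so overlapping sub-problems are solved only once.

cache = {0: 0, 1: 1}

def fibonacci(n):
    if n not in cache:
        cache[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return cache[n]


print(fibonacci(20))  # 6765 -- fills the cache up to 20
print(fibonacci(22))  # 17711 -- reuses fibonacci(20) instead of recomputing
```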

Greedy: As the name suggests, these algorithms pick the “best answer at the moment” with the hope that the best local answer will lead to the best global one. We also do NOT revisit our selected answers in the future. An example would be Insertion Sort, where we insert each incoming element in the position that looks best based on the current situation.
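
A minimal Insertion Sort sketch in Python, following the description above:

```python
# Insertion Sort: for each incoming element, place it where it belongs
# in the already-sorted left part of the list.

def insertion_sort(values):
    for i in range(1, len(values)):
        current = values[i]              # the incoming element
        j = i - 1
        while j >= 0 and values[j] > current:
            values[j + 1] = values[j]    # shift larger items right
            j -= 1
        values[j + 1] = current          # insert at the best position found
    return values


print(insertion_sort([5, 2, 8, 1]))  # [1, 2, 5, 8]
```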

Brute-force: This is very simple. We just try all possible answers and then decide which ones are correct! It may sound absurd, but for many problems we have to do it. For example, finding all possible combinations of some words or characters.
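
A minimal brute-force sketch in Python, generating every ordering of three characters and filtering them (the filtering condition is just an illustrative assumption):

```python
# Brute force: generate every possible arrangement of characters
# and keep the ones that satisfy our condition (here: ends with 'c').

from itertools import permutations

candidates = ["".join(p) for p in permutations("abc")]  # all 6 orderings
matching = [word for word in candidates if word.endswith("c")]

print(candidates)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(matching)    # ['abc', 'bac']
```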

Randomized: Here we include randomness in our solution. You may say that this jeopardizes correctness or performance. You are right! But sometimes in the real world, efficiency is more important than 100% precision, so we tolerate occasional bad cases as long as the solution works well “most of the time” and does so really efficiently! (Example: Quick Sort with a randomly chosen pivot, where the randomness affects speed rather than correctness.)
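
A minimal randomized Quick Sort sketch in Python, where the random step is the pivot choice:

```python
# Randomized Quick Sort: the pivot is chosen at random, which makes
# the bad (quadratic) cases very unlikely on real-world inputs.

import random

def quicksort(values):
    if len(values) <= 1:
        return values
    pivot = random.choice(values)                     # the random step
    smaller = [v for v in values if v < pivot]
    equal   = [v for v in values if v == pivot]
    larger  = [v for v in values if v > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


print(quicksort([9, 3, 7, 1, 8]))  # [1, 3, 7, 8, 9]
```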

Deciding what to pick: Big O Notation


As we saw, we have many algorithms and data structures to pick from. But how can we decide what to pick? We can use Big O Notation.

This concept measures the performance of an algorithm or solution. The performance usually depends on Time and Memory.

But what is it? The ‘O’ stands for order (we measure the order of a solution). The order is an upper bound on the cost (the worst case). Basically, we measure how much time and space our solution consumes to solve the problem in the worst-case scenario.

Why the worst-case?

Most of the time we cannot predict the size of input data in the future. Besides, our resources are limited and costly to maintain or expand. Therefore, we need to know how our solution performs under pressure in the real world.

How does the calculation work? It measures how much the cost grows with respect to the size of the input.

For example, let’s say we have a list of members with length n. If the solution we design requires access to each list member once, we can say the order of the solution is linear, O(n).

Generally, we have these types of orders (from fastest to slowest):

Constant O(1): When we have a fixed cost regardless of the input size. Note that the fixed cost does not have to be 1. For example, O(100) is fixed so we express it as O(1).

Logarithmic O(log n): When our solution can repeatedly discard a portion of the input instead of processing all of it. For example, in Binary Search, each comparison tells us which half of the remaining (sorted) data to keep, so we can discard the other half (in a binary search tree, we decide which subtree to traverse by comparing the value with the root).
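
A minimal Binary Search sketch in Python on a sorted list:

```python
# Binary Search: each comparison throws away half of the remaining
# input, giving O(log n) steps.

def binary_search(sorted_values, target):
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_values[middle] == target:
            return middle                 # found it
        if sorted_values[middle] < target:
            low = middle + 1              # discard the left half
        else:
            high = middle - 1             # discard the right half
    return -1                             # not present


print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```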

Linear O(n): When the cost grows at the same rate as the input. For instance, when you have a for-loop in your code that traverses the input list.

Linearithmic O(n log n): Slower than linear. We encounter it in sorting algorithms such as Merge Sort (Quick Sort is also O(n log n) on average, though its worst case is quadratic).

Polynomial O(n^k): This happens in cases with multiple nested loops. k is the degree of the polynomial. If k is 2, we call it quadratic.
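
A minimal quadratic sketch in Python, checking every pair of values for duplicates (the example itself is just illustrative):

```python
# Quadratic O(n^2): two nested loops over the same input, as in
# checking every pair of values for duplicates.

def has_duplicates(values):
    n = len(values)
    for i in range(n):             # outer loop: n iterations
        for j in range(i + 1, n):  # inner loop: up to n iterations each
            if values[i] == values[j]:
                return True
    return False


print(has_duplicates([4, 8, 15, 8]))  # True
```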

Exponential O(2^n): Very costly, as the name suggests! The cost doubles with every additional input element.

Time and Memory both matter

Let’s say we have two solutions. The first one has order O(n log n) and the second one is quadratic, O(n^2). Does it mean we should go for the first one directly? NO. We also need to consider memory. Imagine you have limited memory, and the first solution takes O(n^2) memory while the second one takes O(n). Which one do you pick now?
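
For instance, the duplicate check from the previous section can trade memory for time. This sketch (an illustrative assumption, not from the original problem) runs in O(n) time but allocates O(n) extra memory, unlike the O(n^2)-time, O(1)-memory nested-loop version above:

```python
# The trade-off: linear time bought with linear extra memory.

def has_duplicates_fast(values):
    seen = set()                   # extra memory grows with the input
    for value in values:
        if value in seen:          # set lookup is (on average) O(1)
            return True
        seen.add(value)
    return False


print(has_duplicates_fast([4, 8, 15, 8]))  # True
```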

Always look at the bottleneck

How can we tell what the order of a whole solution is? This is important, since most of the time our code comprises different parts and components. We need to look at the slowest part.

So if we have three parts:

O(n^2), O(n), O(1)

The overall order of our code is quadratic:

O(n^2 + n + 1) == O(n^2)
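
As a minimal sketch in Python (the function and its three parts are just illustrative):

```python
# A solution with three parts: the quadratic part dominates,
# so the whole function is O(n^2).

def process(values):
    n = len(values)

    total_pairs = 0
    for i in range(n):            # part 1: nested loops -> O(n^2)
        for j in range(n):
            total_pairs += 1

    total = sum(values)           # part 2: one pass -> O(n)

    first = values[0]             # part 3: fixed cost -> O(1)

    return total_pairs, total, first


print(process([1, 2, 3]))  # (9, 6, 1)
```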

I hope this will be useful for you!

The End.