RDD vs Dataframe vs Dataset

 RDD vs Dataframe vs Dataset 

Data Serialization 

Off-heap space: Pros and cons

In a sense, the off-heap space lies outside the control of the JVM (though it belongs to the Java process), and for this reason, it is not possible to write complex structures used in JVM languages into it. This raises the need for an intermediate step of serializing the data into a plain byte array, which can then be stored in the off-heap area. When the data is loaded, the reverse process must be performed: deserialization into a form that we can use in Java. These additional steps will of course come at an extra cost, which is why accessing off-heap data will, for obvious reasons, take longer than accessing on-heap data directly.

Since writing and reading data in the off-heap space takes longer, what is the benefit of this approach then? Well, the data stored in the off-heap space are not subject to GC processes, so on the one hand we – the programmers – are responsible for each freeing of memory after a given variable is no longer useful. On the other hand, we relieve the management processes in the JVM by releasing CPU’s time for the rest of the application, so, theoretically, it should result in some resource savings

Comments

Popular posts from this blog

Read and Navigate XML - Beautiful Soup

difference-between-stream-processing-and-message-processing

WordNet in Python