Showing posts from March, 2024

Performance factors affecting Spark

Spark performance is sensitive to several factors:

- application code
- configuration settings
- data layout and storage
- multi-tenancy
- resource allocation and elasticity in cloud deployments such as Amazon EMR, Microsoft Azure, Google Dataproc, and Qubole

Tuning memory usage comes down to three costs: the amount of memory used by your objects, the cost of accessing those objects, and the overhead of garbage collection.
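Configuration settings are usually the first lever for the three memory costs above. A minimal sketch of memory-related Spark properties follows; the property names are real Spark configuration keys, but the values are illustrative placeholders to adapt per workload, not recommendations:

```python
# Illustrative memory-tuning properties. They can be passed on the command
# line via `spark-submit --conf key=value` or set on a SparkConf object.
memory_conf = {
    "spark.executor.memory": "4g",          # heap size of each executor JVM
    "spark.memory.fraction": "0.6",         # share of heap for execution + storage
    "spark.memory.storageFraction": "0.5",  # portion of the above protected from eviction
    # Kryo serialization produces smaller objects, reducing both the cost of
    # accessing them and the pressure on the garbage collector:
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

for key, value in sorted(memory_conf.items()):
    print(f"--conf {key}={value}")
```

Each entry maps to one of the costs listed above: executor memory bounds the total, the fraction settings split it between execution and caching, and the serializer shrinks the objects themselves.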

Spark Context, Spark Session and the JVM

Why do we have one Spark context per JVM? Handling a single data-intensive application is painful enough (tuning GC, dealing with leaking resources, communication overhead). Multiple Spark applications running in a single JVM would be impossible to tune and manage in the long run. With separate processes, if one of them hangs, fails, or has its security compromised, the others are not affected. Having separate runtimes also helps GC, because each collector has fewer references to handle than it would if everything ran together.

What is the JVM? The Java Virtual Machine (JVM) is the virtual machine that runs Java bytecode. The JVM doesn't understand Java source code; that's why you need to compile your *.java files into *.class files that contain the bytecode the JVM understands. It is also what makes Java a "portable language" (write once, run anywhere): there are specific implementations of the JVM for different systems (Windows, Linux, macOS, ...)
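The one-context-per-JVM rule shows up in Spark's API as `SparkContext.getOrCreate()`, which reuses the active context instead of building a second one. A minimal Python sketch of that singleton behavior follows; the `Context` class and its names are illustrative stand-ins, not Spark's actual implementation:

```python
class Context:
    """Illustrative stand-in for SparkContext: at most one instance per process."""

    _active = None  # process-wide slot holding the single active context

    def __init__(self, app_name):
        if Context._active is not None:
            raise RuntimeError("Only one Context may be active per process")
        self.app_name = app_name
        Context._active = self

    @classmethod
    def get_or_create(cls, app_name="default"):
        # Mirrors the getOrCreate pattern: reuse the active instance if any.
        return cls._active if cls._active is not None else cls(app_name)

    def stop(self):
        # Releasing the context allows a new one to be created afterwards.
        Context._active = None


a = Context.get_or_create("job-a")
b = Context.get_or_create("job-b")  # returns the existing context, not a new one
assert a is b
a.stop()
c = Context.get_or_create("job-c")  # a fresh context is allowed after stop()
assert c is not a
```

Constructing a second context directly while one is active raises an error, which is the same discipline Spark enforces so that two applications never share one JVM's runtime state.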