Programming in Big data
A student wrote: “Since Big data is in high demand in every industry, what programming skills do I need to work in this area? Please help.”
Answer: Most Big data work today is programmed in R, Python, Java, and MATLAB. If you want to program for Big data analytics, Python is probably the first language I would recommend. Python is easy to learn and has data mining and statistical analysis capabilities. It also has many toolkits and a strong support community. Another popular language is R. It is a simple language in which you can process complex data sets, manipulate data through sophisticated modeling functions, and create graphics to represent the numbers, all in just a few lines of code. According to industry reports, over 2 million people currently use R in Big data projects, especially for data modeling. Although most universities teach Java as the basic programming language, Java is not as well suited to this work as R and Python because it was NOT designed for statistical modeling. MATLAB is another programming language in the Big data area, but it is not popular in industry; it is used mostly in university research.
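To give a feel for the kind of statistical work mentioned above, here is a minimal Python sketch that uses only the standard library. The data values and the names `ad_spend`, `sales`, `summarize`, and `fit_line` are made up for illustration; real projects would more likely load data from files and use a dedicated toolkit.

```python
import statistics

# Hypothetical sample data (made-up numbers for illustration only).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


summary = summarize(sales)
slope, intercept = fit_line(ad_spend, sales)
print(summary)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The point is not the specific numbers but how little code a basic summary and a simple linear model take in Python.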