Data scientists can discover potentially interesting patterns in big data sets, but only after the data is in usable form, and getting it there is often the most labor-intensive part of the work. What follows is some interesting research from MIT.
Better Machine Learning
MIT News (02/24/15) Eric Brown
Kalyan Veeramachaneni tackles some of the biggest bottlenecks holding back the data science industry.
Much of Veeramachaneni’s recent research has focused on how to automate this lengthy data prep process. “Data scientists go to all these boot camps in Silicon Valley to learn open source big data software like Hadoop, and they come back, and say ‘Great, but we’re still stuck with the problem of getting the raw data to a place where we can use all these tools.’”
Veeramachaneni and his team are also exploring how to efficiently integrate the expertise of domain experts, “so it won’t take up too much of their time,” he says. “Our biggest challenge is how to use human input efficiently, and how to make the interactions seamless and efficient. What sort of collaborative frameworks and mechanisms can we build to increase the pool of people who participate?”