Beyond the tools of the data scientist

on Nov 12, 15 • by Jennifer Locke

We all have a favorite tool, so how does yours help you face the prototype-to-production challenge...


Regardless of the size of your data (small, medium, large, or “Big”), and whether you’re a data analyst, data scientist, statistician, mathematician, or physicist, you have a favorite tool. Maybe it’s MATLAB, R, or Python, but whatever tool you use, your process is similar.

You experiment with different algorithms and methods to find the mathematical formulation or statistical model that best represents the problem you’re trying to solve.

If you’re looking for the optimal blend of ingredients to reduce cost while retaining quality, you may turn to an optimization algorithm. If you’re trying to maintain inventory, you may look for a statistical model that best represents your current monthly sales, use it to predict future sales, and optimize your inventory to match those predictions. You may want to incorporate “what if” scenarios into your model to see what the outcome could be. You may even test a new product offering or promotion in select stores and use the results to determine which of your stores have similar demographics and will benefit from various promotions or products.
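To make the inventory example concrete, here is a minimal Python sketch, assuming a simple linear trend is an adequate model of monthly sales. It fits the trend by ordinary least squares, projects it forward, and applies a hypothetical fixed safety margin to turn forecasts into stock targets. The function names, the `safety_fraction` parameter, and the sales figures are illustrative only; real sales data may call for seasonality or a different model entirely.

```python
from typing import List, Sequence, Tuple


def fit_linear_trend(sales: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of sales[t] ~ a + b*t for t = 0..n-1."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two months of sales data")
    t_mean = (n - 1) / 2
    y_mean = sum(sales) / n
    # Slope b = S_ty / S_tt, intercept a = mean(y) - b * mean(t)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(sales))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    b = s_ty / s_tt
    a = y_mean - b * t_mean
    return a, b


def forecast(sales: Sequence[float], months_ahead: int) -> List[float]:
    """Extend the fitted trend to the next `months_ahead` months."""
    a, b = fit_linear_trend(sales)
    n = len(sales)
    return [a + b * (n + k) for k in range(months_ahead)]


def inventory_targets(
    sales: Sequence[float], months_ahead: int, safety_fraction: float = 0.1
) -> List[float]:
    """Hypothetical policy: stock the forecast plus a fixed safety margin."""
    return [f * (1 + safety_fraction) for f in forecast(sales, months_ahead)]


if __name__ == "__main__":
    monthly_sales = [100, 110, 120, 130]  # hypothetical units sold per month
    print(forecast(monthly_sales, 2))  # [140.0, 150.0]
    print(inventory_targets(monthly_sales, 2))
```

A “what if” scenario fits the same shape: adjust the input series or the safety margin and compare the resulting targets.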

Tools like R and MATLAB are great for trying different approaches to analyzing your data. But how often do you rerun the same analysis, with the same methods, in production? What if you provided an application that used your best methods and freed you to focus on new problems? How often do you hand your work off to another group to implement in an application? The challenge is reproducing the same algorithms you used in R or MATLAB in a programming language suitable for production.

The IMSL Numerical Libraries have been filling this gap for years, providing well-tested, documented, and trusted algorithms in common programming languages such as C, Java, and Fortran. There may be some differences between the IMSL implementation and R or MATLAB, but you can mitigate them by building tests that compare against your prototype results. The ideal approach is to call IMSL directly from your favorite tool, in your own environment, with your own tests. Tools such as R and MATLAB can load external DLLs and JAR files.
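One way to build such tests is to export reference results from the prototype (for example, coefficients saved from an R or MATLAB session) and check that the production code reproduces them within a tolerance. The Python sketch below assumes the reference values have already been loaded into a list; `compare_to_reference` and its default tolerances are hypothetical choices, and suitable tolerances depend on the algorithm and on how much numerical difference between implementations you can accept.

```python
import math
from typing import List, Sequence, Tuple


def compare_to_reference(
    actual: Sequence[float],
    reference: Sequence[float],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-12,
) -> List[Tuple[int, float, float]]:
    """Return (index, actual, expected) for every entry outside tolerance.

    An empty list means the production results agree with the prototype.
    """
    if len(actual) != len(reference):
        raise ValueError(
            f"length mismatch: {len(actual)} results vs {len(reference)} reference values"
        )
    mismatches = []
    for i, (got, expected) in enumerate(zip(actual, reference)):
        # math.isclose passes if either the relative or the absolute tolerance is met
        if not math.isclose(got, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, got, expected))
    return mismatches


if __name__ == "__main__":
    # Hypothetical values: coefficients from the prototype vs. the production port.
    prototype = [100.0, 10.0]
    production = [100.0000000001, 9.9999999999]
    print(compare_to_reference(production, prototype, rel_tol=1e-8))  # []
```

Returning the offending indices, rather than a single pass/fail flag, makes it easier to see which outputs drifted when an implementation changes.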

We have a new whitepaper written by my colleagues, Ed Stewart and Mark Sweeney, that walks through the process of calling IMSL C# from R. If you’re looking to spend less time getting from prototype to production, this paper shows you how to do it quickly and easily.

If you are attending Supercomputing 2015, drop by booth 1324. We’d love to talk to you more about how IMSL Numerical Libraries can help you.
