Top 10 Data Science Platforms And Their Customer Reviews 2022

Data science platforms are software products that enable data science teams to run code, train models, and deploy APIs, sparing data scientists from manually setting up their own programming environments. In this article, we round up some of the best platforms in the words of the users themselves, sharing the pros and cons below:

Features of Data Science Platforms

  • Accessible Computing Environments: Data scientists get access to prebuilt computing environments (high-memory notebooks, GPUs, etc.), each connected to back-end hardware and ready to use.

  • Deploy Dashboards and APIs: With their built-in tools, data science platforms make it easy to turn your results from Jupyter notebooks into dashboards or REST APIs. (Download a free eBook here: Seamlessly deploy with APIs and dashboards, without relying on DevOps engineers.)

  • Schedule Tasks and Pipelines: You do not need to rely on an engineer to set up or run recurring tasks. Data science platforms provide built-in tools to easily create and run your jobs.

  • Control Access to Resources: Administrative tools allow data science managers to restrict access to resources or hardware types, manage costs, and oversee the system.

  • Collaborate with Coworkers: When working as a team, data scientists can easily share their work. This avoids situations where code can’t run for other teammates, code becomes stale, and other critical issues arise.

  • Integrate with Other Tools: A data science platform often has built-in capabilities to connect to data services, version control tools, and other technologies to improve your work.
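
To make the "Deploy Dashboards and APIs" feature more concrete, here is a minimal sketch of what exposing a notebook result as a REST API roughly involves when done by hand. It uses only the Python standard library and is not how any platform in this list implements deployment. The names `predict`, `handle_payload`, `PredictHandler`, and `make_server`, the weights, and port 8000 are all hypothetical and chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a fixed linear score (hypothetical weights)."""
    weights = [0.5, -0.25, 2.0]
    return sum(w * x for w, x in zip(weights, features))


def handle_payload(raw_body):
    """Turn a JSON request body into a (status_code, response_dict) pair."""
    try:
        payload = json.loads(raw_body)
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing "features" key, or non-numeric values.
        return 400, {"error": "expected JSON like {\"features\": [1, 2, 3]}"}
    return 200, {"prediction": predict(features)}


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model's prediction as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        encoded = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)


def make_server(port=8000):
    """Build (but do not start) the server; call .serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` would start the service locally. A data science platform typically takes care of this wiring, plus hosting, scaling, and access control, which is the work the feature above refers to.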


“Your infrastructure decisions will become more solidified as your team matures. More and more processes will be built around your particular databases and workplaces, and your team will become more comfortable with them. In general, this is a positive thing! It means your team is working through issues and becoming faster and more experienced. You may, however, find a discrepancy when it comes to bringing new people onto your team. When hiring new people, you’ll need to assess how much of the infrastructure that you use is familiar for a candidate. This can be decided explicitly by only considering resumes that have the necessary skill set or implicitly by having interview questions that weed out people without experience”

Jacqueline Nolis, Leading Data Science Teams.


This list, curated by subject matter experts, covers the most popular data science platforms and includes testimonials from actual customers to help you understand the pros and cons of each option.



1. Saturn Cloud

Saturn Cloud is a data science platform for scalable Python, R, and Julia, built for teams and individuals. It enables GPU computing to speed up data science by up to 2000x.

Saturn provides a flexible environment where data scientists can launch high-powered notebooks (Jupyter, R, VS Code, and more) in the cloud, quickly use Dask clusters and GPUs, deploy cloud resources to expand their data science capabilities, collaborate throughout an entire project lifecycle, and more. Get started for free here.

The positives

“Saturn Cloud makes my work so much easier. When I sit down at the beginning of the day, I just want my environment to work. I want my favorite packages installed and available on demand. I want it to be easy to scale my workspace and have it shut down automatically when I’m done. Saturn Cloud solves all of that. Their customer service is also top-notch.”
- Daniel B (G2)

“Moving the code to SaturnCloud was quite painless – all I had to do was to switch out the distributed Dask scheduler with the one provided by SaturnCloud, and re-point to S3 instead of local disk for my data.”
- Sujit Pal (AWS)

The challenges

“Running hyperparameter searches can be done through Dask, but this does not always feel like the most natural way to perform this task. Having a jobs API that allows you to run arbitrary jobs similar to SLURM would be a helpful addition.”
- Joost V (G2)

“It wasn’t very easy to edit docker image”
- Dinesh K (G2)


2. Amazon SageMaker

Amazon SageMaker provides purpose-built tools for machine learning operations (MLOps) to help teams automate and standardize processes across the ML lifecycle. Using SageMaker MLOps tools, you can train, test, troubleshoot, deploy, and govern ML models at scale to boost productivity of data scientists and ML engineers while maintaining model performance in production. Learn more about their free tier here.

The positives

“I like how it lets you choose different machine types for each phase of development so no resources are wasted. Sagemaker also supports distributed training. I’ve used Sagemaker to build, train and deploy deep learning models using PyTorch, Tensorflow as well as Keras.”
- Judy T (G2)

“It offers a lot of prettained deep learning and ML models that reduces the time of small projects drastically. It’s easy to work with for the first time and doesn’t require prior knowledge. It’s the best for deploying mosel with simplicity.”
- Anonymous (G2)

The challenges

“1) The very unclear documentation/support for Models monitoring after deployment over Sagemaker endpoints.
2) Too many APIs to do the same thing.
3) UI can be better.
4) Project Management and adding collaborators feature is missing
5) The logs tracking can be better managed.”
- Ramavtar M. (G2)

“It’s bit pricey and the UI doesn’t exactly tell you if you’ve got unused instances or deployed models lying around. Amazon sneaks up on you with unexpected bills if you aren’t careful. It’s a bit difficult to get started with as the whole user creation vs instance creation kinda trips you up. As soon as you click on preview, amazon directs you to the user creation part and you hace a whole lot of unnecessary setup when all you probably need is to create a single notebook instance.”
- Judy T (G2)




3. Domino Data Lab

Domino Data Lab’s MLOps platform enables data scientists to develop better medicines, grow more productive crops, adapt risk models to major economic shifts, and more. Data scientists and machine learning engineers can do exploratory data analysis and model development without configuring and using their own compute resources. Domino Data Lab has a 14-day, no-obligation free trial where you can experience the full Domino Enterprise MLOps Platform. Learn more here.

The positives

“A unified way to manage and run artificial intelligence, machine learning and data science workloads. It’s very easy to run and train model through Domino. The ability to create our own environment is very useful and the integration of Jupyter notebook, Rstudio, visual studio code is very useful and easy to use for developing and debugging”
- Shubham C (G2)

“Great UI/UX experience. Data Scientists enjoy using the tool. Easily spin up compute and deploy models or apps. Customer success team is very supportive. As a manager of data scientists, I have visibility into their work and activities and can manage costs easily.”
- Sean O (G2)

The challenges

“I dislike how long it takes upon the initial load time when it updates another project that you have a dependency from. You’re pulling source material and I’m wondering if that load could then be done prior to the startup of a project”
- Anonymous (G2)

“Just the documents should be made more clear and easy to implement. Sometimes the document are of older release, please improve that”
- Shubham C (G2)




4. Paperspace Gradient

Paperspace Gradient is an end-to-end machine learning platform where individuals and teams can build, train, and deploy machine learning models of any size and complexity. Paperspace offers a free plan with limits on CPU and GPU machines, as well as paid plans for greater access.

The positives

“Paperspace is hands down the best cloud GPU platform for ML students like myself. It powered through 3 personal projects of mine with unlimited GPU time limits for just $8 per month.”
- Hyeonmok K. (Capterra)

“Free virtual machines with GPU access allow casual users and professionals to perform computationally-intensive program development and machine learning training and testing.”
- Paul H (G2)

The challenges

“The support is bad. I’m waiting for an answer since weeks. I have login problems. I tried to reset my password but never received an email. Also the referral program seems not working. I send the link to a friend and never got the money to my account. The pricing is also bad. I chose the wrong OS for my virtual machine. It was necessary to create a new one. They charged my money for a few seconds of storage.”
- Dawid K (Capterra)

“Terrible billing system. Customer Support talks weeks, even months to respond.”
- Ryan H (Capterra)




5. Anaconda

Anaconda Distribution is a free and open-source platform for the Python and R programming languages. It can be easily installed on Windows, Linux, and macOS. It provides more than 1,500 Python/R data science packages suitable for developing machine learning and deep learning models. Anaconda has a free tier you can learn more about here.

The positives

“The User Interface is quite good. It is a comprehensive tool for performing data analytics , machine learning and related tasks It is open source and provides a good interface for python developers to navigate between different environment. A great tool for beginner where we can easily install required packages using pretty simple commands”
- Anonymous (Gartner)

“This is a very powerful tool with many great tool integrated with it, such as Pandas, Numpy Matplotlib etc. Before I use anaconda, it was a hassle to install packages using pip or wheel files. I would recommend this tool to everyone who is working upon data science.”
- Anonymous (Gartner)

The challenges

“Anaconda recently changed terms of service, at least commercial license is needed if team is over certain size. Consider different options, you may not need the Enterprise tier.”
- Anonymous (Gartner)

“When we run into some problems, there’s not much clear documentation available on the anaconda site. We ll have to rely on 3rd party solutions. Maybe this can be improved”
- Sughosh J (G2)




6. Azure ML Studio

Azure Machine Learning studio is a web portal in Azure Machine Learning that contains low-code and no-code options for project authoring. You can start for free with a $200 credit to use within 30 days. Learn more here.

The positives

“It fulfilled my goal in a single channel. Even haven’t worr[ied] about the maintenance or any fault tolerance. This provide[s] the user interactive UI to grab the features easily. [Their] support teams also very help[ful]”
- Anonymous (TrustRadius)

“If you’re new to Machine Learning there could be nothing better than Azure Machine Learning. The best part is without much coding knowledge you may leverage the benefits of Machine Learning. Intuitive UI and help materials make it even easier”
- Ashish T (Capterra)

The challenges

“It is not as robust or as supported as matlab’s machine learning toolkit. Also doesn’t work well with tensorflow, a python library, that almost 50% of machine learning developers use.”
- Anonymous (Capterra)

“The price is a bit high for independent consultants who do not have the support of large companies. It should also have a more comprehensive guide to using its features”
- Suarez O. (Capterra)





7. H2O.ai

H2O.ai is a fully open-source, distributed in-memory ML platform with linear scalability. H2O supports statistical and machine learning algorithms, and deep learning is one of its significant advantages for those looking for a deep learning platform. H2O offers a 90-day free trial for those looking to get started. Learn more here.

The positives

“Excellent support for commercial product Driverless AI. Rapid iteration. Performance is generally better than one can be achieved in code.”
- Marc S (AWS)

“They developed top-quality open source tools, including the H2O-3 and AutoML families. I do not have a license for their Driverless AI, but my experience with it through tutorials and other demos has been superb. I should mention that their efforts to develop frameworks for ML interpretability are spot on, and their learning center is shaping up as a valuable resource to the community in general. The interfaces with R and Python enable a smooth transition of pre-existing workflows into the H2O framework.”
- Renzo S (AWS)

The challenges

“Programmatically using the software is difficult because the documentation is lacking and it is hard to find the documentation that they do have. It’s easier to use the GUI, but that isn’t good for an end-to-end solution.”
- Anonymous (Capterra)

“Less well documented developing guideline. It’s some what difficult to build something above it. Notebook has its own shortcomings, which some operations seem less convenient. For instance, if the error output is too lengthy, I have to scroll all the way up to get where I need edit the code.”
- Jianlin S (Capterra)





8. Alteryx

Alteryx allows analysts to prep, blend, and analyze data faster with no-code, low-code analytic building blocks that enable highly configurable and repeatable workflows. You can quickly build predictive models without coding or performing complex statistics. Alteryx offers a free 30-day trial you can learn more about here.

The positives

“There is no requirement for coding, however, Alteryx provides R, Python, and Spark coding capabilities. It’s an excellent solution for difficult ETL and data reshaping tasks, with a plethora of user-friendly tools and functions.”
- Anonymous (Gartner)

“The code-free code-friendly concept and speed of processing are unique. Super intuitive to use and easy to master. No code needed, but if you are a coder, Alteryx offers R, Python & Spark coding tools.”
- AJ G (G2)

The challenges

“The pricing of the software is extremely high, limiting companies looking for people with Alteryx skills.”
- Jeremy R (G2)

“Pricing model is awful. There is an individual seat that is about 25k, but then, for deploying a server, the cost goes almost close to 100k. This is big No for smaller companies.”
- Bhushan E (G2)




https://databricks.com

9. Databricks

Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers and data analysts with a simple collaborative environment to run interactive and scheduled data analysis workloads. You can begin a free trial here.

The positives

“This is so far the best tool I have used ( doesn’t mean other tools are not good) that allows a unified platform for data engineering and data science workloads and it does full justice to both types of audience. I hope the PaaS can provision external hive metastore OOTB and make the SQL Analytics more friendly to SQL & BI users”
- Anonymous (Azure Marketplace)

“1) Latest versions of Spark are available seamlessly
2) Auto scaling is useful
3) Auto termination is extremely valuable and a cost saver!”
- Anonymous (Gartner)

The challenges

“Some of the cons are that the primary language is Java/Scala, whereas many data scientists are using python or R, which run slower on Databricks than Java and Scala.”
- Rayla V (Capterra)

“Would like better syntax recognition for the programming languages supported by Spark. Currently databricks lacks some features that major IDE’s like Spyder for Python or R Studio for R have.”
- Andres E (G2)





10. Deepnote

Deepnote is a collaborative data science notebook that helps teams discover, understand, and share their findings with anyone. It’s Jupyter compatible, runs in the cloud, and works with any data stack. Learn more here.

The positives

“It’s so easy to share reproducible Jupyter notebooks when collaborating with folks. I love my current usage and I do not even need to install too many libraries on my laptop. Deepnote just does the heavy lifting on that front and I feel so empowered doing a lot of work and school projects with their platform.”
- Helen Mary B (G2)

“Real-time collaboration between team members which speeds up analysis workflow Integration to existing data stacks (Redshift, MySQL, Github) SQL on Pandas DataFrame”
- Khanh N (G2)

The challenges

“Support is not very responsive or helpful. Some of their suggestions were at a level of “have you tried turning on and off” as if their audience is not tech-savvy. Notebooks crashed multiple times and in 1-2 of them I lost my code and it was not saved in history”
- Anonymous (G2)

“The RAM is not enough and when we want to use higher machine. Rates are higher than Google Colab.”
- Shah N (G2)

Summary

The data science platforms listed in this article are among the most popular and widely used. We hope that comparing each product offering with its customer reviews gave you a helpful overview of the available tools. If you’d like to contribute to this article, reach out to mel@saturncloud.io.



About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Request a demo today to learn more.