Foundations Atlas

Atlas Documentation

Atlas is a flexible Machine Learning platform consisting of a Python SDK, CLI, GUI, and Scheduler that helps Machine Learning Engineering teams dramatically reduce model development time and the effort of managing infrastructure.

Atlas is a subset of Foundations which is a group of tools we have built for Machine Learning Engineers.


BETA Note

Atlas has evolved considerably over its history. The latest open-source version of Atlas includes significant architectural and design changes.

This version is currently in BETA.

Here are some of the core features:

Experiment Management & Tracking:
Tag experiments and easily track hyperparameters, metrics, and artifacts such as images, GIFs, and audio clips in a web-based GUI to monitor the performance of your models

GUI

Job queuing & scheduling:
Launch and queue thousands of experiment variations to fully utilize your system resources

GUI

Collaboration & Bookkeeping:
Keep a journal of thoughts, ideas, and comments on projects

Reproducibility:
Maintain an audit trail of every single experiment you run, complete with code and any saved items

Authentication & other integrations:
Collaborate across your team by setting up Atlas on a cluster and providing user access controls via Keycloak integration.

Slice and dice your models from the GUI via the Tensorboard integration.

How does Atlas Work?

Atlas consists of the following core modules:

  • GUI - A Dockerized web application to view job status for various projects.
  • Foundations SDK & CLI - Programmatic and command-line interfaces for Atlas.
  • Local Scheduler - A scheduler used for job orchestration and management.

Here is an example workflow:

  1. Install Atlas on your machine or on the Cloud
  2. Use the SDK to log metrics, hyperparameters, and artifacts, or use the submit SDK command to automate job submission. For example:
    # main.py
    
    import foundations 
    
    # `acc` and `loss_plt` are placeholders for values produced by your training code
    foundations.log_metric('accuracy', acc)
    foundations.log_param('batch_size', 64)
    
    foundations.save_artifact('loss_graph', loss_plt)
    
    foundations.submit(scheduler_config="scheduler",
                       command=["main.py"],
                       num_gpus=1,
                       stream_job_logs=False)
    
    
  3. Use the Foundations CLI to submit your jobs to your local or remote machine (or cluster).
    foundations submit my_aws_scheduler_instance . main.py --num-gpus=3
    
  4. Atlas packages your code into a container, reviews the available resources, schedules and executes your job, and displays the results in the GUI.
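The logging calls in step 2 can be sketched as a complete, self-contained training script. This is a minimal sketch, not the definitive Atlas workflow: the training loop and its loss values are placeholders, and a no-op stub stands in for the `foundations` SDK when it is not installed, so the script also runs outside an Atlas job.

```python
# Minimal sketch of a training script that logs per-epoch metrics.
# Assumption: the Atlas SDK exposes log_metric/log_param as shown in the
# workflow above; the stub below only lets the sketch run without Atlas.
try:
    import foundations
except ImportError:
    class _FoundationsStub:
        """No-op stand-in used when the Atlas SDK is unavailable."""
        @staticmethod
        def log_param(name, value):
            print(f"param  {name}: {value}")

        @staticmethod
        def log_metric(name, value):
            print(f"metric {name}: {value}")

    foundations = _FoundationsStub()


def train(epochs=3):
    foundations.log_param("epochs", epochs)
    history = []
    for epoch in range(epochs):
        # Placeholder "loss" curve; replace with a real training step.
        loss = 1.0 / (epoch + 1)
        foundations.log_metric("loss", loss)
        history.append(loss)
    return history


losses = train()
```

When submitted through Atlas, each `log_metric` call surfaces in the GUI, so the same script works unchanged locally and on the scheduler.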

Contribute to Atlas

Atlas is open-source (Apache 2.0) and we welcome all levels of contributors!

To contribute to Atlas, get started on our Github.

Join the community

Want to contribute to Atlas, or just want to get in touch with Atlas users and Dessa Deep Learning Engineers? Join our community Slack here.

Prefer discussing over e-mail? Send us a message here.

License

We ❤️ open source; Atlas is licensed under the Apache 2.0 License.

Copyright 2015-2020 Square, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

© 2020 Square, Inc. ATLAS, DESSA, the Dessa Logo, and others are trademarks of Square, Inc. All third party names and trademarks are properties of their respective owners and are used for identification purposes only.