Datajoint
Description
Datajoint is a tool that allows you to model schemas for experiments in Python or MATLAB and then save these models in a database. It is a language-independent way for you to define and use scientific pipelines. More information about Datajoint and its purpose can be found in its documentation.
Privileges in Datajoint
Each lab has its own set of databases in Datajoint, which are only accessible by lab members. There are 2 kinds of databases, both of which are open to all lab members with a Datajoint account:
- Common Databases are used for data management and storing acquired data. These databases are named according to the following prefix convention: <Lab Name>_common_.
- Analysis Databases are used for pipelines and analyses. These databases are named according to the following prefix convention: <Lab Name>_analyses_.
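The prefix convention above can be expressed as a small helper. This is a minimal sketch rather than part of Datajoint itself: the function names, the lab name "examplelab", and the database suffixes are hypothetical, and it assumes the lab name appears in the prefix exactly as written.

```python
from typing import Optional

# The two shared database kinds and the keyword each uses in its name prefix.
DATABASE_KINDS = ("common", "analyses")


def database_prefix(lab_name: str, kind: str) -> str:
    """Build the '<Lab Name>_<kind>_' prefix for a lab's shared databases."""
    if kind not in DATABASE_KINDS:
        raise ValueError(f"kind must be one of {DATABASE_KINDS}, got {kind!r}")
    return f"{lab_name}_{kind}_"


def classify_database(database_name: str, lab_name: str) -> Optional[str]:
    """Return 'common' or 'analyses' if the name follows the convention, else None."""
    for kind in DATABASE_KINDS:
        if database_name.startswith(database_prefix(lab_name, kind)):
            return kind
    return None


# "examplelab", "sessions", and "spikes" are hypothetical names for illustration.
print(database_prefix("examplelab", "common") + "sessions")
print(classify_database("examplelab_analyses_spikes", "examplelab"))
```

A check like this can be useful before creating a database, since a name that does not match either prefix falls outside the lab's shared databases.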
Within a lab, there are 4 tiers of access to databases:
- Data Managers have all permissions on all lab databases. This means that they can create, read, update, and delete individual data points for all databases. They also can create or delete entire databases (including all data points within a database) for their lab.
- Researchers can create, read, update, and delete individual data points within common databases. They also have all permissions for analysis databases, which means that they can edit data points within analysis databases and create or delete entire analysis databases.
- Technicians have more limited access, and can only create and read individual data points within common databases. They cannot edit individual data points. Within analysis databases, they can only read data points.
- Readers can only read data points from all lab databases.
Additionally, each individual researcher can create and update their own private databases. The names of private researcher databases start with the researcher's UNI.
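The four access tiers can be summarized as a lookup table. The sketch below only models the rules as described in this section; it is not how Datajoint or the underlying database enforces them. The role keys, operation names, and the `is_allowed` helper are hypothetical, and private researcher databases are not modeled.

```python
# Operations on individual data points versus operations on whole databases.
ROW_OPS = frozenset({"create", "read", "update", "delete"})
DATABASE_OPS = frozenset({"create_database", "delete_database"})
ALL_OPS = ROW_OPS | DATABASE_OPS
READ_ONLY = frozenset({"read"})

# role -> database kind -> allowed operations, transcribed from the tier list.
PERMISSIONS = {
    "data_manager": {"common": ALL_OPS, "analyses": ALL_OPS},
    "researcher": {"common": ROW_OPS, "analyses": ALL_OPS},
    "technician": {"common": frozenset({"create", "read"}), "analyses": READ_ONLY},
    "reader": {"common": READ_ONLY, "analyses": READ_ONLY},
}


def is_allowed(role: str, database_kind: str, operation: str) -> bool:
    """Return True if the role may perform the operation on that kind of database."""
    return operation in PERMISSIONS[role][database_kind]


print(is_allowed("technician", "common", "update"))
print(is_allowed("researcher", "analyses", "delete_database"))
```

Writing the tiers out this way makes the asymmetry visible: only Data Managers can create or delete common databases, while Researchers have that ability for analysis databases only.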
Adding and Removing Lab Members to/from Datajoint
The privileges described above can be granted and removed by lab managers and principal investigators using a web interface called Grouper. Grouper is a tool used to manage access to various services at Columbia, among them CUIT-provided AWS accounts and Google Groups mailing lists. It is also used behind the scenes to facilitate access to Axon. For Datajoint, Grouper gives researchers the ability to directly add lab members and Columbia-based collaborators to their Datajoint instance (collaborators without UNIs are not supported at this time).
To add a new lab member, perform the following steps:
- Navigate to grouper.cc.columbia.edu and log in with your UNI and password when prompted.
- After logging in, you will be presented with Grouper's main screen.
- In the Quick links sidebar on the left, click on My groups.
- On the next page, there should be a tab labelled Groups I manage. Under this tab, there is a two-column table listing 4 groups, one for each of the Datajoint roles. The left column shows the path to the Datajoint role, similar to a filesystem path, while the right column shows an alias for this path.
If you manage Datajoint roles for multiple labs, you may notice that the path aliases for the roles are duplicated. In this case you can use the fully-qualified path in the left column to differentiate between labs since it contains the lab name.
- Click on the role that you want to add someone to and you will be taken to the next screen.
- You are now on the screen that allows you to add lab members to Datajoint roles.
- Click on the golden Add members button in the upper right.
- Type in the lab member's UNI in Member name or ID:.
- Select the lab member from the dropdown.
- Leave Default privileges checked.
- Finally, click the golden Add button farther down on the screen.
- Datajoint's user database is updated every 15 minutes starting at the top of the hour, so it may take up to about 15 minutes before your lab member has been added to Datajoint. Once added, they will receive an email at their Lionmail account containing a temporary password.
- Once your lab member has received that email, have them change their password as soon as possible using the dj.set_password() function described here.
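The last two steps can be sketched in Python. This is a rough illustration under stated assumptions, not an official procedure: it assumes the sync runs on the quarter hour (:00, :15, :30, :45), that the datajoint package is installed, and that the lab member logs in with their UNI as the username. The helper names are hypothetical; `dj.conn` and `dj.set_password` are the library calls.

```python
from datetime import datetime, timedelta

SYNC_INTERVAL_MINUTES = 15  # from the text: every 15 minutes from the top of the hour


def next_sync_time(now: datetime) -> datetime:
    """Return the first quarter-hour boundary strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    completed_slots = now.minute // SYNC_INTERVAL_MINUTES
    return top_of_hour + timedelta(minutes=(completed_slots + 1) * SYNC_INTERVAL_MINUTES)


def change_temporary_password(host: str, username: str, temporary_password: str) -> None:
    """Log in with the emailed temporary password, then prompt for a new one."""
    import datajoint as dj  # imported lazily so next_sync_time works without it

    # Assumption: `username` is the lab member's UNI.
    connection = dj.conn(
        host=host, user=username, password=temporary_password, reset=True
    )
    # With no new_password argument, set_password() prompts for one interactively.
    dj.set_password(connection=connection)


print(next_sync_time(datetime.now()))
```

The lab member would call `change_temporary_password` once, from an interactive session, after the temporary password email arrives.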
Development vs Production Datajoint Instance
The U19 project has two Datajoint installations available for use:
- The Development installation is used for prototyping new analysis pipelines and data management workflows. This instance is a sandbox that allows you to evaluate how well your Datajoint code works for your lab. This installation can be accessed via the datajoint-dev.u19motor.zi.columbia.edu domain name.
- The Production installation is used for analyses and data streams that will ultimately be used in the results section of your paper. It runs mature, well-tested pipelines that you know work well for your lab. This installation can be accessed via the datajoint.u19motor.zi.columbia.edu domain name.
The chief difference between these two instances is that the development installation doesn't adhere to rigid naming conventions for database names and is not currently integrated with Grouper. This means that you will need to obtain the credentials for accessing the development instance from the Data Engineer or another lab member. Each lab has its own set of credentials for logging into the development instance.
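One way to keep the two installations from being mixed up in code is to select the host by name. The sketch below is an assumption-laden illustration: the `INSTANCE_HOSTS` mapping and `connection_settings` helper are hypothetical, while the host names come from this section and the `database.*` keys are Datajoint's standard configuration keys. The user name and password shown are placeholders.

```python
# Host names as given in this section.
INSTANCE_HOSTS = {
    "development": "datajoint-dev.u19motor.zi.columbia.edu",
    "production": "datajoint.u19motor.zi.columbia.edu",
}


def connection_settings(instance: str, user: str, password: str) -> dict:
    """Return Datajoint config entries for the chosen installation."""
    if instance not in INSTANCE_HOSTS:
        raise ValueError(f"instance must be one of {sorted(INSTANCE_HOSTS)}")
    return {
        "database.host": INSTANCE_HOSTS[instance],
        "database.user": user,
        "database.password": password,
    }


# Placeholder credentials; development credentials come from the Data Engineer.
settings = connection_settings("development", "lab_user", "not-a-real-password")
print(settings["database.host"])
```

With the datajoint package installed, each entry can then be applied with `dj.config[key] = value` before calling `dj.conn()`.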