The Browse Database section provides simple tools for protein search. First, structures can be filtered by:
Predicting model category: AlphaFold version 4 (AF4) or ESMFold version 1 (ESM1). AF4 structures can be narrowed further based on our manual assignment (i.e. AF4 Reviewed or AF4 Unreviewed).
Biological category: you can narrow down the results to a chosen source organism, family of organisms, gene, or Enzyme Commission (EC) number. A prompt window will appear as you type.
Lasso Orient and Term: search only for lassos with the selected orientation and terminus.
Figure 1. Filter panel.
The features displayed on the default screen change according to the chosen categories after you press "Filter". Second, the database can be searched further by:
Lasso types: lists structures containing loops of a defined lasso type. More than one lasso type can be chosen. The user can also define a subclass of interest that carries specific geometrical information: the tail that pierces the surface (N- or C-terminal) and the direction of piercing. Apart from the major classes, the user can also search the L0 class, which stores proteins with an unpierced closed loop (trivial loops), and the Artifact class, which stores structures whose correctness we doubted. Chains in the Artifact class can have a non-trivial topology, but the results must be interpreted very carefully, because the topology may be a result of errors.
Lasso loop length: structures are grouped based on their loop length (between closing bridge atoms).
Lasso tail length: structures are grouped based on their lasso tail length (the parts of the backbone chain between the loop and the N- or C-terminus, referred to as lasso N-end length and lasso C-end length, respectively).
Figure 2. Search panel.
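The selection by lasso type and the grouping by length described above can be reproduced offline on exported data. The following is a minimal sketch only: the record fields (id, lasso_type, loop_length, n_tail, c_tail), the example identifiers, and the bin edges are all hypothetical and are not taken from the database.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical records: identifiers, field names, and values are illustrative only.
RECORDS = [
    {"id": "EXAMPLE_A", "lasso_type": "L1", "loop_length": 45, "n_tail": 12, "c_tail": 80},
    {"id": "EXAMPLE_B", "lasso_type": "L2", "loop_length": 130, "n_tail": 5, "c_tail": 40},
    {"id": "EXAMPLE_C", "lasso_type": "L0", "loop_length": 60, "n_tail": 30, "c_tail": 25},
    {"id": "EXAMPLE_D", "lasso_type": "L1", "loop_length": 210, "n_tail": 50, "c_tail": 15},
]

# Hypothetical bin edges; the database may group lengths differently.
BIN_EDGES = [50, 100, 200]


def length_bin(length, edges=BIN_EDGES):
    """Return a label for the half-open bin [lower, upper) that contains `length`."""
    idx = bisect_right(edges, length)
    if idx == 0:
        return f"<{edges[0]}"
    if idx == len(edges):
        return f">={edges[-1]}"
    return f"{edges[idx - 1]}-{edges[idx] - 1}"


def select_by_type(records, lasso_types):
    """Keep records whose lasso type is one of the chosen types."""
    wanted = set(lasso_types)
    return [rec for rec in records if rec["lasso_type"] in wanted]


def group_by_length(records, field):
    """Group record ids by the length bin of `field` (loop or tail length)."""
    groups = defaultdict(list)
    for rec in records:
        groups[length_bin(rec[field])].append(rec["id"])
    return dict(groups)


# More than one lasso type can be chosen at once.
selected = select_by_type(RECORDS, ["L1", "L2"])
print(group_by_length(selected, "loop_length"))
print(group_by_length(selected, "n_tail"))
```

The same group_by_length helper serves the loop length and both tail lengths, because each is just a residue count assigned to a bin.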
The Advanced search section provides a more comprehensive tool for protein search. It lets you create more complex queries by joining subqueries with logical operators, and control the output format.
Subqueries can be composed of the following categories:
Lasso type: type of lasso. Lassos are divided into four categories, which are reflected in the letters of their names; the numbers in the names represent the crossings. The categories are simple lassos (L), two-sided (LL), supercoiling (LS), and combined two-sided and supercoiling (LLS, LSL, LSLS).
Fingerprint: Represents a short tag containing all lasso types identified in a given structure.
Category of a predicting model: AlphaFold version 4 or ESMFold version 1. Additionally, the AlphaFold version 4 results are divided into credibility categories: AF4 Reviewed or AF4 Unreviewed.
pLDDT chain: mean pLDDT value for a whole protein chain (What is pLDDT?)
Loop length: number of residues in the loop formed by the bond.
Loop area: area of the loop formed by the bond.
Loop perimeter: perimeter of the loop formed by the bond.
Loop gyradius: gyradius of the loop formed by the bond.
Loop pLDDT deep: mean pLDDT value for the part of the chain representing a deep lasso (What is pLDDT?)
Loop pLDDT shallow: mean pLDDT value for the part of the chain representing a shallow lasso (What is pLDDT?)
Loop GLN N: the Gaussian Linking Number (GLN) computed for the N-end of the chain. (GLN calculation)
Loop GLN C: the Gaussian Linking Number (GLN) computed for the C-end of the chain. (GLN calculation)
Chain length: number of residues in the protein chain.
Protein name based on UniProtKB record.
Gene name based on UniProtKB record.
Taxonomic: kingdom, family, organisms.
Bioinformatic codes: Uniprot, InterPro, PDB, PFAM, EC.
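Joining subqueries with logical operators can be illustrated with a small local filter. This is a sketch under stated assumptions, not the query engine of the database: the field names (lasso_type, plddt_chain, model), the helper names (subquery, all_of, any_of, negate), and the sample records are hypothetical.

```python
import operator
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Query = Callable[[Record], bool]

# Comparison operators supported by this sketch.
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def subquery(field: str, op: str, value: Any) -> Query:
    """Build one subquery, e.g. subquery("plddt_chain", ">=", 70.0)."""
    compare = OPS[op]
    return lambda rec: compare(rec[field], value)


def all_of(*queries: Query) -> Query:
    """Logical AND of several subqueries."""
    return lambda rec: all(q(rec) for q in queries)


def any_of(*queries: Query) -> Query:
    """Logical OR of several subqueries."""
    return lambda rec: any(q(rec) for q in queries)


def negate(query: Query) -> Query:
    """Logical NOT of a subquery."""
    return lambda rec: not query(rec)


# Hypothetical records; identifiers and values are illustrative only.
records = [
    {"id": "A", "lasso_type": "L1", "plddt_chain": 85.2, "model": "AF4 Reviewed"},
    {"id": "B", "lasso_type": "LS2", "plddt_chain": 64.0, "model": "AF4 Unreviewed"},
    {"id": "C", "lasso_type": "L1", "plddt_chain": 91.0, "model": "ESM1"},
    {"id": "D", "lasso_type": "L2", "plddt_chain": 88.0, "model": "AF4 Unreviewed"},
    {"id": "E", "lasso_type": "LS2", "plddt_chain": 77.5, "model": "AF4 Unreviewed"},
]

# (lasso type is L1 OR LS2) AND (chain pLDDT >= 70) AND NOT (model is ESM1)
query = all_of(
    any_of(subquery("lasso_type", "==", "L1"), subquery("lasso_type", "==", "LS2")),
    subquery("plddt_chain", ">=", 70.0),
    negate(subquery("model", "==", "ESM1")),
)

hits = [rec["id"] for rec in records if query(rec)]
print(hits)
```

Representing each subquery as a function keeps the combinators composable, so nested expressions such as an OR inside an AND need no special handling.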
If you leave the Show results button unchanged, your results will be shown in your browser in the same format as in the "Browse database" panel. Otherwise, you can choose among several output options: display the .csv in the browser, download a .csv file, or download a compressed (gzipped) .csv file. Furthermore, you can use Customize result columns to add columns to the default output or remove any of them. Finally, press the Search button.
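A downloaded result file, plain or gzipped, can be read with the Python standard library. The sketch below is illustrative: the column headers and the sample rows are assumptions (the real header depends on the columns you selected), and read_results is a hypothetical helper name.

```python
import csv
import gzip
import io
from typing import Dict, List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_results(raw: bytes) -> List[Dict[str, str]]:
    """Parse a .csv or gzipped .csv payload into a list of row dictionaries."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical file content; the header and values are illustrative only.
sample_csv = (
    "Lasso type,Uniprot,pLDDT chain\n"
    "L1,EXAMPLE_A,91.3\n"
    "LS2,EXAMPLE_B,78.0\n"
)

plain_rows = read_results(sample_csv.encode("utf-8"))
gzipped_rows = read_results(gzip.compress(sample_csv.encode("utf-8")))

# Values arrive as strings, so numeric columns need an explicit conversion.
confident = [row["Uniprot"] for row in gzipped_rows if float(row["pLDDT chain"]) >= 80.0]
print(confident)
```

For a file on disk, pass the bytes returned by open(path, "rb").read() to read_results; the gzip check makes the same call work for both download options.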
The same possibilities are available using our multi-criteria API.
The Browse Database section provides a list of proteins that contain lassos with at least 70% probability (see the lasso detection section). Each entry lists the lasso type (Lasso type), UniProt accession code (Uniprot), source organism (Organism), protein name from the corresponding UniProtKB entry (Protein name), pLDDT of the whole chain (pLDDT chain), and the database from which the structure prediction comes (Category of predicting model). On the default screen, entries are sorted by the chain pLDDT.
The category AF4 Reviewed contains proteins which were manually reviewed by us and determined to be correct.
Users can also choose to view the results in raw text format.
After selecting a particular protein, the details for the query will be presented. Each protein page contains a few tabs:
Lassos tabs for AlphaFold v4 and ESMFold predictions (if available).
Protein information tab with corresponding gene, PFAM and InterPro codes, PDB structures and organism information.
Similar proteins tab with lists of proteins that have at least 40% sequence similarity to this protein (available after recomputing).
Each Lassos tab is separated into the following sections:
Structure view displayed using Mol*.
Model details providing:
Barycentric view providing a visualization of the surface spanned on a loop.
Lasso map (only after recomputing): a matrix diagram of a protein. The matrix can be expanded by selecting the Zoom lasso map option. Further details on how to interpret lasso data are described in the section How to interpret lasso data.
Model information
Figure 4. Single protein data presentation of protein A0A1W2H5V4 in AlphaLasso. The view provides general information as well as details about the topological type for the selected protein.
Structure is a 3D graphical representation of the structure predicted by AlphaFold, shown together with the amino acid sequence. The surface spanned on the loop is shown by default, and the user can adjust its opacity. A smoothed model can be viewed after clicking the Smooth button. Residues in the sequence are highlighted based on the selection in the 3D structure. By default, the structure is colored by the AlphaFold scheme (pLDDT). It can also be colored by Rainbow or Default. Users can also define the range of amino acids to be displayed (Fig. 5).
Figure 5. Amino acid residues displayed in the 3D structure of protein A0A1W2H5V4. The residues are colored by the pLDDT value.
Lasso types & Model sequence: a list of lasso types present in the protein. After selecting the View Details option, the lasso loop range is highlighted in the Model sequence underneath the table and in the 3D structure (Fig. 6).
Figure 6. Lasso types, model sequence, and structure panel. A list of all the lasso types present in protein A0A1W2H5V4. The loop sequence, bridge and crossing residues are marked in the model sequence.
Lasso diagram: in more complex structures, it may be useful to view the lasso locations on a simple diagram. Lassos of the same type are marked with the same color. Hovering the mouse cursor over a particular area displays the lasso type and its loop range. This view also allows you to compare different models deposited in the database: enter their UNIID in the input box (separate multiple IDs with commas) and click Compare.
Figure 7. A plot showing the comparison of lasso types present in proteins A0A1W2H5V4 and A0A172UQ65.