Augmented Interval List | Databio Slides

# LOLA refresher

![LOLA abstract](/shorts/ailist/lola-abstract.svg)

---

# LOLA requires comparing sets of intervals

![Subject query](/shorts/ailist/subject-query.svg)

Can we improve the efficiency to enable faster, larger-scale analysis?

---

# If the subject list has no containment, identifying overlaps is fast

![Subject query annotated](/shorts/ailist/subject-query-annotated.svg)

Binary search on the interval start positions, followed by backward steps:

![Binary search](/shorts/ailist/binary-search.svg)
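The search above can be sketched in a few lines of Python. This is a minimal illustration rather than the LOLA or AIList implementation: it assumes half-open `(start, end)` intervals sorted by start, and the function name `overlaps_no_containment` is made up for this slide.

```python
from bisect import bisect_left

def overlaps_no_containment(intervals, q_start, q_end):
    """Find overlaps in a start-sorted list in which no interval contains another.

    Without containment the end positions are sorted too, so the backward
    walk can stop at the first interval that ends before the query starts.
    """
    starts = [s for s, _ in intervals]
    # Index of the last interval whose start lies before the query end.
    i = bisect_left(starts, q_end) - 1
    hits = []
    while i >= 0 and intervals[i][1] > q_start:
        hits.append(intervals[i])
        i -= 1
    return hits[::-1]  # restore start order

subject = [(1, 5), (3, 8), (6, 10), (12, 15)]
print(overlaps_no_containment(subject, 7, 13))
```

The cost is one binary search plus one step per reported overlap.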

---

# The problem arises with contained interval overlaps

![Subject query containment](/shorts/ailist/subject-query-containment.svg)

![Binary search fail](/shorts/ailist/binary-search-fail.svg)
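A small, made-up example shows how the early stop breaks once an interval contains others. The sketch assumes half-open `(start, end)` intervals sorted by start; `naive_overlaps` and `brute_force` are illustrative names only.

```python
from bisect import bisect_left

def naive_overlaps(intervals, q_start, q_end):
    """Binary search plus backward steps, stopping at the first non-overlap."""
    starts = [s for s, _ in intervals]
    i = bisect_left(starts, q_end) - 1
    hits = []
    while i >= 0 and intervals[i][1] > q_start:
        hits.append(intervals[i])
        i -= 1
    return hits[::-1]

def brute_force(intervals, q_start, q_end):
    """Reference answer: test every interval."""
    return [(s, e) for s, e in intervals if s < q_end and e > q_start]

# (1, 20) contains (3, 5) and (6, 8), so end positions are no longer sorted.
contained = [(1, 20), (3, 5), (6, 8), (12, 15)]
print(naive_overlaps(contained, 9, 11))  # stops at (6, 8) and misses (1, 20)
print(brute_force(contained, 9, 11))
```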

---

# How can we improve efficiency when containment cannot be ruled out?

---

# Many approaches to solve the 'containment' issue:

- Nested Containment Lists (GRanges) (Alekseyenko and Lee, 2007; Aboyoun, Pages, and Lawrence, 2012)
- R-trees (bedtools) (Kent et al., 2002; Quinlan and Hall, 2010)
- Augmented interval trees (Cormen et al., 2001)

These methods try to structure the data to provide non-containment guarantees

---

# How these methods provide non-containment guarantees

### R-trees

Each tree node is annotated with the *minimum bounding rectangle* of its elements. A query that does not intersect a node's bounding rectangle cannot intersect any of its child elements.
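For genomic intervals the bounding rectangle reduces to a one-dimensional bounding interval. The sketch below illustrates only the pruning rule, assuming half-open `(start, end)` intervals. The `Node` class and helper names are hypothetical and do not reflect how bedtools or any other tool stores its index.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: int  # start of the bounding interval
    hi: int  # end of the bounding interval
    items: list = field(default_factory=list)     # intervals stored at a leaf
    children: list = field(default_factory=list)  # child nodes

def make_leaf(items):
    return Node(min(s for s, _ in items), max(e for _, e in items), items=list(items))

def make_parent(children):
    return Node(min(c.lo for c in children), max(c.hi for c in children),
                children=list(children))

def search(node, q_start, q_end):
    """Collect overlaps, skipping any subtree whose bounds miss the query."""
    if node.hi <= q_start or node.lo >= q_end:
        return []  # pruned: nothing below this node can overlap
    hits = [(s, e) for s, e in node.items if s < q_end and e > q_start]
    for child in node.children:
        hits.extend(search(child, q_start, q_end))
    return hits

root = make_parent([make_leaf([(1, 5), (3, 8)]), make_leaf([(20, 25), (22, 30)])])
print(search(root, 4, 6))
```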


### Nested Containment Lists

![NCList](/shorts/ailist/nclist.png)


---

# Augmented Interval List

1. Augment the list with the running maximum *end* value. *Solves the problem for lists with little containment.*

2. Decompose the list to minimize containment. *Extends the solution to lists with heavy containment.*

---

# Augment with the running maximum end value, `maxE`

Provides a *local guarantee* of no containment.

![AIList maxE](/shorts/ailist/ailist-maxE.svg)
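A minimal sketch of the augmentation, assuming half-open `(start, end)` intervals sorted by start. The function names `build_index` and `query_max_e` are illustrative and not taken from the AIList source code.

```python
from bisect import bisect_left
from itertools import accumulate

def build_index(intervals):
    """Precompute start positions and the running maximum end value (maxE)."""
    starts = [s for s, _ in intervals]
    max_e = list(accumulate((e for _, e in intervals), max))
    return starts, max_e

def query_max_e(intervals, starts, max_e, q_start, q_end):
    """Walk backward from the binary-search position, stopping on maxE.

    If max_e[i] <= q_start, no interval at or before index i can reach
    the query, so the scan can stop safely even with containment.
    """
    i = bisect_left(starts, q_end) - 1
    hits = []
    while i >= 0 and max_e[i] > q_start:
        if intervals[i][1] > q_start:  # skip short intervals inside the run
            hits.append(intervals[i])
        i -= 1
    return hits[::-1]

contained = [(1, 20), (3, 5), (6, 8), (12, 15), (30, 35)]
starts, max_e = build_index(contained)
print(max_e)
print(query_max_e(contained, starts, max_e, 9, 11))
```

The stop condition tests `maxE` instead of the interval's own end, which is what makes the guarantee local: it holds at every position in the list, whatever came before.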

---

# AIList works on contained lists

![Subject query containment](/shorts/ailist/subject-query-containment.svg)

![Binary search maxE](/shorts/ailist/binary-search-maxE.svg)

---

# But long containment runs are problematic

![Subject query containment2](/shorts/ailist/subject-query-containment2.svg)

![Binary search maxE2](/shorts/ailist/binary-search-maxE2.svg)
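The cost of a long run can be made concrete by counting scan steps. The data below is synthetic (one long interval followed by 99 short ones), intervals are assumed half-open, and `count_scan` is a hypothetical helper written for this slide.

```python
from bisect import bisect_left
from itertools import accumulate

def count_scan(intervals, q_start, q_end):
    """Return (visited, hits) for a maxE-guided backward scan."""
    starts = [s for s, _ in intervals]
    max_e = list(accumulate((e for _, e in intervals), max))
    i = bisect_left(starts, q_end) - 1
    visited = hits = 0
    while i >= 0 and max_e[i] > q_start:
        visited += 1
        hits += intervals[i][1] > q_start  # True counts as 1
        i -= 1
    return visited, hits

# One long interval keeps maxE at 1000 for the whole list.
long_run = [(0, 1000)] + [(i, i + 1) for i in range(1, 100)]
visited, hits = count_scan(long_run, 500, 501)
print(visited, hits)
```

Here the scan visits all 100 elements to report a single overlap, because `maxE` never drops below the query start.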

---

# Decompose long runs with constant `maxE`

![AIList decompose](/shorts/ailist/ailist-decompose.svg)
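One way to sketch the decomposition is to move any interval that covers many of its immediate successors into a separate sub-list, then repeat on what was moved. This is a simplified illustration, not the authors' exact procedure: the parameters `min_cov` and `max_components`, the window size of `2 * min_cov`, and the function name are all assumptions.

```python
def decompose(intervals, min_cov=2, max_components=10):
    """Split a start-sorted list into sub-lists with less containment.

    An interval is moved out when it covers at least `min_cov` of the next
    `2 * min_cov` intervals. The moved intervals are processed again until
    nothing moves or `max_components` sub-lists exist.
    """
    components = []
    current = list(intervals)
    while current:
        if len(components) == max_components - 1:
            components.append(current)  # last allowed sub-list takes the rest
            break
        kept, moved = [], []
        for i, (s, e) in enumerate(current):
            window = current[i + 1 : i + 1 + 2 * min_cov]
            # Starts are sorted, so a later interval is covered iff it ends by e.
            covered = sum(1 for _, e2 in window if e2 <= e)
            (moved if covered >= min_cov else kept).append((s, e))
        components.append(kept)
        current = moved
    return components

long_run = [(0, 1000)] + [(i, i + 1) for i in range(1, 100)]
components = decompose(long_run)
print([len(c) for c in components])
```

Each sub-list stays sorted by start, so it can carry its own `maxE` values and be searched independently; a query then runs against every sub-list and the hits are combined.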

---

# Performance

- How does the `maxE` minimum run length affect performance?
- How does it compare to existing approaches?
- How does it scale with increasing size of subject?

---

# Datasets

![Table 1](/shorts/ailist/ailist_table1.svg)

---

# How does the `maxE` minimum run length affect performance?

![Figure 2](/shorts/ailist/ailist_fig2.svg)

---

# How does it compare to existing approaches?

![Figure 3](/shorts/ailist/ailist_fig3.svg)

---

# How does it scale with increasing size of subject?

![Figure 4](/shorts/ailist/ailist_fig4.svg)

---

# Conclusion and Directions

AIList is best-in-class for one-to-one interval comparisons

---

## Acknowledgments

**Sheffield lab**
- John Lawson
- Vince Reuter
- Ognen Duzlevski
- Jason Smith
- **Jianglin Feng**
- Michal Stolarczyk
- Aaron Gu
- Anant Tewari


**Funding:**
