The Gap and Island Problem in Python: Finding Consecutive Sequences in Data
The gap and island problem asks you to group a sorted sequence of integers into “islands” — contiguous ranges — and identify the “gaps” between them. It is common in data analysis: finding date ranges where a user was active, finding consecutive sensor readings, or identifying missing periods in time-series data.
Island: a consecutive run (e.g. 1, 2, 3 or 7, 8, 9, 10) Gap: the space between two islands (e.g. 4, 5, 6 are absent between the islands above)
Approach 1: Pure Python Loop
Walk through the sorted array. Start a new island whenever the current element is not one more than the previous one.
def find_islands_and_gaps(numbers): """ Given a sorted list of integers, return islands (consecutive ranges) and gaps (missing ranges between islands). """ if not numbers: return [], []
numbers = sorted(set(numbers)) # sort and deduplicate islands = [] gaps = []
island_start = numbers[0] prev = numbers[0]
for num in numbers[1:]: if num == prev + 1: # Still in the same island prev = num else: # Current island ends at prev; new island starts at num islands.append((island_start, prev)) gaps.append((prev + 1, num - 1)) # gap between islands island_start = num prev = num
# Close the last island islands.append((island_start, prev))
return islands, gaps
data = [1, 2, 3, 7, 8, 9, 10, 15, 16]islands, gaps = find_islands_and_gaps(data)print("Islands:", islands)print("Gaps: ", gaps)# Output:# Islands: [(1, 3), (7, 10), (15, 16)]# Gaps: [(4, 6), (11, 14)]Approach 2: itertools.groupby
itertools.groupby can group consecutive integers by subtracting their index — a classic trick. When you subtract the position index from a consecutive sequence, the result is constant.
from itertools import groupby
def find_islands_groupby(numbers): """Use itertools.groupby to find consecutive runs.""" numbers = sorted(set(numbers)) islands = []
# Group by (value - index): consecutive integers share the same key for key, group in groupby(enumerate(numbers), lambda pair: pair[1] - pair[0]): group_list = list(group) start = group_list[0][1] # value of first element in group end = group_list[-1][1] # value of last element in group islands.append((start, end))
return islands
data = [1, 2, 3, 7, 8, 9, 10, 15, 16]print(find_islands_groupby(data))# Output: [(1, 3), (7, 10), (15, 16)]Why does value - index work? For [1, 2, 3] at indices [0, 1, 2], the differences are [1, 1, 1] — constant. For the next island [7, 8] at indices [3, 4], the differences are [4, 4] — a different constant. So they fall into separate groups.
Approach 3: Pandas groupby (Data Analysis Context)
When working with data frames, pandas has a concise idiom:
import pandas as pd
data = [1, 2, 3, 7, 8, 9, 10, 15, 16]df = pd.DataFrame({'value': data})
# Create island groups: diff != 1 marks the start of a new islanddf['island'] = (df['value'].diff() != 1).cumsum()
# Summarise each islandresult = df.groupby('island')['value'].agg(['min', 'max'])result.columns = ['island_start', 'island_end']print(result)# Output:# island_start island_end# island# 1 1 3# 2 7 10# 3 15 16diff() computes the difference between consecutive values. When it is not 1, a new island starts. cumsum() turns those flags into group IDs.
Binary Sequence Variant
Some versions of this problem use 0s and 1s directly — 1 means “active”, 0 means “gap”:
def analyze_binary_sequence(sequence): """Find island and gap lengths in a binary sequence.""" island_lengths = [] gap_lengths = [] current_len = 1 current_type = sequence[0] # 1 = island, 0 = gap
for i in range(1, len(sequence)): if sequence[i] == current_type: current_len += 1 else: if current_type == 1: island_lengths.append(current_len) else: gap_lengths.append(current_len) current_type = sequence[i] current_len = 1
# Handle the last run if current_type == 1: island_lengths.append(current_len) else: gap_lengths.append(current_len)
return island_lengths, gap_lengths
seq = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0]islands, gaps = analyze_binary_sequence(seq)print("Island lengths:", islands) # [1, 2, 3]print("Gap lengths: ", gaps) # [1, 3, 1]The SQL equivalent of this problem uses ROW_NUMBER() and window functions — knowing the Python version helps you understand why the SQL version works the way it does.