Caching

Caching helps DataJunction stay responsive by storing and reusing the results of expensive operations, such as computing the dimension DAG (directed acyclic graph). This section explains where DataJunction uses caching and how to plug in a custom cache using FastAPI’s dependency injection.

How Caching is Used

DataJunction uses caching in several places to improve response times and reduce load on its database. A primary use case is the dimension DAG: once it has been computed, the result is stored so that later requests can reuse it instead of recomputing it.
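The reuse described above is commonly called the cache-aside pattern: look the key up, and only on a miss run the expensive computation and store its result. The sketch below illustrates that pattern in isolation. InMemoryCache, get_or_compute, compute_dimension_dag, and the cache key format are all hypothetical names chosen for illustration, not DataJunction's actual code.

```python
from typing import Any, Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical dict-backed cache with get/set/delete methods."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any, timeout: int = 300) -> None:
        # timeout is accepted for signature compatibility but ignored here
        self._store[key] = value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def get_or_compute(cache: InMemoryCache, key: str, compute: Callable[[], Any]) -> Any:
    """Cache-aside lookup: return a cached value, or compute, store, and return it.

    None is treated as a cache miss, so a computation that returns None
    is recomputed on every call.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = compute()
    cache.set(key, value)
    return value


# Records how many times the "expensive" computation actually ran.
compute_calls: List[str] = []


def compute_dimension_dag() -> Dict[str, List[str]]:
    """Hypothetical stand-in for an expensive dimension DAG computation."""
    compute_calls.append("computed")
    return {"orders": ["customer", "order_date"]}


cache = InMemoryCache()
first = get_or_compute(cache, "dimension_dag:orders", compute_dimension_dag)
second = get_or_compute(cache, "dimension_dag:orders", compute_dimension_dag)
```

The second lookup is served from the cache, so the computation runs only once. Deleting the key forces the next lookup to recompute, which is how stale entries can be invalidated.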

Default Caching Implementation

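Out of the box, DataJunction comes with a simple in-memory cache built on SimpleCache from the cachelib library. This implementation is straightforward and efficient for development and small-scale deployments. Because it lives in the memory of a single process, its entries are not shared between server processes and do not survive a restart.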

Here’s a brief look at the default caching implementation:

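from typing import Any, Optional

from cachelib import SimpleCache

# `Cache` is DataJunction's base cache implementation, which logs caching
# activity; import it from the DataJunction server package.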

class CachelibCache(Cache):
    """A standard implementation of CacheInterface that uses cachelib"""

    def __init__(self):
        super().__init__()
        self.cache = SimpleCache()

    def get(self, key: str) -> Optional[Any]:
        """Get a cached value from the simple cache"""
        super().get(key)
        return self.cache.get(key)

    def set(self, key: str, value: Any, timeout: int = 3600) -> None:
        """Cache a value in the simple cache"""
        super().set(key, value, timeout)
        self.cache.set(key, value, timeout=timeout)

    def delete(self, key: str) -> None:
        """Delete a key in the simple cache"""
        super().delete(key)
        self.cache.delete(key)
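The timeout argument above controls how long an entry stays valid. To make that behaviour concrete without depending on cachelib, here is a rough stdlib-only approximation of a per-key expiring in-memory cache. It is a hypothetical sketch (TTLCache and its injectable clock are illustrative names), not cachelib's actual implementation; SimpleCache additionally caps how many items it keeps (its threshold parameter).

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `timeout` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so expiry can be tested without sleeping.
        self._clock = clock
        # key -> (expiry timestamp, value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazily evict the expired entry when it is read.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, timeout: int = 3600) -> None:
        self._store[key] = (self._clock() + timeout, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


# Usage with a fake clock: the entry is visible before its timeout, gone after.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("dimension_dag", {"orders": ["customer"]}, timeout=10)
before_expiry = cache.get("dimension_dag")
now[0] = 11.0
after_expiry = cache.get("dimension_dag")
```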

Custom Caching Implementation

To use a custom cache, provide your own get_cache dependency through FastAPI’s dependency injection. The cache it returns must implement CacheInterface, which declares three abstract methods: get, set, and delete.

Here’s the CacheInterface definition:

from abc import ABC, abstractmethod
from typing import Any, Optional

class CacheInterface(ABC):
    """Cache interface"""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        """Get a cached value"""

    @abstractmethod
    def set(self, key: str, value: Any, timeout: int = 300) -> None:
        """Cache a value"""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Delete a cache key"""

Implementing a Custom Cache

To implement a custom cache, create a class that extends CacheInterface and implements the get, set, and delete methods. Then return an instance of that class from your get_cache dependency so FastAPI injects it wherever the cache is needed.

Here’s an example of a custom cache implementation:

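from typing import Any, Optional

from fastapi import Request
from datajunction_server.internal.caching.noop_cache import noop_cache

# CacheInterface is the abstract base class shown above; import it from the
# DataJunction server package.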

class MyCustomCache(CacheInterface):
    """A custom cache implementation"""

    def __init__(self):
        # Initialize your custom cache here
        ...

    def get(self, key: str) -> Optional[Any]:
        # Implement the logic to retrieve a cached value
        ...

    def set(self, key: str, value: Any, timeout: int = 300) -> None:
        # Implement the logic to cache a value
        ...

    def delete(self, key: str) -> None:
        # Implement the logic to delete a cache key
        ...

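# Create the cache once so every request shares the same instance; for an
# in-memory cache, building a new instance per request would start each
# request with an empty cache.
my_custom_cache = MyCustomCache()

def get_cache(request: Request) -> Optional[CacheInterface]:
    """Dependency for retrieving a custom cache implementation"""
    cache_control = request.headers.get("Cache-Control", "")
    # Bypass caching when the client sends Cache-Control: no-cache
    skip_cache = "no-cache" in cache_control
    return noop_cache if skip_cache else my_custom_cache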

Respecting the no-cache Header

The open-source get_cache dependency respects the no-cache directive: if a request includes the Cache-Control: no-cache header, the cache is bypassed and fresh data is fetched. It does this by returning noop_cache, an instance of NoOpCache, which is a thin wrapper around the base Cache implementation that only logs caching activity, so nothing is actually stored or read. Custom implementations should respect this header too, for consistency. “Turning off the cache” when a no-cache header is detected is as simple as having your dependency return the noop_cache instance, which can be imported from datajunction_server.internal.caching.noop_cache.
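The example get_cache dependency checks whether the substring "no-cache" appears anywhere in the header. If you want a stricter check, one option is to split the header into comma-separated directives and compare each directive name case-insensitively, as the HTTP caching specification (RFC 9111) treats directive names. The sketch below is a hypothetical helper (wants_no_cache and select_cache are illustrative names, not part of DataJunction) that could replace the substring test.

```python
from typing import Any


def wants_no_cache(cache_control: str) -> bool:
    """Return True if a Cache-Control header value contains the no-cache directive.

    Each comma-separated directive is reduced to its name (anything after "="
    is ignored) and compared case-insensitively.
    """
    for directive in cache_control.split(","):
        name = directive.split("=", 1)[0].strip().lower()
        if name == "no-cache":
            return True
    return False


def select_cache(cache_control: str, real_cache: Any, noop: Any) -> Any:
    """Pick the no-op cache when the client asked to bypass caching."""
    return noop if wants_no_cache(cache_control) else real_cache
```

Unlike the substring check, this version does not match a directive such as a hypothetical x-no-cache-hint, and it still recognises variants like No-Cache or max-age=0, no-cache.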