Skip to content

GoogleCloudPlatform/google-cloud-spanner-hibernate

Repository files navigation

Google Cloud Spanner Dialect for Hibernate ORM

This is a dialect compatible with Hibernate 6.5 for the Google Cloud Spanner database service. The SpannerDialect produces SQL, DML, and DDL statements for most common entity types and relationships using standard Hibernate and Java Persistence annotations.

Version 1.x and 2.x of this library supports Hibernate 5.4.

Please see the following sections for important details about dialect differences due to the unique features and limitations of Cloud Spanner.

Quick Set-Up

See the Spring Data JPA Sample Application for a working sample application.

First, add the Maven dependencies for the Cloud Spanner Hibernate Dialect and the Cloud Spanner JDBC driver.

Maven coordinates for the dialect:

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-spanner-hibernate-dialect</artifactId>
  <version>3.4.0</version>
</dependency>

Maven coordinates for the official open source Cloud Spanner JDBC Driver.

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-spanner-jdbc</artifactId>
  <version>2.18.0</version>
</dependency>
Note
Hibernate ORM with Cloud Spanner is officially supported only with the open source Cloud Spanner JDBC Driver.

If you’re using a SNAPSHOT version of the dialect, please add the Sonatype Snapshots repository to your pom.xml:

<repository>
  <id>snapshots-repo</id>
  <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  <releases><enabled>false</enabled></releases>
  <snapshots><enabled>true</enabled></snapshots>
</repository>

Configuring the SpannerDialect and a Cloud Spanner Driver class is typical of all Hibernate dialects in the hibernate.properties file:

hibernate.dialect=com.google.cloud.spanner.hibernate.SpannerDialect
hibernate.connection.driver_class=com.google.cloud.spanner.jdbc.JdbcDriver
hibernate.connection.url=jdbc:cloudspanner:/projects/{INSERT_PROJECT_ID}/instances/{INSERT_INSTANCE_ID}/databases/{INSERT_DATABASE_ID}

The service account JSON credentials file location should be in the GOOGLE_APPLICATION_CREDENTIALS environment variable. The driver will use default credentials set in the Google Cloud SDK gcloud application otherwise.

You are now ready to begin using Hibernate with Cloud Spanner.

Examples

To see a full working sample application for using the dialect, please see our Spring Data JPA sample application.

User Guide

This guide contains a variety of best practices for using Hibernate with Spanner which can significantly improve the performance of your application.

Schema Creation and Entity Design

Hibernate generates statements based on your Hibernate entity design. Following these practices can result in better DDL and DML statement generation which can improve performance.

Use Generated UUIDs for ID Generation

The Universally Unique Identifier (UUID) is the preferred ID type in Cloud Spanner because it avoids hotspots as the system divides data among servers by key ranges. UUIDs are strongly preferred over sequentially increasing IDs for this reason.

It is also recommended to use Hibernate’s @GeneratedValue annotation to generate this UUID automatically; this can reduce the number of statements that Hibernate generates to perform an insert because it does not need to run extra SELECT statements to see if the record already exists in the table.

You can configure UUID generation like below:

@Entity
public class Employee {

  @Id
  @GeneratedValue(strategy = GenerationType.AUTO)
  @Type(type="uuid-char")
  public UUID id;
}

The @Type(type="uuid-char") annotation specifies that this UUID value will be stored in Cloud Spanner as a STRING column. Leaving out this annotation causes a BYTES column to be used.

Use @GeneratedValue for ID Generation

NOTE: Read to the end of this section to see the recommended way to set up @GeneratedValue.

Hibernate’s @GeneratedValue annotation for numeric fields is supported, and will by default use a positive bit-reversed sequence. A bit-reversed sequence internally uses a monotonically increasing counter that is reversed before being returned to Hibernate. This means that the identifiers that are generated are in the form bitReversePositive(1), bitReversePositive(2), …​:

@Entity
public class Employee {

  @Id
  @GeneratedValue   // Generates a bit-reversed sequence with an increment_size=1, this is not recommended!
  public Long id;
}

Bit-reversed sequences do not support an increment size larger than 1. This means that entities that use this style of identifiers by default require a round-trip to the database for each entity that is inserted. The PooledBitReversedSequenceStyleGenerator provided in this repository fixes this problem by using the configured increment_size to generate a query that fetches multiple identifier values from the sequence in one query, instead of setting an increment_size on the sequence in the database.

The increment_size for this pooled generator can not exceed 200.

This is the recommended configuration for bit-reversed sequences:

@Entity
public class Employee {
  // Recommended
  @Id
  @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "employeeId")
  @GenericGenerator(
    name = "employeeId",
    // Use this custom strategy to ensure the use of a bit-reversed sequence that is compatible with
    // batching multiple inserts. See also https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#batch.
    strategy = "com.google.cloud.spanner.hibernate.PooledBitReversedSequenceStyleGenerator",
    parameters = {
      // Use a separate sequence name for each entity.
      @Parameter(name = SequenceStyleGenerator.SEQUENCE_PARAM, value = "employee_seq"),
      // The increment_size is not actually set on the sequence that is created, but is used to
      // generate a SELECT query that fetches this number of identifiers at once.
      @Parameter(name = SequenceStyleGenerator.INCREMENT_PARAM, value = "200"),
      @Parameter(name = SequenceStyleGenerator.INITIAL_PARAM, value = "50000"),
      // Add any range that should be excluded by the generator if your table already
      // contains existing values that have been generated by other generators.
      @Parameter(name = PooledBitReversedSequenceStyleGenerator.EXCLUDE_RANGE_PARAM,
                 value = "[1,1000]"),
    })
  public Long id;
}

Query Hints

Spanner supports multiple query hints that can be used to optimize specific queries. You can use these with this Hibernate dialect by adding them either as a Hibernate query hint, or by adding them as specifically formatted comments. These specifically formatted comments are processed by this Hibernate dialect, which then modifies the generated query before it is sent to the JDBC driver.

Usage with Hibernate directly using a query hint:

import com.google.cloud.spanner.hibernate.hints.Hints;

CriteriaBuilder cb = session.getCriteriaBuilder();
CriteriaQuery<Singer> cr = cb.createQuery(Singer.class);
Root<Singer> root = cr.from(Singer.class);
root.join("albums", JoinType.LEFT);
cr.select(root);
Query<Singer> query = session.createQuery(cr)
  .addQueryHint(
      Hints.forceIndexFrom("Singer", "idx_singer_active", ReplaceMode.ALL).toQueryHint())
  .addQueryHint(
      Hints.forceIndexJoin("Album", "idx_album_title", ReplaceMode.ALL).toQueryHint());
List<Singer> singers = query.getResultList().size();

You can also add hints as comments to queries that are generated by JPA:

// The hint value that is used here is generated by calling the method:
// Hints.forceIndexFrom("singer", "idx_singer_active", ReplaceMode.ALL).toComment()
// manually and then copy-paste the value to the annotation.
@QueryHints(@QueryHint(name = AvailableHints.HINT_COMMENT, value = "{\n"
  + "  \"spanner_replacementss\": [\n"
  + "    {\n"
  + "      \"regex\": \" from singer \",\n"
  + "      \"replacement\": \" from singer @{FORCE_INDEX=idx_singer_active} \",\n"
  + "      \"replace_mode\": \"ALL\"\n"
  + "    }\n"
  + "  ]\n"
  + "}"))
List<Singer> findByActive(boolean active);

This working sample application shows how to use the above hint.

Transaction Tags

Spanner supports adding transaction tags for troubleshooting queries and transactions. You can add transaction tags to your Hibernate or Spring Data JPA application by adding the com.google.cloud.spanner.hibernate.TransactionTagInterceptor to your Hibernate configuration, and then adding the com.google.cloud.spanner.hibernate.TransactionTag annotation to the method that starts the transaction.

Example for adding the TransactionTagInterceptor:

package com.google.cloud.spanner.sample;

import com.google.cloud.spanner.hibernate.TransactionTagInterceptor;
import com.google.common.collect.ImmutableSet;
import java.util.Map;
import org.hibernate.cfg.AvailableSettings;
import org.springframework.boot.autoconfigure.orm.jpa.HibernatePropertiesCustomizer;
import org.springframework.stereotype.Component;

/** This component adds the TransactionTagInterceptor to the Hibernate configuration. */
@Component
public class TaggingHibernatePropertiesCustomizer implements HibernatePropertiesCustomizer {
  @Override
  public void customize(Map<String, Object> hibernateProperties) {
    hibernateProperties.put(AvailableSettings.INTERCEPTOR, new TransactionTagInterceptor(
        ImmutableSet.of(MyApplication.class.getPackageName()), false));
  }
}

Then add the @TransactionTag to the methods that should be tagged:

@Service
public class VenueService {

  private final VenueRepository repository;

  public VenueService(VenueRepository repository) {
    this.repository = repository;
  }

  /**
   * Deletes all Venue records in the database.
   */
  @Transactional
  @TransactionTag("delete_all_venues")
  public void deleteAllVenues() {
    repository.deleteAll();
  }
}

This working sample application shows how to use transaction tags.

Custom Spanner Column Types

This project offers the following Hibernate type mappings for specific Spanner column types:

Spanner Data Type Hibernate Type

ARRAY<BOOL>

com.google.cloud.spanner.hibernate.types.SpannerBoolArray

ARRAY<BYTES>

com.google.cloud.spanner.hibernate.types.SpannerBytesArray

ARRAY<DATE>

com.google.cloud.spanner.hibernate.types.SpannerDateArray

ARRAY<FLOAT64>

com.google.cloud.spanner.hibernate.types.SpannerFloat64Array

ARRAY<INT64>

com.google.cloud.spanner.hibernate.types.SpannerInt64Array

ARRAY<JSON>

com.google.cloud.spanner.hibernate.types.SpannerJsonArray

ARRAY<NUMERIC>

com.google.cloud.spanner.hibernate.types.SpannerNumericArray

ARRAY<STRING>

com.google.cloud.spanner.hibernate.types.SpannerStringArray

ARRAY<TIMESTAMP>

com.google.cloud.spanner.hibernate.types.SpannerTimestampArray

You can use these type mappings through the Hibernate @Type annotation:

@Entity
public class Singer {

  // Specify the custom type with the @Type annotation.
  @Type(SpannerStringArray.class)
  private List<String> nickNames;

  ...
}

A working example of this feature can be found in the Hibernate Basic Sample.

JSON Data Type

JSON data type can be used by adding a @JdbcTypeCode(SqlTypes.JSON) annotation to a field. The type of the field should be a Serializable POJO.

  /**
   * {@link VenueDescription} is a POJO that is used for the JSON field 'description' of the
   * {@link Venue} entity. It is automatically serialized and deserialized when an instance of the
   * entity is loaded or persisted.
   */
  public static class VenueDescription implements Serializable {

    private int capacity;
    private String type;
    private String location;

    public int getCapacity() {
      return capacity;
    }

    public void setCapacity(int capacity) {
      this.capacity = capacity;
    }

    public String getType() {
      return type;
    }

    public void setType(String type) {
      this.type = type;
    }

    public String getLocation() {
      return location;
    }

    public void setLocation(String location) {
      this.location = location;
    }
  }

  /**
   * This field maps to a JSON column in the database. The value is automatically
   * serialized/deserialized to a {@link VenueDescription} instance.
   */
  @JdbcTypeCode(SqlTypes.JSON)
  private VenueDescription description;

See Spring Data JPA Full Sample for a full working sample. The JSON field is in the Venue entity.

Auto-generate Schema for Faster Development

It is often useful to generate the schema for your database, such as during the early stages of development. The Spanner dialect supports Hibernate’s hibernate.hbm2ddl.auto setting which controls the framework’s schema generation behavior on start-up.

The following settings are available:

  • none: Do nothing.

  • validate: Validate the schema, makes no changes to the database.

  • update: Create or update the schema.

  • create: Create the schema, destroying previous data.

  • create-drop: Drop the schema when the SessionFactory is closed explicitly, typically when the application is stopped.

Hibernate performs schema updates on each table and entity type on startup, which can take more than several minutes if there are many tables. To avoid schema updates keeping Hibernate from starting for several minutes, you can update schemas separately and use the none or validate settings.

Leverage Cloud Spanner Foreign Key Constraints

The dialect supports all of the standard entity relationships:

  • @OneToOne

  • @OneToMany

  • @ManyToOne

  • @ManyToMany

These can be used via @JoinTable or @JoinColumn.

The Cloud Spanner Hibernate dialect will generate the correct foreign key DDL statements during schema generation for entities using these annotations.

The dialect also supports unique column constraints applied through @Column(unique = true) or @UniqueConstraint. In these cases, the dialect will create a unique index to enforce uniqueness on the specified columns.

Advanced Cloud Spanner Features (via. JDBC)

Cloud Spanner offers several features that traditional databases typically do not offer. These include:

  • Stale Reads

  • Read-only transactions

  • Partitioned DML

  • Mutations API (faster insert/update/delete operations)

We provide a Cloud Spanner Features Sample Application which demonstrates best practices for accessing these features through the Cloud Spanner JDBC driver.

Please consult the Cloud Spanner JDBC driver documentation for more information.

Performance Optimizations

There are some practices which can improve the execution time of Hibernate operations.

Be Clear About Inserts or Updates

Hibernate may generate additional SELECT statements if it is unclear whether you are attempting to insert a new record or update an existing record. The following practices can help with this:

  • Let Hibernate generate the ID by leaving the entity’s id null and annotate the field with @GeneratedValue. Hibernate will know that the record did not exist prior if it generates a new ID. See the above section for more details.

  • Or use session.persist() which will explicitly attempt the insert.

Enable Hibernate Batching

Batching SQL statements together allows you to optimize the performance of your application by including a group of SQL statements in a single remote call. This allows you to reduce the number of round-trips between your application and Cloud Spanner.

By default, Hibernate does not batch the statements that it sends to the Cloud Spanner JDBC driver.

Batching can be enabled by configuring hibernate.jdbc.batch_size in your Hibernate configuration file:

<property name="hibernate.jdbc.batch_size">100</property>

The property is set to 100 as an example; you may experiment with the batch size to see what works best for your application.

Use Interleaved Tables for Parent-Child Entities

Cloud Spanner offers the concept of Interleaved Tables which allows you to co-locate the rows of an interleaved table with rows of a parent table for efficient retrieval. This feature enforces the one-to-many relationship and provides efficient queries and operations on entities of a single domain parent entity.

If you would like to generate interleaved tables in Cloud Spanner, you must annotate your entity with the @Interleaved annotation. The primary key of the interleaved table must also include at least all of the primary key attributes of the parent. This is typically done using the @IdClass or @EmbeddedId annotation.

The Hibernate Basic Sample contains an example of using @Interleaved for the Singer and Album entities. The code excerpt of the Album entity below demonstrates how to declare an interleaved entity in the Singer table.

@Entity
@Interleaved(parentEntity = Singer.class, cascadeDelete = true)
@IdClass(AlbumId.class)
public class Album {

  @Id
  @GeneratedValue(strategy = GenerationType.AUTO)
  @Type(type = "uuid-char")
  private UUID albumId;

  @Id
  @ManyToOne
  @JoinColumn(name = "singerId")
  @Type(type = "uuid-char")
  private Singer singer;

  // Constructors, getters/setters

  public static class AlbumId implements Serializable {

    // The primary key columns of the parent entity
    // must be declared first.
    Singer singer;

    @Type(type = "uuid-char")
    UUID albumId;

    // Getters and setters
  }
}

The parent entity should define a @OneToMany relationship with the child entity as well. Use the mappedBy setting to specify which field in the child maps back to the parent.

@Entity
public class Singer {

  @OneToMany(mappedBy = "singer")
  List<Album> albums;

  // continued...
}

Tune JDBC Driver Parameters

The Spanner JDBC driver allows you to set the number of GRPC channels initialized through the JDBC connection URL. Each channel can support up to 100 concurrent requests; for applications that require a high amount of concurrency this value can be increased (from the default of 4).

jdbc:cloudspanner:/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID?numChannels=8

The full list of configurable properties can be found in the Spanner JDBC Driver Java docs.

Use Spanner Query Optimization

The Cloud Spanner SQL syntax offers a variety of query hints to tune and optimize the performance of queries. If you find that you need to take advantage of this feature, you can achieve this in Hibernate using native SQL queries.

This is an example of using the @{FORCE_JOIN_ORDER=TRUE} hint in a native Spanner SQL query.

SQLQuery query = session.createSQLQuery("SELECT * FROM Singers AS s
                                         JOIN@{FORCE_JOIN_ORDER=TRUE} Albums AS a
                                         ON s.SingerId = a.Singerid
                                         WHERE s.LastName LIKE '%x%'
                                         AND a.AlbumTitle LIKE '%love%';");

// Executes the query.
List<Object[]> entities = query.list();

Also, you may consult the Cloud Spanner documentation on general recommendations for optimizing performance.

Cloud Spanner Hibernate ORM Limitations

The Cloud Spanner Hibernate Dialect supports most of the standard Hibernate and Java Persistence annotations, but there are minor differences in supported features because of differences in Cloud Spanner from other traditional SQL databases.

Unsupported Feature Description

Large DML Transactions

Each Spanner transaction may only have up to 80,000 operations which modify rows of a table.

Catalog and schema scoping for table names

Tables name references cannot contain periods or other punctuation.

Mutations

Cloud Spanner supports both DML and mutations for modifying data. Hibernate does not support mutations, and mutations can therefore not be used with this Hibernate dialect.

Locking

Cloud Spanner does not support explicit lock clauses. Setting the lock mode of a query is therefore not supported.

Large DML Transactions Limits

Cloud Spanner has a mutation limit on each transaction - each Spanner transaction may only have up to 80,000 operations which modify rows of a table.

Note
Deleting a row counts as one operation and inserting/updating a single row will count as a number of operations equal to the number of affected columns. For example if one inserts a row that contains 5 columns, it counts as 5 modify operations for the insert.

Consequently, users must take care to avoid encountering these constraints.

  1. We recommend being careful with the use of CASCADE_TYPE.ALL in Entity annotations because, depending on the application, it might trigger a large number of entities to be deleted in a single transaction and bring you over the 80,000 limit.

  2. Also, when persisting a collection of entities, be mindful of the 80,000 mutations per transaction constraint.

Catalog/Schema Table Names

The Cloud Spanner Dialect only supports @Table with the name attribute. It does not support table names with catalog and/or schema components because Cloud Spanner does not support named catalogs and schemas:

// Supported.
@Table(
  name = "book"
)

// Not supported.
@Table(
  catalog = "public",
  schema = "store",
  name = "book"
)

Mutations

Cloud Spanner supports both DML and mutations for modifying data. Hibernate does not support mutations. You can therefore not use this Hibernate dialect to generate mutations for Cloud Spanner. The dialect will only generate DML statements.

Locking

Cloud Spanner does not support explicit locking clauses like SELECT …​ FOR UPDATE. Setting an explicit lock level in Hibernate is therefore not supported. The following is for example not supported:

entityManager.find(MyEntity.class, studentId, LockModeType.PESSIMISTIC_WRITE);