Google Cloud Bigtable: Python¶
To use the API, start with the Client class, a high-level interface which handles authorization and creates other objects:
from gcloud_bigtable.client import Client
client = Client()
This will use the Google Application Default Credentials if you don’t pass any credentials of your own.
The Cluster Admin API has been fully implemented. Create a Cluster to get a high-level interface to cluster management:
cluster = client.cluster(zone, cluster_id)
List Clusters¶
If you want a comprehensive list of all existing clusters, make a ListClusters request:
clusters, failed_zones = client.list_clusters()
This will return a pair: a list of Clusters and a list of strings naming the zones that failed in the request.
List Zones¶
If you aren’t sure which zone to create a cluster in, find out which zones your project has access to with a ListZones request:
zones = client.list_zones()
This will return a list of strings.
Create a new Cluster¶
After creating the cluster object, make a CreateCluster API request:
cluster.display_name = 'My very own cluster'
cluster.create()
If you would like more than the minimum number of nodes (3) in your cluster:
cluster.serve_nodes = 10
cluster.create()
Note
When modifying a cluster (via a CreateCluster, UpdateCluster or UndeleteCluster request), the Bigtable API will return a long-running Operation. This will be stored on the object after each of create(), update() and undelete() is called.
Check on Current Operation¶
You can check if a long-running operation (for a create(), update() or undelete()) has finished by making a GetOperation request:
>>> cluster.operation_finished()
True
Note
The operation data is stored in protected fields on the Cluster: _operation_type, _operation_id and _operation_begin. If these are unset, then operation_finished() will fail with a ValueError. Also, these will be removed after a long-running operation has completed (checked via this method). We could easily surface these properties publicly, but it’s unclear if end-users would need them.
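The completion check above lends itself to a polling loop. The following is a minimal sketch, assuming only that the object exposes operation_finished() as described; wait_for_operation, its parameters and FakeCluster are hypothetical names used for illustration and are not part of gcloud_bigtable:

```python
import time


def wait_for_operation(cluster, poll_interval=1.0, max_attempts=30,
                       sleep=time.sleep):
    """Poll cluster.operation_finished() until it returns True.

    Hypothetical helper, not part of gcloud_bigtable. Returns True if the
    operation finished within max_attempts polls, False otherwise.
    """
    for attempt in range(max_attempts):
        if cluster.operation_finished():
            return True
        # Do not sleep after the final unsuccessful poll.
        if attempt < max_attempts - 1:
            sleep(poll_interval)
    return False


class FakeCluster(object):
    """Stand-in for a Cluster; reports completion on the third poll."""

    def __init__(self, polls_until_done=3):
        self._remaining = polls_until_done

    def operation_finished(self):
        self._remaining -= 1
        return self._remaining <= 0


done = wait_for_operation(FakeCluster(), poll_interval=0.0)
```

The loop stops at the first True result on purpose: since the stored operation fields are removed once completion has been observed, a further call would have no current operation to check.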
Get metadata for an existing Cluster¶
After creating the cluster object, make a GetCluster API request:
cluster.reload()
This will load serve_nodes and display_name for the existing cluster in addition to the cluster_id, zone and project_id already set on the Cluster object.
Update an existing Cluster¶
After creating the cluster object, make an UpdateCluster API request:
cluster.display_name = 'New display_name'
cluster.update()
Undelete a deleted Cluster¶
Make an UndeleteCluster API request:
cluster.undelete()
Documented Modules¶
Constants¶
Constants for Google Cloud Bigtable API.
-
gcloud_bigtable.constants.
CLUSTER_ADMIN_HOST
= 'bigtableclusteradmin.googleapis.com'¶ Cluster Admin API request host.
-
gcloud_bigtable.constants.
CLUSTER_ADMIN_PORT
= 443¶ Cluster Admin API request port.
-
gcloud_bigtable.constants.
TABLE_ADMIN_HOST
= 'bigtabletableadmin.googleapis.com'¶ Table Admin API request host.
-
gcloud_bigtable.constants.
TABLE_ADMIN_PORT
= 443¶ Table Admin API request port.
Data Connection¶
Connection to Google Cloud Bigtable Data API.
-
gcloud_bigtable.data_connection.
DATA_API_HOST
= 'bigtable.googleapis.com'¶ Data API request host.
-
class
gcloud_bigtable.data_connection.
DataConnection
(credentials=None)[source]¶ Bases:
object
Connection to Google Cloud Bigtable Data API.
Enables interaction with data in an existing table.
Parameters: credentials (oauth2client.client.OAuth2Credentials or NoneType) – The OAuth2 Credentials to use for this connection.
-
READ_ONLY_SCOPE
= 'https://www.googleapis.com/auth/cloud-bigtable.data.readonly'¶ Read-only scope for data API requests.
-
SCOPE
= 'https://www.googleapis.com/auth/cloud-bigtable.data'¶ Scope for data API requests.
-
read_rows
(table_name, row_key=None, row_range=None, filter_=None, allow_row_interleaving=None, num_rows_limit=None, timeout_seconds=10)[source]¶ Read rows from table.
Streams back the contents of all requested rows, optionally applying the same Reader filter to each. Depending on their size, rows may be broken up across multiple responses, but atomicity of each row will still be preserved.
Note
If neither row_key nor row_range is set, reads from all rows. Otherwise, at most one of row_key and row_range can be set.
Parameters: - table_name (string) – The name of the table we are reading from. Must be of the form “projects/../zones/../clusters/../tables/..” Since this is a low-level class, we don’t check this, rather we expect callers to pass correctly formatted data.
- row_key (bytes) – (Optional) The key of a single row from which to read.
- row_range (_generated.bigtable_data_pb2.RowRange) – (Optional) A range of rows from which to read.
- filter_ (_generated.bigtable_data_pb2.RowFilter) – (Optional) The filter to apply to the contents of the specified row(s). If unset, reads the entire table.
- allow_row_interleaving (boolean) – (Optional) By default, rows are read sequentially, producing results which are guaranteed to arrive in increasing row order. Setting “allow_row_interleaving” to true allows multiple rows to be interleaved in the response stream, which increases throughput but breaks this guarantee, and may force the client to use more memory to buffer partially-received rows.
- num_rows_limit (integer) – (Optional) The read will terminate after committing to N rows’ worth of results. The default (zero) is to return all results. Note that if “allow_row_interleaving” is set to true, partial results may be returned for more than N rows. However, only N “commit_row” chunks will be sent.
- timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to TIMEOUT_SECONDS.
Return type: bigtable_service_messages_pb2.ReadRowsResponse
Returns: The response returned by the backend.
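The row selection rule in the note above (no selector reads everything, and at most one of row_key and row_range may be given) can be checked before a request is sent. This is a sketch under the assumption that the caller wants such a pre-flight check; check_row_selection is a hypothetical helper, and the source does not say whether read_rows itself performs this validation or which error it would raise:

```python
def check_row_selection(row_key=None, row_range=None):
    """Describe which rows a read would cover.

    Hypothetical pre-flight check, not part of gcloud_bigtable. Raising
    ValueError for conflicting arguments is an assumption.
    """
    if row_key is not None and row_range is not None:
        raise ValueError('At most one of row_key and row_range can be set.')
    if row_key is not None:
        return 'single row'
    if row_range is not None:
        return 'row range'
    # Neither selector given: the read covers every row in the table.
    return 'all rows'


selection = check_row_selection(row_key=b'row-key-1')
```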
-
sample_row_keys
(table_name, timeout_seconds=10)[source]¶ Returns a sample of row keys in the table.
The returned row keys will delimit contiguous sections of the table of approximately equal size, which can be used to break up the data for distributed tasks like mapreduces.
Parameters: - table_name (string) – The name of the table we are taking the sample from. Must be of the form “projects/../zones/../clusters/../tables/..” Since this is a low-level class, we don’t check this, rather we expect callers to pass correctly formatted data.
- timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to TIMEOUT_SECONDS.
Return type: messages_pb2.SampleRowKeysResponse
Returns: The sample row keys response returned.
-
Client¶
Parent client for calling the Google Cloud Bigtable API.
This is the base from which all interactions with the API occur.
In the hierarchy of API concepts:
- a client owns a cluster
- a cluster owns a table
- a table owns column families
- a table owns data
-
gcloud_bigtable.client.
ADMIN_SCOPE
= 'https://www.googleapis.com/auth/cloud-bigtable.admin'¶ Scope for interacting with the Cluster Admin and Table Admin APIs.
-
class
gcloud_bigtable.client.
Client
(credentials=None, project_id=None, read_only=False, admin=False, user_agent='gcloud-bigtable-python', timeout_seconds=10)[source]¶ Bases:
object
Client for interacting with Google Cloud Bigtable API.
Parameters: - credentials (oauth2client.client.OAuth2Credentials or NoneType) – (Optional) The OAuth2 Credentials to use for this client. If not provided, defaults to the Google Application Default Credentials.
- project_id (string) – (Optional) The ID of the project which owns the clusters, tables and data. If not provided, will attempt to determine from the environment.
- read_only (boolean) – (Optional) Boolean indicating if the data scope should be for reading only (or for writing as well). Defaults to False.
- admin (boolean) – (Optional) Boolean indicating if the client will be used to interact with the Cluster Admin or Table Admin APIs. This requires the ADMIN_SCOPE. Defaults to False.
- user_agent (string) – (Optional) The user agent to be used with API requests. Defaults to DEFAULT_USER_AGENT.
- timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to DEFAULT_TIMEOUT_SECONDS.
Raises: ValueError if both read_only and admin are True
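The interaction between read_only and admin can be pictured as a scope-selection step. The sketch below reuses the scope URLs documented in this module, but scopes_for is a hypothetical function, and the way the scopes are combined is an assumption rather than the library's confirmed behavior:

```python
ADMIN_SCOPE = 'https://www.googleapis.com/auth/cloud-bigtable.admin'
DATA_SCOPE = 'https://www.googleapis.com/auth/cloud-bigtable.data'
READ_ONLY_SCOPE = 'https://www.googleapis.com/auth/cloud-bigtable.data.readonly'


def scopes_for(read_only=False, admin=False):
    """Pick OAuth2 scopes for a client (hypothetical, for illustration)."""
    if read_only and admin:
        # Mirrors the documented ValueError for this flag combination.
        raise ValueError('A read-only client cannot also be an admin client.')
    # Assumption: exactly one data scope is always requested.
    scopes = [READ_ONLY_SCOPE if read_only else DATA_SCOPE]
    if admin:
        scopes.append(ADMIN_SCOPE)
    return scopes


admin_scopes = scopes_for(admin=True)
```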
-
cluster
(zone, cluster_id, display_name=None, serve_nodes=3)[source]¶ Factory to create a cluster associated with this client.
Parameters: - zone (string) – The name of the zone where the cluster resides.
- cluster_id (string) – The ID of the cluster.
- display_name (string) – (Optional) The display name for the cluster in the Cloud Console UI. (Must be between 4 and 30 characters.) If this value is not set in the constructor, will fall back to the cluster ID.
- serve_nodes (integer) – (Optional) The number of nodes in the cluster. Defaults to 3.
Return type: Cluster
Returns: The cluster owned by this client.
-
credentials
¶ Getter for client’s credentials.
Return type: oauth2client.client.OAuth2Credentials
Returns: The credentials stored on the client.
-
classmethod
from_service_account_json
(json_credentials_path, project_id=None, read_only=False, admin=False)[source]¶ Factory to retrieve JSON credentials while creating client object.
Parameters: - json_credentials_path (string) – The path to a private key file (this file was given to you when you created the service account). This file must contain a JSON object with a private key and other credentials information (downloaded from the Google APIs console).
- project_id (string) – The ID of the project which owns the clusters, tables and data. Will be passed to Client constructor.
- read_only (boolean) – Boolean indicating if the data scope should be for reading only (or for writing as well). Will be passed to Client constructor.
- admin (boolean) – Boolean indicating if the client will be used to interact with the Cluster Admin or Table Admin APIs. Will be passed to Client constructor.
Return type: Client
Returns: The client created with the retrieved JSON credentials.
-
classmethod
from_service_account_p12
(client_email, private_key_path, project_id=None, read_only=False, admin=False)[source]¶ Factory to retrieve P12 credentials while creating client object.
Note
Unless you have an explicit reason to use a PKCS12 key for your service account, we recommend using a JSON key.
Parameters: - client_email (string) – The e-mail attached to the service account.
- private_key_path (string) – The path to a private key file (this file was given to you when you created the service account). This file must be in P12 format.
- project_id (string) – The ID of the project which owns the clusters, tables and data. Will be passed to Client constructor.
- read_only (boolean) – Boolean indicating if the data scope should be for reading only (or for writing as well). Will be passed to Client constructor.
- admin (boolean) – Boolean indicating if the client will be used to interact with the Cluster Admin or Table Admin APIs. Will be passed to Client constructor.
Return type: Client
Returns: The client created with the retrieved P12 credentials.
-
list_clusters
(timeout_seconds=None)[source]¶ Lists clusters owned by the project.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on client.
Return type: tuple
Returns: A pair of results: the first is a list of Clusters returned and the second is a list of strings (the failed zones in the request).
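Because the result is a pair, callers need to unpack it and decide what a non-empty list of failed zones means for them. A minimal sketch, assuming the caller prefers to treat a partial listing as an error; clusters_or_raise is a hypothetical helper, and the strings below stand in for real Cluster objects:

```python
def clusters_or_raise(listing):
    """Unpack a (clusters, failed_zones) pair, rejecting partial results.

    Hypothetical helper, not part of gcloud_bigtable.
    """
    clusters, failed_zones = listing
    if failed_zones:
        raise RuntimeError(
            'Incomplete listing; failed zones: ' +
            ', '.join(sorted(failed_zones)))
    return clusters


# Placeholder data shaped like the documented return value.
complete_listing = (['cluster-a', 'cluster-b'], [])
clusters = clusters_or_raise(complete_listing)
```

Ignoring the second element silently would hide clusters located in zones that could not be queried, which is why the sketch raises instead.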
-
list_zones
(timeout_seconds=None)[source]¶ Lists zones associated with project.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on client.
Return type: list of strings
Returns: The names of the zones
Raises: ValueError if one of the zones is not in OK state.
-
project_id
¶ Getter for client’s project ID.
Return type: string Returns: The project ID stored on the client.
-
project_name
¶ Project name to be used with Cluster Admin API.
Note
This property will not change if project_id does not, but the return value is not cached.
The project name is of the form “projects/{project_id}”.
Return type: string
Returns: The project name to be used with the Cloud Bigtable Admin API RPC service.
-
gcloud_bigtable.client.
DATA_SCOPE
= 'https://www.googleapis.com/auth/cloud-bigtable.data'¶ Scope for reading and writing table data.
-
gcloud_bigtable.client.
PROJECT_ENV_VAR
= 'GCLOUD_PROJECT'¶ Environment variable used to provide an implicit project ID.
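An implicit project ID lookup through this variable might look like the sketch below. It is a simplification: resolve_project_id is a hypothetical function, the error type is an assumption, and the real client may consult other sources in the environment as well:

```python
import os

PROJECT_ENV_VAR = 'GCLOUD_PROJECT'


def resolve_project_id(project_id=None, environ=None):
    """Return an explicit project ID, else fall back to the environment.

    Hypothetical helper, not part of gcloud_bigtable.
    """
    if project_id is not None:
        return project_id
    # Allow a mapping to be injected so the lookup can be tested.
    environ = os.environ if environ is None else environ
    value = environ.get(PROJECT_ENV_VAR)
    if value is None:
        raise EnvironmentError('Project ID could not be determined.')
    return value


from_env = resolve_project_id(environ={'GCLOUD_PROJECT': 'my-project'})
```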
-
gcloud_bigtable.client.
READ_ONLY_SCOPE
= 'https://www.googleapis.com/auth/cloud-bigtable.data.readonly'¶ Scope for reading table data.
Cluster¶
User friendly container for Google Cloud Bigtable Cluster.
-
class
gcloud_bigtable.cluster.
Cluster
(zone, cluster_id, client, display_name=None, serve_nodes=3)[source]¶ Bases:
object
Representation of a Google Cloud Bigtable Cluster.
We can use a Cluster to:
- reload() itself
- create() itself
- Check if an operation_finished() (each of create(), update() and undelete() return with long-running operations)
- update() itself
- delete() itself
- undelete() itself
Note
For now, we leave out the properties hdd_bytes and ssd_bytes (both integers) and also the default_storage_type (an enum) which if not sent will end up as data_pb2.STORAGE_SSD.
Parameters: - zone (string) – The name of the zone where the cluster resides.
- cluster_id (string) – The ID of the cluster.
- client (client.Client) – The client that owns the cluster. Provides authorization and a project ID.
- display_name (string) – (Optional) The display name for the cluster in the Cloud Console UI. (Must be between 4 and 30 characters.) If this value is not set in the constructor, will fall back to the cluster ID.
- serve_nodes (integer) – (Optional) The number of nodes in the cluster. Defaults to 3.
-
client
¶ Getter for cluster’s client.
Return type: client.Client
Returns: The client stored on the cluster.
-
create
(timeout_seconds=None)[source]¶ Create this cluster.
Note
Uses the project_id, zone and cluster_id on the current Cluster in addition to the display_name and serve_nodes. If you’d like to change them before creating, reset the values via
cluster.display_name = 'New display name'
cluster.cluster_id = 'i-changed-my-mind'
before calling create().
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
-
credentials
¶ Getter for cluster’s credentials.
Return type: oauth2client.client.OAuth2Credentials
Returns: The credentials stored on the cluster’s client.
-
delete
(timeout_seconds=None)[source]¶ Delete this cluster.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
-
classmethod
from_pb
(cluster_pb, client)[source]¶ Creates a cluster instance from a protobuf.
Parameters: - cluster_pb (bigtable_cluster_data_pb2.Cluster) – A cluster protobuf object.
- client (client.Client) – The client that owns the cluster.
Return type: Cluster
Returns: The cluster parsed from the protobuf response.
Raises: ValueError if the cluster name does not match _CLUSTER_NAME_RE or if the parsed project ID does not match the project ID on the client.
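The two failure cases described for from_pb can be illustrated with a small name parser. The pattern below is a hypothetical stand-in for _CLUSTER_NAME_RE, whose actual definition is not shown here, and parse_cluster_name is likewise an illustrative name:

```python
import re

# Assumed pattern, based on the documented cluster name format
# "projects/{project_id}/zones/{zone}/clusters/{cluster_id}".
CLUSTER_NAME_RE = re.compile(
    r'^projects/(?P<project_id>[^/]+)/'
    r'zones/(?P<zone>[^/]+)/'
    r'clusters/(?P<cluster_id>[^/]+)$')


def parse_cluster_name(name, expected_project_id):
    """Split a full cluster name into (zone, cluster_id)."""
    match = CLUSTER_NAME_RE.match(name)
    if match is None:
        raise ValueError('Cluster name not in expected format: %r' % (name,))
    if match.group('project_id') != expected_project_id:
        raise ValueError(
            'Project ID in the cluster name does not match the client.')
    return match.group('zone'), match.group('cluster_id')


parsed = parse_cluster_name(
    'projects/my-project/zones/us-central1-b/clusters/my-cluster',
    'my-project')
```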
-
list_tables
(timeout_seconds=None)[source]¶ List the tables in this cluster.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
Return type: list of Table
Returns: The list of tables owned by the cluster.
Raises: ValueError if one of the returned tables has a name that is not of the expected format.
-
name
¶ Cluster name used in requests.
Note
This property will not change if zone and cluster_id do not, but the return value is not cached.
The cluster name is of the form “projects/{project_id}/zones/{zone}/clusters/{cluster_id}”.
Return type: string
Returns: The cluster name.
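The documented name formats nest inside one another, which a few plain string helpers make visible. These functions are hypothetical and only mirror the formats quoted in this documentation; they are not the properties the library itself exposes:

```python
def project_name(project_id):
    """Build "projects/{project_id}"."""
    return 'projects/' + project_id


def cluster_name(project_id, zone, cluster_id):
    """Build "projects/{project_id}/zones/{zone}/clusters/{cluster_id}"."""
    return '%s/zones/%s/clusters/%s' % (
        project_name(project_id), zone, cluster_id)


def table_name(project_id, zone, cluster_id, table_id):
    """Append "/tables/{table_id}" to the owning cluster's name."""
    return cluster_name(project_id, zone, cluster_id) + '/tables/' + table_id


full_table_name = table_name(
    'my-project', 'us-central1-b', 'my-cluster', 'my-table')
```

Recomputing the string on every call matches the note above that the return value is not cached.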
-
operation_finished
(timeout_seconds=None)[source]¶ Check if the current operation has finished.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
Return type: boolean
Returns: A boolean indicating if the current operation has completed.
Raises: ValueError if there is no current operation set.
-
project_id
¶ Getter for cluster’s project ID.
Return type: string Returns: The project ID for the cluster (is stored on the client).
-
reload
(timeout_seconds=None)[source]¶ Reload the metadata for this cluster.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
-
table
(table_id)[source]¶ Factory to create a table associated with this cluster.
Parameters: table_id (string) – The ID of the table. Return type: Table
Returns: The table owned by this cluster.
-
timeout_seconds
¶ Getter for cluster’s default timeout seconds.
Return type: integer Returns: The timeout seconds default stored on the cluster’s client.
-
undelete
(timeout_seconds=None)[source]¶ Undelete this cluster.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
-
update
(timeout_seconds=None)[source]¶ Update this cluster.
Note
Updates the display_name and serve_nodes. If you’d like to change them before updating, reset the values via
cluster.display_name = 'New display name'
cluster.serve_nodes = 3
before calling update().
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on cluster.
Table¶
User friendly container for Google Cloud Bigtable Table.
-
class
gcloud_bigtable.table.
Table
(table_id, cluster)[source]¶ Bases:
object
Representation of a Google Cloud Bigtable Table.
Note
We don’t define any properties on a table other than the name. As the proto says, in a request:
The name field of the Table and all of its ColumnFamilies must be left blank, and will be populated in the response.
This leaves only the current_operation and granularity fields. The current_operation is only used for responses while granularity is an enum with only one value.
We can use a Table to:
- create() the table
- rename() the table
- delete() the table
- list_column_families() in the table
Parameters: - table_id (string) – The ID of the table.
- cluster (cluster.Cluster) – The cluster that owns the table.
-
client
¶ Getter for table’s client.
Return type: client.Client
Returns: The client that owns this table.
-
cluster
¶ Getter for table’s cluster.
Return type: cluster.Cluster
Returns: The cluster stored on the table.
-
column_family
(column_family_id, gc_rule=None)[source]¶ Factory to create a column family associated with this table.
Parameters: - column_family_id (string) – The ID of the column family.
- gc_rule (column_family.GarbageCollectionRule, column_family.GarbageCollectionRuleUnion or column_family.GarbageCollectionRuleIntersection) – (Optional) The garbage collection settings for this column family.
Return type: column_family.ColumnFamily
Returns: A column family owned by this table.
-
create
(initial_split_keys=None, timeout_seconds=None)[source]¶ Creates this table.
Note
Though a _generated.bigtable_table_data_pb2.Table is also allowed (as the table property) in a create table request, we do not support it in this method. As mentioned in the Table docstring, the name is the only useful property in the table proto.
Note
A create request returns a _generated.bigtable_table_data_pb2.Table but we don’t use this response. The proto definition allows for the inclusion of a current_operation in the response, but in example usage so far, it seems the Bigtable API does not return any operation.
Parameters: - initial_split_keys (iterable of strings) – (Optional) List of row keys that will be used to initially split the table into several tablets (Tablets are similar to HBase regions). Given two split keys, "s1" and "s2", three tablets will be created, spanning the key ranges: [, s1), [s1, s2), [s2, ).
- timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on table.
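The key ranges produced by initial_split_keys can be worked out by hand. The sketch below computes them for illustration only; tablet_ranges is a hypothetical helper, the empty string is used here to denote an unbounded end, and sorting the keys first is an assumption about how the ranges are formed:

```python
def tablet_ranges(split_keys):
    """Return half-open [start, end) key ranges implied by split keys.

    Hypothetical helper, not part of gcloud_bigtable. An empty string
    marks an open (unbounded) start or end.
    """
    boundaries = [''] + sorted(split_keys) + ['']
    # Pair each boundary with its successor to form consecutive ranges.
    return list(zip(boundaries[:-1], boundaries[1:]))


ranges = tablet_ranges(['s1', 's2'])
# ranges == [('', 's1'), ('s1', 's2'), ('s2', '')]
```

Each split key adds one range, so N keys yield N + 1 tablets, and no keys at all yield a single range covering the whole table.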
-
delete
(timeout_seconds=None)[source]¶ Delete this table.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on table.
-
list_column_families
(timeout_seconds=None)[source]¶ List the column families in this table.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on table.
Return type: dictionary with string as keys and column_family.ColumnFamily as values
Returns: Dictionary of column families attached to this table.
Raises: ValueError if the column family name from the response does not agree with the computed name from the column family ID.
-
name
¶ Table name used in requests.
Note
This property will not change if table_id does not, but the return value is not cached.
The table name is of the form "projects/../zones/../clusters/../tables/{table_id}"
Return type: string
Returns: The table name.
-
rename
(new_table_id, timeout_seconds=None)[source]¶ Rename this table.
Note
This cannot be used to move tables between clusters, zones, or projects.
Note
The Bigtable Table Admin API currently returns
BigtableTableService.RenameTable is not yet implemented
when this method is used. It’s unclear when this method will actually be supported by the API.
Parameters: - new_table_id (string) – The new table ID.
- timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on table.
-
timeout_seconds
¶ Getter for table’s default timeout seconds.
Return type: integer Returns: The timeout seconds default stored on the table’s client.
Table Column Families¶
When creating a column family for a Table, it is possible to set garbage collection rules for expired data.
By setting a rule, cells in the table matching the rule will be deleted during periodic garbage collection (which executes opportunistically in the background).
The types GarbageCollectionRule, GarbageCollectionRuleUnion and GarbageCollectionRuleIntersection can all be used as the optional gc_rule argument in the ColumnFamily constructor. This value is then used in the create and update methods.
These rules can be nested arbitrarily, with GarbageCollectionRule at the lowest level of the nesting:
import datetime
from gcloud_bigtable.column_family import GarbageCollectionRule
from gcloud_bigtable.column_family import GarbageCollectionRuleIntersection
from gcloud_bigtable.column_family import GarbageCollectionRuleUnion
max_age = datetime.timedelta(days=3)
rule1 = GarbageCollectionRule(max_age=max_age)
rule2 = GarbageCollectionRule(max_num_versions=1)
# Make a composite that matches anything older than 3 days **AND**
# with more than 1 version.
rule3 = GarbageCollectionRuleIntersection(rules=[rule1, rule2])
# Make another composite that matches our previous intersection
# **OR** anything that has more than 3 versions.
rule4 = GarbageCollectionRule(max_num_versions=3)
rule5 = GarbageCollectionRuleUnion(rules=[rule3, rule4])
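To see which cells such a nested rule would select, the AND/OR semantics can be modelled with plain predicates. This is only an illustrative model: Cell, older_than, beyond_versions, intersection and union are hypothetical names, not the library's classes, and the strict comparisons are an assumption about the boundary behavior:

```python
import datetime
from collections import namedtuple

# Hypothetical model of a cell: its age and its rank among the versions
# of the same column (1 is the newest version).
Cell = namedtuple('Cell', ['age', 'version_rank'])


def older_than(max_age):
    """Match cells whose age exceeds max_age."""
    return lambda cell: cell.age > max_age


def beyond_versions(max_num_versions):
    """Match cells that are not among the newest max_num_versions."""
    return lambda cell: cell.version_rank > max_num_versions


def intersection(*rules):
    """Match only cells that every rule matches (AND)."""
    return lambda cell: all(rule(cell) for rule in rules)


def union(*rules):
    """Match cells that at least one rule matches (OR)."""
    return lambda cell: any(rule(cell) for rule in rules)


# Same shape as rule3 and rule5 in the example above.
rule3 = intersection(older_than(datetime.timedelta(days=3)),
                     beyond_versions(1))
rule5 = union(rule3, beyond_versions(3))

old_second_version = Cell(datetime.timedelta(days=4), 2)
fresh_second_version = Cell(datetime.timedelta(days=1), 2)
fresh_fourth_version = Cell(datetime.timedelta(days=1), 4)
```

Under this model the four-day-old second version is collected through the intersection, the one-day-old second version survives, and the one-day-old fourth version is collected through the version limit alone.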
Column Family Module¶
User friendly container for Google Cloud Bigtable Column Family.
-
class
gcloud_bigtable.column_family.
ColumnFamily
(column_family_id, table, gc_rule=None)[source]¶ Bases:
object
Representation of a Google Cloud Bigtable Column Family.
We can use a ColumnFamily to:
- create() itself
- update() itself
- delete() itself
Parameters: - column_family_id (string) – The ID of the column family.
- table (table.Table) – The table that owns the column family.
- gc_rule (GarbageCollectionRule, GarbageCollectionRuleUnion or GarbageCollectionRuleIntersection) – (Optional) The garbage collection settings for this column family.
-
client
¶ Getter for column family’s client.
Return type: client.Client
Returns: The client that owns this column family.
-
create
(timeout_seconds=None)[source]¶ Create this column family.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on column family.
-
delete
(timeout_seconds=None)[source]¶ Delete this column family.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on column family.
-
name
¶ Column family name used in requests.
Note
This property will not change if column_family_id does not, but the return value is not cached.
The column family name is of the form "projects/../zones/../clusters/../tables/../columnFamilies/.."
Return type: string
Returns: The column family name.
-
table
¶ Getter for column family’s table.
Return type: table.Table
Returns: The table stored on the column family.
-
timeout_seconds
¶ Getter for column family’s default timeout seconds.
Return type: integer Returns: The timeout seconds default.
-
update
(timeout_seconds=None)[source]¶ Update this column family.
Note
The Bigtable Table Admin API currently returns
BigtableTableService.UpdateColumnFamily is not yet implemented
when this method is used. It’s unclear when this method will actually be supported by the API.
Parameters: timeout_seconds (integer) – Number of seconds for request time-out. If not passed, defaults to value set on column family.
-
class
gcloud_bigtable.column_family.
GarbageCollectionRule
(max_num_versions=None, max_age=None)[source]¶ Bases:
object
Table garbage collection rule.
Cells in the table fitting the rule will be deleted during garbage collection.
These values can be combined via GarbageCollectionRuleUnion and GarbageCollectionRuleIntersection.
Note
At most one of max_num_versions and max_age can be specified at once.
Note
A string gc_expression can also be used with API requests, but that value would be superseded by a gc_rule. As a result, we don’t support that feature and instead support garbage collection settings only via this native object.
Parameters: - max_num_versions (integer) – The maximum number of versions
- max_age (datetime.timedelta) – The maximum age allowed for a cell in the table.
Raises: ValueError if both max_num_versions and max_age are set.
-
to_pb
()[source]¶ Converts the GarbageCollectionRule to a protobuf.
Return type: data_pb2.GcRule
Returns: The converted current object.
-
class
gcloud_bigtable.column_family.
GarbageCollectionRuleIntersection
(rules=None)[source]¶ Bases:
object
Intersection of garbage collection rules.
Parameters: rules (list) – List of garbage collection rules, unions and/or intersections.