
Other Services

ML

  • AI > ML > Deep Learning
    • AI is the more general discipline
    • ML is a toolset within the AI discipline
    • Deep learning is a type of ML that can process unstructured data such as images, speech, video and natural-language text
  • Many ways to implement ML on GCP
    • ML frameworks such as TensorFlow, for advanced users
    • Cloud AutoML: bring your own data to train a custom model
    • Pre-trained models: APIs such as the Natural Language API, Vision API, Speech API and Translation API
  • Three ways to build an ML model:
    • BigQuery ML: SQL-based and fast (trains in minutes), but often not accurate enough for production
    • AutoML: trains in a few hours, does feature engineering automatically, usually much more accurate
    • Custom: hand-built using Keras, TensorFlow and Python
  • A TensorFlow Estimator-based model requires
    • An Estimator object that implements the train(), evaluate(), predict() and export_savedmodel() methods (export_saved_model() in newer releases)
    • A training input function that returns a tuple of features and labels
      • features is a dict mapping each feature-column name to a tensor; labels is a tensor holding the label column
    • A serving input function that converts user input into features at prediction time; it returns the features together with the raw input tensors
      • It is similar to the training input function, except that the user input data may be in a different format and there is no label
    • A train-and-evaluate step
      • tf.estimator.train_and_evaluate() is called with the estimator, a TrainSpec and an EvalSpec
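The difference between the training and serving input functions can be sketched without TensorFlow. This is a minimal illustration of the data contract only, assuming plain Python lists stand in for tensors; the column names (sq_feet, num_rooms, price) and the CSV input format at serving time are hypothetical. In real Estimator code the serving function would return a tf.estimator.export.ServingInputReceiver rather than a plain tuple.

```python
import csv
import io

# Hypothetical column names, for illustration only.
FEATURE_COLUMNS = ["sq_feet", "num_rooms"]
LABEL_COLUMN = "price"


def train_input_fn(rows):
    """Training input function: returns a (features, labels) tuple.

    features maps each feature-column name to a list of values (a stand-in
    for a tensor); labels is the list of label-column values.
    """
    features = {col: [float(r[col]) for r in rows] for col in FEATURE_COLUMNS}
    labels = [float(r[LABEL_COLUMN]) for r in rows]
    return features, labels


def serving_input_fn(csv_line):
    """Serving input function: parses raw user input (assumed here to be one
    CSV line with values in FEATURE_COLUMNS order) into the same features
    dict. There is no label at prediction time, so it returns the features
    plus the raw inputs it received."""
    values = next(csv.reader(io.StringIO(csv_line)))
    features = {col: [float(v)] for col, v in zip(FEATURE_COLUMNS, values)}
    receiver_inputs = {"csv_line": [csv_line]}
    return features, receiver_inputs


rows = [
    {"sq_feet": "1000", "num_rooms": "3", "price": "200000"},
    {"sq_feet": "1500", "num_rooms": "4", "price": "300000"},
]
features, labels = train_input_fn(rows)
print(features)  # {'sq_feet': [1000.0, 1500.0], 'num_rooms': [3.0, 4.0]}
print(labels)    # [200000.0, 300000.0]
print(serving_input_fn("1200,2")[0])  # {'sq_feet': [1200.0], 'num_rooms': [2.0]}
```

Both functions produce the same features dict, which is what lets one trained model accept input from either path.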

AutoML

  • Three phases: train, deploy, serve
  • Train with a prepared dataset
    • The prepared dataset (CSV) must reside in the same bucket as the source files
      • The optional first column assigns each row to TRAIN, VALIDATION or TEST; if it is omitted, the default split is 80/10/10
    • VALIDATION data is used during training to tune and improve the model
    • TEST data is held out for final model evaluation, giving an unbiased measure of performance
  • One pass through the entire training dataset is called an epoch
  • Custom models are temporary and are deleted after some time
    • They cannot be exported
    • New models must be trained periodically
  • Serving provisions a REST endpoint
    • It accepts the model name and a payload (the data to be classified)
    • It returns displayName (the matched label) and a classification with its score
  • AutoML Vision: classify images
    • Rule of thumb (ROT) for training data: a low score means more data is needed; a perfect score suggests the data needs more variety
  • AutoML NLP
  • AutoML Tables: for structured data in tabular format
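The AutoML train and serve steps can be sketched in Python: tagging rows of a training CSV with TRAIN/VALIDATION/TEST in the optional first column (80/10/10 here), then building a prediction request and reading back displayName and score. This is a sketch under assumptions: the gs:// paths, project and model IDs are hypothetical, the request and response shapes follow my understanding of the AutoML v1 REST API for a text model (verify against the current docs), and authentication is omitted.

```python
import csv
import io
import json
import random


def assign_splits(rows, seed=0, train_pct=80, validation_pct=10):
    """Prefix each (gcs_uri, label) row with TRAIN, VALIDATION or TEST.

    Rows are shuffled with a fixed seed so the split is reproducible,
    then divided 80/10/10 by default (the remainder goes to TEST).
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_pct // 100
    n_val = len(rows) * validation_pct // 100
    out = []
    for i, (uri, label) in enumerate(rows):
        if i < n_train:
            split = "TRAIN"
        elif i < n_train + n_val:
            split = "VALIDATION"
        else:
            split = "TEST"
        out.append([split, uri, label])
    return out


def to_csv(rows):
    """Serialize rows as CSV text, one row per line (first column = split)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def build_predict_request(project, region, model_id, text):
    """Return (url, JSON body) for an assumed AutoML v1 text predict call."""
    url = (f"https://automl.googleapis.com/v1/projects/{project}"
           f"/locations/{region}/models/{model_id}:predict")
    body = {"payload": {"textSnippet": {"content": text, "mimeType": "text/plain"}}}
    return url, json.dumps(body)


def top_label(response):
    """Return the (displayName, score) pair with the highest score, or None."""
    results = response.get("payload", [])
    if not results:
        return None
    best = max(results, key=lambda r: r["classification"]["score"])
    return best["displayName"], best["classification"]["score"]


rows = assign_splits([(f"gs://my-bucket/img_{i}.jpg", "cat") for i in range(100)])
print(to_csv(rows[:3]))
```

Shuffling with a fixed seed before splitting keeps the assignment random but repeatable, so retraining uses the same TEST rows.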

Deployment Manager

  • Supports Python or Jinja2 templates
  • A Python template must
    • define a GenerateConfig(context) or generate_config(context) method
    • return a Python dict
  • Resource names come from the template, not from the configuration
  • Both templates and configurations can use references, e.g. $(ref.NAME.selfLink)
  • Jinja2 templates reference environment variables with env (e.g. {{ env['project'] }}) and template properties with properties (e.g. {{ properties['zone'] }})
  • All resources are created in parallel, except those that have dependencies
  • Use a selfLink reference to ensure dependencies are created first; e.g. referencing a network's selfLink in a firewall rule ensures the VPC is created before the firewall rule
  • Use gcloud deployment-manager types list | grep instance to find the exact type of a resource
  • Use gcloud deployment-manager deployments create dminfra --config=config.yaml --preview to preview a deployment
    • To deploy the previewed configuration, run gcloud deployment-manager deployments update dminfra without --config=...
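A Python template following the rules above might look like this minimal sketch: GenerateConfig(context) returns a dict with a VPC network and a firewall rule whose network property is a $(ref...selfLink) reference, so the network is created first. The "-net" naming scheme and the sshSourceRange property are hypothetical and would come from your config.yaml; the SimpleNamespace context at the bottom is only a local stand-in for testing and is not part of a real template.

```python
from types import SimpleNamespace


def GenerateConfig(context):
    """Deployment Manager Python template: returns a dict of resources.

    Creates a VPC network and a firewall rule. The firewall's "network"
    property references the network's selfLink, which makes Deployment
    Manager create the network before the firewall rule.
    """
    network_name = context.env["deployment"] + "-net"
    return {
        "resources": [
            {
                "name": network_name,
                "type": "compute.v1.network",
                "properties": {"autoCreateSubnetworks": True},
            },
            {
                "name": network_name + "-allow-ssh",
                "type": "compute.v1.firewall",
                "properties": {
                    # Reference creates an implicit dependency on the network
                    "network": "$(ref." + network_name + ".selfLink)",
                    # sshSourceRange is a hypothetical property set in config.yaml
                    "sourceRanges": [context.properties["sshSourceRange"]],
                    "allowed": [{"IPProtocol": "TCP", "ports": ["22"]}],
                },
            },
        ]
    }


# Local check with a stand-in for the real context object.
ctx = SimpleNamespace(env={"deployment": "dminfra"},
                      properties={"sshSourceRange": "10.0.0.0/8"})
config = GenerateConfig(ctx)
print(config["resources"][1]["properties"]["network"])  # $(ref.dminfra-net.selfLink)
```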

Stackdriver

  • Monitoring, logging, alerting and notification, and Application Performance Management (APM)
  • APM includes Profiler, Debugger and Trace
  • Logs can be queried, including JSON payload fields
  • Logging
    • Installing the Logging agent on Compute Engine VMs and AWS EC2 instances is recommended
    • Platform, system and application logging
    • Logs are retained for only 30 days by default
  • Tracing: real-time latency reporting
    • Supported for App Engine, Google HTTP(S) load balancers and any application instrumented with the Stackdriver Trace SDK
  • Debugging:
    • ability to inject debug logging at runtime
  • Error Reporting: supported across services, including Kubernetes Engine, App Engine and Compute Engine
  • Multi-cloud: can also monitor AWS
  • Q: How do you live-debug a microservice that bursts logs? A: Set up a monitor for a burst of log lines in Stackdriver
  • App Engine and Kubernetes Engine don't need the Logging agent installed, but Compute Engine does

Monitoring

  • A Workspace can monitor projects other than its host project
    • Anyone with access to a Workspace can see monitoring data for all projects in it
    • Best practice: use separate Workspaces when you need to limit visibility of specific projects
  • Without the Monitoring agent, you can still monitor CPU utilization, some disk metrics, network traffic and uptime
  • Monitor domain-specific information by defining a custom metric (e.g. the number of gamers on a VM)
  • Chart consists of
    • Metric: Resource Type, Metric, Filter, Group By, Aggregator
    • View Options: Threshold, "Compare to Past", Use Log scale
  • An alerting policy consists of one or more conditions and one or more notification channels
    • A condition consists of a metric, a comparison (above/below etc.), a threshold value and a duration (For: how long the threshold must be violated)
    • Notification channels include email, SMS, PagerDuty, Slack and Pub/Sub (beta)
    • An incident is created when a condition is met and is resolved automatically when it is no longer met
      • Past incidents can be viewed in Events
  • A Resource Group groups resources by name, tag, region, resource type etc.
    • A dashboard is created automatically for each group
  • Uptime checks
    • Check Type: HTTP, HTTPS, TCP
    • Resource Type: URL, VM instance, App Engine, AWS ELB or AWS instance
    • Applies To: Single or Group
    • Check Every: 1/5/10/15 minutes
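The alerting condition described above (metric, comparison, threshold, duration) and the incident open/resolve behavior can be modeled in a few lines. This is a simplified local sketch, not how Cloud Monitoring actually evaluates policies (which works on aligned time series); evaluate_condition and the sample CPU values are hypothetical.

```python
def evaluate_condition(samples, threshold, duration_s, above=True):
    """Simplified model of an alerting condition.

    samples: list of (timestamp_s, value) pairs sorted by time.
    An incident opens once the value has been on the wrong side of the
    threshold continuously for at least duration_s seconds, and resolves
    at the first sample that no longer violates it.
    Returns a list of ("open" | "resolve", timestamp_s) events.
    """
    events = []
    violating_since = None
    incident_open = False
    for ts, value in samples:
        violating = value > threshold if above else value < threshold
        if violating:
            if violating_since is None:
                violating_since = ts
            if not incident_open and ts - violating_since >= duration_s:
                incident_open = True
                events.append(("open", ts))
        else:
            violating_since = None
            if incident_open:
                incident_open = False
                events.append(("resolve", ts))
    return events


# CPU utilization (%) sampled every 60 s; alert if above 80 for 2 minutes.
cpu = [(0, 50), (60, 90), (120, 95), (180, 92), (240, 40)]
print(evaluate_condition(cpu, threshold=80, duration_s=120))
# [('open', 180), ('resolve', 240)]
```

The duration requirement is what keeps a single spike from opening an incident: the 90% reading at 60 s only counts once it has persisted for the full two minutes.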