Other Services

ML

  • AI > ML > Deep Learning
  • AI is the more general discipline
  • ML is a toolset within the AI discipline
  • Deep learning is a type of ML that can process unstructured data such as images, speech, video and natural-language text
  • Many ways to implement ML on GCP
  • ML Frameworks: such as TensorFlow for advanced users
  • Cloud AutoML: bring your own data to train a model
  • Pre-trained models: ML APIs such as the Natural Language API, Vision API, Speech API and Translation API
  • Three ways to build an ML model:
  • BigQuery ML: SQL-based and fast (runs in minutes), but usually not accurate enough for production
  • AutoML: runs in a few hours, does feature engineering, much more accurate
  • Custom: using Keras, TensorFlow and Python
  • A TensorFlow Estimator model requires:
  • An Estimator object that implements the train(), evaluate(), predict() and export_saved_model() methods (export_savedmodel() in TensorFlow 1.x)
  • An input function that returns a tuple of features and labels
    • features is a dict mapping each feature column name to a tensor; labels is a tensor holding the label column
  • A serving input function that prepares user input for prediction and returns a tuple of features and inputs
    • It is similar to the training input function, except that the user input data might have a different format
  • Train and evaluate functions
    • defined using the estimator, TrainSpec and EvalSpec
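
The input-function contract above can be sketched without TensorFlow. This is only an illustration of the shapes involved, assuming plain Python lists stand in for tensors; the column names (sq_footage, type, price), the request keys (sqft, kind) and both function names are hypothetical and not part of any TensorFlow API.

```python
# Sketch of the (features, labels) contract; plain lists stand in for tensors.
LABEL_COLUMN = "price"  # hypothetical label column

ROWS = [
    {"sq_footage": 1000, "type": "house", "price": 500},
    {"sq_footage": 800, "type": "apt", "price": 350},
    {"sq_footage": 1500, "type": "house", "price": 700},
]


def train_input_fn(rows=ROWS):
    """Return (features, labels): a dict of column -> values, plus the label values."""
    columns = [c for c in rows[0] if c != LABEL_COLUMN]
    features = {c: [r[c] for r in rows] for c in columns}
    labels = [r[LABEL_COLUMN] for r in rows]
    return features, labels


def serving_input_fn(request):
    """Map user input (possibly in another format) onto the training feature columns."""
    # Assumption: the client sends 'sqft' and 'kind' instead of the training column names.
    features = {"sq_footage": [request["sqft"]], "type": [request["kind"]]}
    return features, request


features, labels = train_input_fn()
print(features)
print(labels)
```

In a real Estimator the two functions would return tensors (or a tf.data.Dataset), but the split between "features keyed by column" and "labels" is the same.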

AutoML

  • 3 phases: Train, Deploy, Serve
  • Train with a prepared dataset
  • The prepared dataset (CSV) must reside in the same bucket as the source files
    • The optional first column assigns each row to TRAIN, VALIDATION or TEST; if omitted, the default split is 80-10-10
  • VALIDATION data is used during training to find where the model predicts incorrect labels and to improve it
  • TEST data is held out for the final model evaluation, so the score is not biased by training
  • One full pass through the training data is called an epoch
  • Custom models are temporary and deleted after some time
  • cannot be exported
  • must train new models periodically
  • Serve provisions a REST endpoint
  • accepts model name and payload (data that needs to be classified)
  • returns displayName (matched labels), classification and score
  • AutoML Vision: classify images
  • Rule of thumb for training data: low score => add more data; perfect score => add more variety
  • AutoML NLP
  • AutoML Tables: for structured data in tabular format
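
The prepared-dataset layout and the prediction response described above can be sketched as follows. This is a minimal illustration, assuming an image-classification CSV with rows of the form SPLIT,gs://path,label and a response whose payload entries carry displayName and classification.score; the bucket name, file names and helper names are hypothetical, and the positional split is used only to keep the example deterministic (a real split would normally be random).

```python
import csv
import io

BUCKET = "gs://my-bucket"  # hypothetical bucket holding both the CSV and the images


def assign_split(index, total):
    """Assign a row to TRAIN / VALIDATION / TEST using an 80-10-10 split by position."""
    if index * 10 < total * 8:
        return "TRAIN"
    if index * 10 < total * 9:
        return "VALIDATION"
    return "TEST"


def build_dataset_csv(labelled_files):
    """Render one CSV row per file: SPLIT,gs://bucket/file,label."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    total = len(labelled_files)
    for i, (filename, label) in enumerate(labelled_files):
        writer.writerow([assign_split(i, total), f"{BUCKET}/{filename}", label])
    return out.getvalue()


def best_label(response):
    """Pick the highest-scoring entry from a prediction response payload."""
    top = max(response["payload"], key=lambda p: p["classification"]["score"])
    return top["displayName"], top["classification"]["score"]


files = [(f"img{i}.jpg", "daisy" if i % 2 else "rose") for i in range(10)]
print(build_dataset_csv(files))
```

Leaving the first column out entirely lets the service apply its default 80-10-10 split instead.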

Deployment Manager

  • Supports Python or Jinja2 templates
  • A Python template must
  • define a GenerateConfig(context) or generate_config(context) function
  • return a Python dict
  • Names used are from the template and not from the configuration
  • Both the template and the configuration can use references
  • Jinja2 templates can reference environment variables using env and properties using properties
  • All resources are created in parallel, except those that have dependencies
  • Use a selfLink reference to ensure dependencies are created first, e.g. reference the network's selfLink in a firewall rule to ensure the VPC is created before the firewall rule
  • Use gcloud deployment-manager types list | grep instance to find the exact type of a resource
  • Use gcloud deployment-manager deployments create dminfra --config=config.yaml --preview to preview a deployment
  • Immediately after a --preview, gcloud deployment-manager deployments update dminfra applies the previewed configuration and doesn't need --config=...
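
The Python template rules above can be sketched as a small template. This is a minimal example, assuming a deployment that needs one network and one firewall rule; the resource name suffixes and the sourceRanges property are hypothetical choices for illustration, and the fake context under the main guard exists only so the file can be run locally.

```python
def generate_config(context):
    """Deployment Manager entry point: return a dict describing the resources."""
    deployment = context.env["deployment"]
    network_name = deployment + "-network"       # hypothetical naming scheme
    firewall_name = deployment + "-allow-http"   # hypothetical naming scheme

    resources = [
        {
            "name": network_name,
            "type": "compute.v1.network",
            "properties": {"autoCreateSubnetworks": True},
        },
        {
            "name": firewall_name,
            "type": "compute.v1.firewall",
            "properties": {
                # The selfLink reference makes the firewall wait for the network.
                "network": "$(ref." + network_name + ".selfLink)",
                "sourceRanges": context.properties.get("sourceRanges", ["0.0.0.0/0"]),
                "allowed": [{"IPProtocol": "tcp", "ports": ["80"]}],
            },
        },
    ]
    return {"resources": resources}


if __name__ == "__main__":
    # Local stand-in for the context object that Deployment Manager passes in.
    class FakeContext:
        env = {"deployment": "dminfra"}
        properties = {}

    print(generate_config(FakeContext()))
```

Because the firewall only refers to the network through the reference string, the two resources carry no explicit ordering; the dependency is inferred from the reference.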

Stackdriver

  • Monitoring, logging, alerting and notification, Application Performance Management (APM)
  • APM includes: application profiler, debugger and cloud trace
  • Can run queries against logs (JSON fields)
  • Logging
  • Recommended to install the logging agent on VMs or EC2 instances
  • Platform, system and application logging
  • logs retained only for 30 days
  • Tracing: real-time, latency reporting
  • App Engine, Google HTTP(S) load balancers and any application using the Stackdriver Trace SDK
  • Debugging:
  • ability to inject debug logging at runtime
  • Error reporting: supported by all services: Kubernetes, App Engine, Compute Engine
  • Multi-cloud: can also monitor AWS
  • Q: How do you live debug a microservice that bursts logs? A: Set up a monitor for bursts of log lines in Stackdriver
  • App Engine, Kubernetes don't need logging agent installed, but compute engine does
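
Querying logs by JSON field, as noted above, works best when the application writes structured entries. The sketch below is a minimal illustration, assuming an environment where single-line JSON written by the application is ingested as a structured payload (jsonPayload); the field names service and order_id and both helper names are hypothetical.

```python
import json


def structured_log(severity, message, **fields):
    """Serialize one log entry as a single JSON line with queryable fields."""
    entry = {"severity": severity, "message": message}
    entry.update(fields)
    return json.dumps(entry, sort_keys=True)


def build_filter(min_severity, **json_fields):
    """Build a log filter string matching entries by severity and JSON payload fields."""
    clauses = ["severity>=" + min_severity]
    for key, value in sorted(json_fields.items()):
        clauses.append('jsonPayload.%s="%s"' % (key, value))
    return " AND ".join(clauses)


print(structured_log("ERROR", "checkout failed", service="cart", order_id="A-17"))
print(build_filter("ERROR", service="cart"))
```

The second line printed is a filter that could be pasted into the log viewer to find every ERROR-or-worse entry from the hypothetical cart service.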

Monitoring

  • A workspace can monitor projects other than its hosting project
  • Anyone with access to a workspace can monitor all projects in that workspace
  • Best practice: use separate workspaces if visibility of specific projects must be limited
  • Without the Monitoring agent, you can still monitor CPU utilization, some disk usage, network traffic and uptime information
  • Monitor domain-specific information by defining a custom metric (e.g. number of gamers on a VM)
  • Chart consists of
  • Metric: Resource Type, Metric, Filter, Group By, Aggregator
  • View Options: Threshold, "Compare to Past", Use Log scale
  • An alerting policy consists of 1+ conditions and 1+ notifications
  • A condition consists of a metric, a comparison (above/below etc.), a threshold (value) and For (how long)
  • Notification channel is email, SMS, pager, Slack, Pub/Sub (beta)
  • An incident is created when an alerting policy is violated and resolves automatically once the policy is satisfied again
    • View past incidents in Events
  • Resource Group allows grouping resources using name, tag, region, resource type etc
  • A dashboard is automatically created for each group
  • Uptime checks
  • Check Type: HTTP, HTTPS, TCP
  • Resource Type: URL, VM Instance, App Engine, ELB or AWS Instance
  • Applies To: Single or Group
  • Check Every: 1/5/10/15 minutes
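
The alerting condition described above (metric, comparison, threshold, and "For" duration) can be sketched as a small evaluator. This is only an illustration of the logic under simplifying assumptions, not how the monitoring service is implemented; the function name, the sample format and the CPU values are hypothetical.

```python
import operator

COMPARATORS = {"above": operator.gt, "below": operator.lt}


def condition_violated(samples, comparison, threshold, for_seconds):
    """Return True if the metric breached the threshold continuously for `for_seconds`.

    `samples` is a list of (timestamp_seconds, value) pairs in ascending time order.
    """
    compare = COMPARATORS[comparison]
    breach_start = None
    for timestamp, value in samples:
        if compare(value, threshold):
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= for_seconds:
                return True
        else:
            breach_start = None  # breach interrupted; restart the window
    return False


# CPU utilization sampled once a minute (hypothetical values).
cpu = [(0, 0.50), (60, 0.85), (120, 0.90), (180, 0.95), (240, 0.60)]
print(condition_violated(cpu, "above", 0.80, 120))  # breach lasts from t=60 to t=180
print(condition_violated(cpu, "above", 0.80, 180))  # breach ends before 180 s elapse
```

The "For" duration is what keeps a single spike from opening an incident: the threshold has to stay breached for the whole window.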