Other
SageMaker
Types of SageMaker Pipeline steps
Processing: requires a processor and a Python script; used for data processing and model evaluation, e.g. a PySpark job that does data cleaning
Training: trains a model; requires an estimator plus training and validation inputs
Tuning: hyperparameter tuning. Associated with a SageMaker experiment; runs multiple training jobs as trials. Requires a HyperparameterTuner and TrainingInput
Can retrieve the top-performing model(s) (max 50) from the job
CreateModel: create a model
RegisterModel: registers a Model or PipelineModel with the SageMaker Model Registry
Transform: batch transformation to run inference on an entire dataset
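A minimal sketch of the step types above as plain Python, to make the "requires" notes checkable. This is not the SageMaker SDK: the `Step` class, the input names and `clamp_top_k` are all hypothetical, and only the requirements stated in these notes are encoded.

```python
from dataclasses import dataclass, field

# Required inputs per step type, as listed in the notes.
# Illustrative only: these names are NOT SageMaker SDK parameters.
REQUIRED_INPUTS = {
    "Processing": {"processor", "script"},
    "Training": {"estimator", "training_input", "validation_input"},
    "Tuning": {"tuner", "training_input"},
    "CreateModel": set(),    # requirements not listed in the notes
    "RegisterModel": set(),  # requirements not listed in the notes
    "Transform": set(),      # requirements not listed in the notes
}

MAX_TOP_MODELS = 50  # the notes cap top models from a tuning job at 50


@dataclass
class Step:
    name: str
    step_type: str
    inputs: dict = field(default_factory=dict)

    def missing_inputs(self) -> set:
        """Return the required inputs this step has not been given."""
        return REQUIRED_INPUTS[self.step_type] - set(self.inputs)


def clamp_top_k(top_k: int) -> int:
    """Clamp a requested number of top models to the range 1..50."""
    return max(1, min(top_k, MAX_TOP_MODELS))
```

Usage: `Step("train", "Training", {"estimator": ...}).missing_inputs()` reports the training and validation inputs as still missing.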
Kinesis Streaming Data
Streams
Scaling using shards
24h default retention, extendable to 7 days (newer limits allow up to 365 days)
EC2 consumers can write to S3, Redshift, etc.
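A rough sizing sketch for "scaling using shards". It assumes the commonly documented per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads); check the current Kinesis quotas before relying on these numbers. The function name is hypothetical.

```python
import math

# Assumed per-shard limits; verify against current Kinesis Data Streams quotas.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000.0
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)
```

Whichever limit is hit first drives the shard count, so a stream of many tiny records can need more shards than its byte rate suggests.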
Firehose
No scaling required
No retention period, because data is delivered straight to destinations such as
- S3, Redshift (via S3), Elasticsearch; Lambda can transform records in flight
Analytics
Uses SQL to process data in Streams/Firehose
Writes data to S3, Elasticsearch, Redshift
Best practice: Kinesis is for handling massive streaming data, whereas SQS is for inter-application communication
SWF: Simple Workflow Service
1 year retention
Has 3 types of actors: workflow starters, deciders and activity workers
Can involve non-AWS or manual tasks as part of workflow
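To make the three SWF actor roles concrete, here is a toy in-memory simulation. It does not call SWF; the function names and the linear "plan" are assumptions made for illustration. The point is only the division of labour: the starter begins an execution, the decider picks the next task from the history, and the activity worker carries it out.

```python
def decider(history, plan):
    """Pick the next activity from the workflow history; None means complete."""
    next_index = len(history)
    return plan[next_index] if next_index < len(plan) else None


def activity_worker(task):
    """Carry out one task. A real worker could be non-AWS code or a person."""
    return f"{task}:done"


def start_workflow(plan):
    """Workflow starter: begins an execution and runs it until the decider stops."""
    history, results = [], []
    task = decider(history, plan)
    while task is not None:
        results.append(activity_worker(task))
        history.append(task)  # the decider sees this on its next turn
        task = decider(history, plan)
    return results
```

Usage: `start_workflow(["verify_order", "charge_card", "ship"])` runs the three tasks in order.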
CloudWatch: monitors CPU, network, I/O
Available via: console, API, SDK or CLI
CloudFormation: JSON- or YAML-formatted template
AWS Trusted Advisor: Best practices in Cost, Security, Performance and Fault Tolerance
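A CloudFormation template is just a structured document, so a minimal one can be built and serialized with the standard library. This sketch declares a single S3 bucket; the logical ID `NotesBucket` and the description are made up for the example.

```python
import json

# Minimal template: one S3 bucket with default settings.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example: one S3 bucket",
    "Resources": {
        "NotesBucket": {  # hypothetical logical ID
            "Type": "AWS::S3::Bucket"
        }
    },
}

# Serialize to the JSON text that would be uploaded as the template body.
template_json = json.dumps(template, indent=2)
print(template_json)
```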
Analytics
Elastic MapReduce (EMR): Apache Hadoop + EC2 + S3
Kinesis: Streaming
Application Services
API Gateway: Frontend to Lambda, Kinesis or HTTP
Traffic management, Access, Monitoring, API versioning
Simple Queue Service (SQS): message queuing
Enterprise Applications
Workspaces: Windows, Mac, Chromebook, iPad, Fire/Android tablets
Ports: TCP 443, TCP+UDP 4172
Value (2GB), Standard Plus (4GB), Performance Plus (7.5GB)
MS Office, Trend Micro
SWF: Simple Workflow Service
AWS Services Access
Management Console, CLI, SDK, Query API (Using HTTP)
Support: 4 levels (Basic (free), Developer, Business, Enterprise)
AWS DevOps
Code pipeline: CodeCommit, CodeBuild, CodeDeploy, CodePipeline
AWS Infrastructure as code: CloudFormation, OpsWorks (Chef), Config
Active Monitoring: CloudWatch, CloudTrail (API caller info such as IP, time params etc)
PaaS: Elastic Beanstalk (EB; not to be confused with EBS, Elastic Block Store): upload code and Beanstalk does provisioning, scaling, load balancing, health monitoring
Cloud Migration
Unmanaged: rsync, S3 and Glacier CLI
Replace Internet: Direct Connect, Snowball, S3 Transfer Acceleration
Friends of S3: Gateways, Partners, Kinesis Firehose
- Gateways: on-premises device that interfaces with a SAN or VTL
- Partners: Write backups directly to S3
- Kinesis Firehose: load streaming data to S3/Redshift/Elasticsearch
Application Discovery Service: identifies running on-premises applications, their performance profiles and configuration data
AWS SMS, Server Migration Service: migrate VM servers including volumes
AWS DMS: Database Migration Service
Uses replication; allows migrating to a different database engine
AWS Web Architecture
Auto Scaling Group=[Security Group[EC2+CloudWatch] in AZ1 + SecGroup in AZ2]
Can be triggered to expand/shrink
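A sketch of the expand/shrink trigger as a simple threshold rule on average CPU (the kind of metric CloudWatch reports). The thresholds, group sizes and function name are illustrative assumptions, not AWS defaults.

```python
def desired_capacity(current, avg_cpu, min_size=2, max_size=6,
                     scale_out_at=70.0, scale_in_at=30.0):
    """Return the new instance count for a group given average CPU percent."""
    if avg_cpu > scale_out_at:
        target = current + 1  # expand by one instance
    elif avg_cpu < scale_in_at:
        target = current - 1  # shrink by one instance
    else:
        target = current      # within the comfortable band, do nothing
    # Never leave the configured group bounds.
    return max(min_size, min(max_size, target))
```

The gap between the two thresholds keeps the group from flapping between sizes when load hovers near a single cutoff.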
Sec Group = Protocol + Ports + IP ranges
E.g. Internet => only ports 80+443 to webserver
E.g. Corporate => only port 22 to App servers
No access to DB servers
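The rule "Sec Group = Protocol + Ports + IP ranges" and the three examples above can be sketched as a small allow-list check. Traffic is denied unless some rule matches, which is how security groups behave. The corporate CIDR is a documentation-range placeholder, and the class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    protocol: str
    ports: frozenset
    cidr: str

    def allows(self, protocol, port, source_ip):
        """True if protocol, port and source address all match this rule."""
        return (
            protocol == self.protocol
            and port in self.ports
            and ipaddress.ip_address(source_ip) in ipaddress.ip_network(self.cidr)
        )


CORPORATE_CIDR = "203.0.113.0/24"  # hypothetical corporate address range

GROUPS = {
    "web": [Rule("tcp", frozenset({80, 443}), "0.0.0.0/0")],  # Internet
    "app": [Rule("tcp", frozenset({22}), CORPORATE_CIDR)],    # Corporate only
    "db": [],                                                  # no inbound access
}


def is_allowed(group, protocol, port, source_ip):
    """Default deny: allowed only if at least one rule in the group matches."""
    return any(rule.allows(protocol, port, source_ip) for rule in GROUPS[group])
```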
Mobile
Cognito: Sign-In
Sync data using Cognito and S3
SNS: Push notification
KMS
AWS Pricing
Compute, Storage, Data Transfer
Simple Monthly Calculator
TCO Calculator: compare on-premise vs AWS
Billing and Cost Management Console: View, pay bills
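The three pricing dimensions (compute, storage, data transfer) add up to a bill roughly as follows. Every rate below is a made-up placeholder, not an AWS price; the function name is hypothetical too.

```python
def monthly_cost(instance_hours, gb_stored, gb_transferred_out,
                 hourly_rate=0.10, storage_rate=0.03, transfer_rate=0.09):
    """Estimate a monthly bill from usage; all default rates are placeholders."""
    compute = instance_hours * hourly_rate
    storage = gb_stored * storage_rate
    data_transfer = gb_transferred_out * transfer_rate
    return {
        "compute": compute,
        "storage": storage,
        "data_transfer": data_transfer,
        "total": compute + storage + data_transfer,
    }
```

Usage: `monthly_cost(720, 100, 50)` estimates one always-on instance with 100 GB stored and 50 GB transferred out.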
Security and Compliance
Infrastructure:
Facility: video surveillance, two factor
Decommissioning: magnetic storage degaussed and physically destroyed
Compliance: HIPAA, ISO 27001, NIST etc
Network
Boundary devices, ACL
Secure Access points: API endpoint, HTTPS access
Transmission protection: TLS, VPC, IPSec VPN
Amazon Corporate is segregated from AWS
Fault Tolerant
Network monitoring and protection, prevents
- DDoS, Man in the Middle, IP Spoofing, packet sniffing, port scanning
Compute: shared instances are isolated by the Xen hypervisor, AWS firewall, signed API calls
Database security:
RDS: DB security groups, permissions
DynamoDB: requests must be signed with HMAC-SHA256
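Assuming the HMAC-SHA256 requirement refers to AWS Signature Version 4, the signing key is derived by chaining HMAC-SHA256 over the date, region and service. This sketch covers only the key derivation and the final signature; building the canonical request and string to sign is left out. The function names are my own.

```python
import hashlib
import hmac


def _sign(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 step in the derivation chain."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key; date_stamp is YYYYMMDD, e.g. "20240101"."""
    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    return _sign(k_service, "aws4_request")


def sign_string(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature of an already-built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key is scoped to a date, region and service, a leaked signing key is far less useful than the long-term secret it was derived from.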
Storage: KMS (Key Management Service), S3 encryption client, S3 server side encryption
IAM: Least privilege, explicit permissions, temporary access, multi-factor authentication
CloudTrail: API call tracking
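As an illustration of the IAM "least privilege" and "explicit permissions" points, here is a narrowly scoped policy document plus a tiny check that flags bare wildcards. The bucket name is a placeholder and `wildcard_statements` is a hypothetical helper, not an AWS tool.

```python
import json

# Grants read access to objects in one bucket and nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",  # placeholder bucket
        }
    ],
}


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy_doc):
    """Return Allow statements whose Action or Resource is a bare "*"."""
    return [
        stmt for stmt in policy_doc["Statement"]
        if stmt["Effect"] == "Allow"
        and ("*" in _as_list(stmt["Action"]) or "*" in _as_list(stmt["Resource"]))
    ]


policy_json = json.dumps(policy, indent=2)
```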