Sunday, December 07, 2014

Using a Brother MFC-J450DW All-In-One with Ubuntu

Are you thinking about buying a Brother MFC-J450DW Wireless Office All-In-One Printer/Scanner/Copier/Fax Machine and wondering if it will work with your Ubuntu computer? The answer is YES! Brother actually includes Linux on their Supported OS Page.  I was able to get things working on Ubuntu 14.04 LTS "Trusty Tahr".


Brother Software Installation


Driver Downloads

The CD provided with the printer contains Drivers, Utilities, and User Manuals, but only for Windows and Macintosh computers.  To get the Linux drivers, you must visit the MFC-J450DW Downloads Page.  Select Linux as the OS and Linux (deb) as the OS Version, or go directly to this page.  Follow the directions to download and install the Driver Install Tool.

From a terminal:

gunzip linux-brprinter-installer-*.*.*-*.gz
sudo su -
bash linux-brprinter-installer-*.*.*-* MFC-J450DW

When the installer asks "Will you specify the DeviceURI?", answer Y and then choose 14 (Auto).

After all this, you should be able to see your new printer:

System Settings -> Printers -> MFCJ450DW
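
You can also confirm the new print queue from a terminal with CUPS's lpstat utility (the queue name below is what my installation created; yours may differ, and test.txt is just a hypothetical file to print):

lpstat -p -d                  # list printers and the default destination
lp -d MFCJ450DW test.txt      # send a quick test print to the Brother queue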

You should also be able to scan with the Simple Scan application, which comes with Ubuntu.
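
If you prefer the terminal for scanning too, the SANE command-line tools should also see the device once the Brother driver is installed (a sketch; it assumes the sane-utils package is present):

scanimage -L                          # list detected scanners
scanimage --format=tiff > test.tiff   # grab a quick test scan from the default scanner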

No Brother Software Installation


If you would rather not install any software on your computer whatsoever, the MFC-J450DW makes this possible.  I actually prefer this approach because it makes it easy to print and scan to and from any device you might have, including Macs and iOS devices.

Google Cloud Print

Another option is to set up Google Cloud Print, which lets you print from any device via your Google Account.  Brother's Google Cloud Print Instruction Page just tells you to find the right manual for your model.  If you visit the Brother MFC-J450DW Manuals Page, you will see the Brother Google Cloud Print Guide Page, which lets you download the actual Brother Google Cloud Print Guide [PDF].  To save time, just make sure you have a Google Account and then go directly to your printer's admin web interface, which is accessible via a web browser at http://[PRINTER_IP_ADDRESS]/, where PRINTER_IP_ADDRESS is the address of your printer on the local network.  I was able to obtain this IP address by looking at my router's list of connected clients.  You can also find it via the printer's control panel by selecting Menu -> Network -> WLAN -> TCP/IP -> IP Address.
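
If you would rather find the printer's IP address from a terminal than from your router, a quick ping scan of the local network may do the trick (a sketch that assumes the nmap package is installed and that your LAN uses the 192.168.1.0/24 range):

nmap -sn 192.168.1.0/24   # ping scan; look for the printer among the hosts that respond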

The printer's Google Cloud Print admin page, at http://[PRINTER_IP_ADDRESS]/net/net/gcp.html?pageid=1084, only lets you associate the printer with one Google account.  However, Google lets you share the printer with other people very easily.

Scan to the Web

Another option for scanning is to use Brother Web Connect.  Instructions can be found on the Brother Web Connect Page, which lets you download the Brother Web Connect Guide [PDF].  I found it very easy to set up Web Connect for Google Drive and Dropbox.  Other options available that I haven't tried yet are Picasa, Flickr, Facebook, Evernote, OneDrive, and Box.  After you get it set up, you simply use the printer's control panel with WEB -> [Service] -> [Configured Service].  After scanning, the file will show up in your configured service in a folder called From_BrotherDevice.  So easy!

Please comment if you have any problems or other tips for people trying to get the Brother MFC-J450DW (or similar printer) working with Ubuntu.

Sunday, November 23, 2014

Facebook Friend Clustering

My wife is currently reading Dataclysm: Who We Are (When We Think No One's Looking) and pointed me to this interesting Relationship Test tool she learned about in the book. It analyzes your Facebook friends and then shows both their connections to you and to each other. The graph assigns weight to relationships, so "cliques" of friends will cluster together.  Here is the result I get:

Ken Weiner's Friend Graph

The white circle in the center is me.  The cluster in the upper left identifies my friends from my high school, Agoura High School.  The cluster in the lower right is mostly coworkers from my previous job at LowerMyBills.com.  The two less-defined clusters to the right of my high school cluster are a mix of family and college friends.

In addition to visualizing friend clusters, this tool also tries to identify your spouse or romantic partner using an algorithm from a paper called Romantic Partnerships and the Dispersion of Social Ties:
A Network Analysis of Relationship Status on Facebook. It tries to answer this question:
Given all the connections among a person's friends, can you recognize his or her romantic partner from the network structure alone?
It measures something called dispersion - the extent to which two people's mutual friends are not themselves well-connected.

On my network, it worked extremely well, identifying my wife as the person with whom I have the highest dispersion score (called assimilation score in this tool).

Portion of my Facebook Connections Sorted by Assimilation Score

This means that it is through my wife that I am connected to the most people who aren't themselves connected.  This is natural, of course, because I have met my wife's high school friends, college friends, and coworkers, most of whom don't know each other.

There are a few people who rank high on the list but aren't actually "central" to my life.  If I ignore everyone with fewer than 3 mutual friends, the list becomes much more meaningful and predictive of who really is central to my life.

The ability to analyze my social network like this motivated me to visit Facebook and unfriend acquaintances I never really developed much of a relationship with.  Now both my Facebook news feed and my dispersion scores will be that much more meaningful!

Sunday, November 16, 2014

HTTP Request Timings with cURL

When a client makes an HTTP request, the following things happen:

  1. DNS name resolution
  2. TCP connection to the remote host
  3. SSL/SSH/etc connect/handshake with the remote host, if applicable
  4. Negotiations specific to the particular protocol(s) involved
  5. Redirects, if applicable
  6. Content generation on the remote host
  7. Content transfer

Have you ever wanted to know how long each phase of an HTTP request takes?

It turns out this is relatively straightforward to do with the command line tool cURL and its powerful "Write Out" option.

Write Out Option

cURL comes with an option to print out a lot of useful information related to a request.  The following is taken directly from the cURL man page:

-w, --write-out

Defines what to display on stdout after a completed and successful operation. The format is a string that may contain plain text mixed with any number of variables. The string can be specified as "string", to get read from a particular file you specify it "@filename" and to tell curl to read the format from stdin you write "@-".

The variables present in the output format will be substituted by the value or text that curl thinks  fit, as described below. All variables are specified as %{variable_name} and to output a normal % you just write them as %%. You can output a newline by using \n, a carriage return with \r and a tab space with \t.

NOTE: The %-symbol is a special symbol in the win32-environment, where all occurrences of % must be doubled when using this option.

The variables available are:

(A version in parentheses indicates the curl release that introduced the variable.)

content_type: The Content-Type of the requested document, if there was any.
filename_effective: The ultimate filename that curl writes out to. This is only meaningful if curl is told to write to a file with the --remote-name or --output option. It's most useful in combination with the --remote-header-name option. (since 7.25.1)
ftp_entry_path: The initial path curl ended up in when logging on to the remote FTP server. (since 7.15.4)
http_code: The numerical response code that was found in the last retrieved HTTP(S) or FTP(s) transfer. In 7.18.2 the alias response_code was added to show the same info.
http_connect: The numerical code that was found in the last response (from a proxy) to a curl CONNECT request. (since 7.12.4)
local_ip: The IP address of the local end of the most recently done connection - can be either IPv4 or IPv6. (since 7.29.0)
local_port: The local port number of the most recently done connection. (since 7.29.0)
num_connects: Number of new connects made in the recent transfer. (since 7.12.3)
num_redirects: Number of redirects that were followed in the request. (since 7.12.3)
redirect_url: When an HTTP request was made without -L to follow redirects, this variable will show the actual URL a redirect would take you to. (since 7.18.2)
remote_ip: The remote IP address of the most recently done connection - can be either IPv4 or IPv6. (since 7.29.0)
remote_port: The remote port number of the most recently done connection. (since 7.29.0)
size_download: The total amount of bytes that were downloaded.
size_header: The total amount of bytes of the downloaded headers.
size_request: The total amount of bytes that were sent in the HTTP request.
size_upload: The total amount of bytes that were uploaded.
speed_download: The average download speed that curl measured for the complete download. Bytes per second.
speed_upload: The average upload speed that curl measured for the complete upload. Bytes per second.
ssl_verify_result: The result of the SSL peer certificate verification that was requested. 0 means the verification was successful. (since 7.19.0)
time_appconnect: The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (since 7.19.0)
time_connect: The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.
time_namelookup: The time, in seconds, it took from the start until the name resolving was completed.
time_pretransfer: The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.
time_redirect: The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (since 7.12.3)
time_starttransfer: The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
time_total: The total time, in seconds, that the full operation lasted. The time will be displayed with millisecond resolution.
url_effective: The URL that was fetched last. This is most meaningful if you've told curl to follow location: headers.

Here's a sample cURL command to show each one of these properties while making a request to Google's home page:

curl -w '\ncontent_type=%{content_type}\nfilename_effective=%{filename_effective}\nftp_entry_path=%{ftp_entry_path}\nhttp_code=%{http_code}\nhttp_connect=%{http_connect}\nlocal_ip=%{local_ip}\nlocal_port=%{local_port}\nnum_connects=%{num_connects}\nnum_redirects=%{num_redirects}\nredirect_url=%{redirect_url}\nremote_ip=%{remote_ip}\nremote_port=%{remote_port}\nsize_download=%{size_download}\nsize_header=%{size_header}\nsize_request=%{size_request}\nsize_upload=%{size_upload}\nspeed_download=%{speed_download}\nspeed_upload=%{speed_upload}\nssl_verify_result=%{ssl_verify_result}\ntime_appconnect=%{time_appconnect}\ntime_connect=%{time_connect}\ntime_namelookup=%{time_namelookup}\ntime_pretransfer=%{time_pretransfer}\ntime_redirect=%{time_redirect}\ntime_starttransfer=%{time_starttransfer}\ntime_total=%{time_total}\nurl_effective=%{url_effective}\n\n' -o /dev/null -s 'https://www.google.com/'

Where:
  • -w shows which properties to write out
  • -o /dev/null redirects the output of the request to /dev/null
  • -s tells cURL not to show a progress bar
  • https://www.google.com/ is the URL we are requesting
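
If the inline format string becomes unwieldy, curl can also read it from a file using the @filename form mentioned in the man page excerpt above. A minimal sketch (curl-format.txt is just an example name):

cat > curl-format.txt <<'EOF'
time_namelookup=%{time_namelookup}
time_connect=%{time_connect}
time_starttransfer=%{time_starttransfer}
time_total=%{time_total}
EOF
curl -w @curl-format.txt -o /dev/null -s 'https://www.google.com/'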

Timing

From the options above, the following are of interest when it comes to timing the phases of a request.

time_appconnect: The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (since 7.19.0)
time_connect: The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.
time_namelookup: The time, in seconds, it took from the start until the name resolving was completed.
time_pretransfer: The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.
time_redirect: The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (since 7.12.3)
time_starttransfer: The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
time_total: The total time, in seconds, that the full operation lasted. The time will be displayed with millisecond resolution.


If we consider a simple request to a non-SSL page that involves no redirects, there would be 4 main phases:

HTTP Request Phases: DNS Lookup, TCP Connection, Content Generation, and Content Transfer



Let's run the following command to gather timings while loading CNN's home page:


curl -w '\ntime_namelookup=%{time_namelookup}\ntime_appconnect=%{time_appconnect}\ntime_connect=%{time_connect}\ntime_redirect=%{time_redirect}\ntime_pretransfer=%{time_pretransfer}\ntime_starttransfer=%{time_starttransfer}\ntime_total=%{time_total}\n\n' -o /dev/null -s 'http://www.cnn.com/'

Where:
  • -w shows which timing properties to write out
  • -o /dev/null redirects the output of the request to /dev/null
  • -s tells cURL not to show a progress bar
  • http://www.cnn.com/ is the URL we are requesting

This results in the following output:

time_namelookup=0.029
time_appconnect=0.000
time_connect=0.095
time_redirect=0.000
time_pretransfer=0.095
time_starttransfer=0.166
time_total=0.530

From this, we can compute the time taken in each of the 4 phases as follows:

DNS Lookup = DNS Lookup (29 ms) - Start (0 ms) = 29 milliseconds
TCP Connection = Pre Transfer (95 ms) - DNS Lookup (29 ms) = 66 milliseconds
Content Generation = Start Transfer (166 ms) - Pre Transfer (95 ms) = 71 milliseconds
Content Transfer = Total (530 ms) - Start Transfer (166 ms) = 364 milliseconds

HTTP Request Timings for http://www.cnn.com/
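
If you want the subtraction done for you, a small pipeline like this should work (a sketch assuming bash and awk; it makes a fresh request, so the numbers will differ slightly from those above):

curl -w 'namelookup=%{time_namelookup} connect=%{time_connect} starttransfer=%{time_starttransfer} total=%{time_total}\n' \
  -o /dev/null -s 'http://www.cnn.com/' | awk '{
  split($1,a,"="); split($2,b,"="); split($3,c,"="); split($4,d,"=");
  printf "DNS Lookup:         %.0f ms\n", a[2]*1000;
  printf "TCP Connection:     %.0f ms\n", (b[2]-a[2])*1000;
  printf "Content Generation: %.0f ms\n", (c[2]-b[2])*1000;
  printf "Content Transfer:   %.0f ms\n", (d[2]-c[2])*1000;
}'
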
So there you have it!  Next time a request seems slow, you can use cURL to help find out why. You can write cron-invoked scripts that run this cURL command, collect the timing information, and send the results to Splunk, Amazon CloudWatch, or your favorite logging framework to get pretty graphs of the data over time.
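
As a rough sketch of that idea (the script name, URL, and log path are just examples), a small script plus a crontab entry could collect one timing sample every few minutes:

#!/bin/bash
# check-timings.sh -- hypothetical helper; appends one line of timing data per run
URL='http://www.cnn.com/'
LOG=/var/log/http-timings.log
echo "$(date -u +%FT%TZ) $(curl -w 'starttransfer=%{time_starttransfer} total=%{time_total}' -o /dev/null -s "$URL")" >> "$LOG"

# crontab entry: run the script every 5 minutes
*/5 * * * * /usr/local/bin/check-timings.sh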




Tuesday, April 17, 2012

Avoiding Spam Emails with Google Apps

Employees at my startup had been complaining that their outgoing emails were landing in their recipients' Spam folders.  It seemed to happen more often with recipients using Outlook. After a little research I discovered that our Google Apps and GoDaddy DNS setup was missing some key configurations: an SPF record and a DKIM record.

After making the changes described below and waiting a few hours for DNS changes to propagate, the Spam problem has been resolved!

Sender Policy Framework (SPF) Records

An SPF record is a type of DNS record that identifies which mail servers are permitted to send email on behalf of your domain.   If an email message comes from a server other than the Google Apps mail servers listed in the SPF record, the recipient's mail server can reject it as spam.  More info.

Google gives these instructions for creating an SPF record. I followed those instructions which resulted in me adding a new TXT DNS record in GoDaddy:

Host:
@

TXT Value:
v=spf1 include:_spf.google.com ~all
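
Once the change has propagated, you can confirm the record from a terminal with dig (in the dnsutils package on Ubuntu); substitute your own domain for mystartup.com:

dig +short TXT mystartup.com

The output should include the value above, i.e. "v=spf1 include:_spf.google.com ~all".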

DomainKeys Identified Mail (DKIM) Standard

You can add a digital "signature" to the header of mail messages sent from your domain.  Recipients can check the domain signature to verify that the message really comes from your domain and that it has not been changed along the way.   Google gives these instructions for getting your emails signed and verified.  I followed those instructions which resulted in me adding another TXT DNS record:

Host:
google._domainkey

TXT Value:
v=DKIM1; k=rsa; p=CIGfMA0GCSqGSIb2DQEBAQAAA4GNADCBiQHKgQCj+tnMQMGMn8NfHnpDmgPa7ICUKdXdyzTlkBglZKRfEtF9msn1v/TmHZEvWFFp3KiaL2Igs7K57l+n/QJlk8Aj9C9nTGmXnzm9BL2zOQQL/zxJh9qh22bnO8uf7tM7sGHxr3z7yIkpXzA96G0inqmNb2XztXKseV4dp5jXbow4+QIDAQAB
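
The DKIM record can be checked the same way, by querying the google selector under _domainkey for your domain (again, mystartup.com stands in for your own domain):

dig +short TXT google._domainkey.mystartup.com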

I used the following services to test if the above configurations were working properly:

OpenSPF.org

The OpenSPF site contains detailed instructions.  You basically send an email to spf-test@openspf.org. Your email will bounce but the bounce message will contain diagnostic information.  Before I made any changes, the bounced email contained:

SPF Tests: Mail-From Result="none": Mail From="ken@mystartup.com" HELO name="mail-vb0-f45.google.com" HELO Result="none" Remote IP="209.85.212.45"

After adding an SPF record, I got:

SPF Tests: Mail-From Result="pass": Mail From="ken@mystartup.com" HELO name="mail-ob0-f173.google.com" HELO Result="none" Remote IP="209.85.214.173"

After setting up DKIM, a new section appeared:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
       d=mystartup.com; s=google;
       h=mime-version:from:date:message-id:subject:to:content-type;
       bh=3N7Rc6NGlzUWcAIDAPH02fhMn6EQcMyaqN1zoCZpAj4=;
       b=CB2n+ZleGjSlLH23RAvhMu56NIZULnSbc3efghykaJpeYMK5xOH2HDqzkoIk7kUWLV
        3xfcPK/7sABiIzhmi+RzzPaOEpUvE8kqFd9SocB3dUYmmCauB2RQIXh7qOUFFV/HTDxR
        23jAtjJUNX4VcdbNsmedbSwKpE30NYF49kjEY=

Port25.com

For this one, you send an email to check-auth@verifier.port25.com.  If your SPF record and DKIM are set up properly, you should get a reply containing this summary:

==========================================================
Summary of Results
==========================================================
SPF check:          pass
DomainKeys check:   neutral
DKIM check:         pass
Sender-ID check:    pass
SpamAssassin check: ham

IsNotSpam.com

This is similar to the service from Port25.com.  You send an email to check@isnotspam.com and get a similar reply.

Wednesday, December 14, 2011

AutoScaling Amazon SQS Queue Processors

One of my favorite things about running servers in Amazon EC2 is the ability to use AutoScaling to automatically add and remove nodes as web traffic increases and decreases.  Not only does this generally save money, but it also helps prepare a system to handle traffic spikes.

Recently I had some fun setting up AutoScaling for a different use case -- a cluster of machines processing messages on an Amazon SQS queue.  The idea was to add and remove nodes as the number of visible messages on the queue fluctuated.  Again, this keeps costs lower by only running as many nodes as are necessary to process the current workload and handles workload spikes.

Here are sample steps and commands that you can use for setting up AutoScaling for SQS queue processors:

Create an AMI for a node that will process SQS messages.  The node that boots from this AMI should automatically launch one or more queue processing processes.  A user-data script may be useful for this.
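
For example, a user-data script along these lines could start the workers at boot (a sketch; queue-worker and its path are hypothetical stand-ins for your own processing program):

#!/bin/bash
# Hypothetical user-data script: launch two queue-processing workers at boot.
for i in 1 2; do
  nohup /opt/myapp/bin/queue-worker --queue MyQueue >> /var/log/queue-worker-$i.log 2>&1 &
done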

Create a Launch Configuration for the queue processor node:

as-create-launch-config MyQueueConfig --image-id [INSERT YOUR AMI ID HERE] --instance-type c1.medium --key [INSERT YOUR KEYNAME HERE] --user-data [INSERT YOUR USER DATA SCRIPT HERE]

Create an AutoScaling group for queue processors:

as-create-auto-scaling-group MyQueueGroup --launch-configuration MyQueueConfig --availability-zones us-east-1b --min-size 1 --max-size 10

Create Policies that add/remove 1 node to/from the cluster.  They will be invoked when the number of messages on the queue grows excessively high or decreases to an acceptable level:

as-put-scaling-policy MyScaleUpPolicy -g MyQueueGroup --adjustment=1 --type ChangeInCapacity

as-put-scaling-policy MyScaleDownPolicy -g MyQueueGroup --adjustment=-1 --type ChangeInCapacity

Create Alarms to scale up/down when the number of messages on the queue grows excessively high or decreases to an acceptable level.  Use the Policy ARNs returned by the previous as-put-scaling-policy commands.

mon-put-metric-alarm --alarm-name MyHighMessagesAlarm --alarm-description "Scale up when number of messages on queue is high" --metric-name ApproximateNumberOfMessagesVisible --namespace AWS/SQS --statistic Average --period 60 --threshold 1000 --comparison-operator GreaterThanThreshold --dimensions QueueName=MyQueue --evaluation-periods 10 --alarm-actions [INSERT MyScaleUpPolicy ARN HERE]

mon-put-metric-alarm --alarm-name MyLowMessagesAlarm --alarm-description "Scale down when number of messages on queue is low" --metric-name ApproximateNumberOfMessagesVisible --namespace AWS/SQS --statistic Average --period 60 --threshold 100 --comparison-operator LessThanThreshold --dimensions QueueName=MyQueue --evaluation-periods 10 --alarm-actions [INSERT MyScaleDownPolicy ARN HERE]

In this example the Alarms cause the cluster to scale up when the number of visible messages on the queue remains above 1000 for 10 consecutive minutes and scale down when the number of visible messages falls below 100 for 10 consecutive minutes.

The Policies above adjust the number of nodes in the cluster by a fixed amount, but it is also possible to specify the adjustment in terms of percentages.  Using --adjustment 10 --type PercentChangeInCapacity would adjust the number of nodes by 10 percent.
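
For example, a percentage-based scale-up policy for the same group might look like this (MyPercentScaleUpPolicy is just an example name):

as-put-scaling-policy MyPercentScaleUpPolicy -g MyQueueGroup --adjustment=10 --type PercentChangeInCapacity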

It would also be possible to base scaling activities on other AWS/SQS metrics, such as:
  • NumberOfMessagesSent
  • NumberOfMessagesReceived
  • NumberOfMessagesDeleted
  • NumberOfEmptyReceives
  • ApproximateNumberOfMessagesVisible
  • ApproximateNumberOfMessagesNotVisible
  • ApproximateNumberOfMessagesDelayed
  • SentMessageSize
Here are a few online references relevant to AutoScaling SQS queue processors: