You mentioned a queue depth of 1; I am assuming you are running a benchmark utility and limiting it to 1? If so, I believe those results (330 IOPS vs. 1200) are to be expected and normal.
Sync-mode writes are not acknowledged to your host by the local storage until they are first acknowledged by the remote storage, so each IO incurs the full round-trip latency. A queue depth of 1 is like two people tossing a single baseball back and forth: much of the time is consumed by the ball traveling through the air, and with only one ball, both players are idle while it is in flight. The farther the distance, the longer the air time. Raise the queue depth to 32 and you can have up to 32 balls in the air at once.
Might I suggest a different test? You indicated 1200 IOPS on a local test. Rerun the benchmark on the replicated LUN, increasing the queue depth gradually until you either hit that 1200 number or it plateaus. That should give you a rough idea of how many IOs you can keep in flight before the first one returns, as dictated by your latency.
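As a rough sanity check, here is the arithmetic behind that suggestion, as a simple Python model. It assumes each IO costs one full round trip and the back end tops out at your local 1200 IOPS figure; real hardware will deviate, but it shows why the plateau appears:

```python
# Rough model: with synchronous replication, each write waits one full
# round trip, so IOPS is roughly queue_depth / round_trip_time, capped
# by whatever the back-end storage can do locally.

def expected_iops(queue_depth, rtt_s, local_max_iops):
    return min(queue_depth / rtt_s, local_max_iops)

# Your numbers: 330 IOPS at queue depth 1 implies ~3 ms per IO.
rtt = 1 / 330  # ~0.003 s round trip, inferred from the QD1 result

for qd in (1, 2, 4, 8, 16, 32):
    print(qd, round(expected_iops(qd, rtt, 1200)))
```

In this idealized model you would already hit the 1200 ceiling around a queue depth of 4; in practice the sweep tells you where your real plateau sits.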
I'm no Brocade expert, but from what I have briefly read tonight, their extended SAN links have increased buffers/credits to keep the data flowing over distance; that benefit, however, is negated by a queue depth of 1.
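For what it's worth, the buffer-credit math is easy to sketch. This is a back-of-the-envelope model of my own, assuming roughly 5 microseconds per km of one-way propagation in fiber and full-size (~2 KB) FC frames; the distance and link speed are just illustrative:

```python
# Back-of-the-envelope: buffer credits needed to keep an FC link busy.
# Assumes ~5 us/km one-way propagation in fiber and full-size (2112-byte
# payload) frames; real deployments need margin on top of this.

def credits_to_fill_link(distance_km, link_gbps, frame_bytes=2112):
    rtt_s = 2 * distance_km * 5e-6                   # round trip in seconds
    bytes_in_flight = (link_gbps * 1e9 / 8) * rtt_s  # bytes needed in flight
    return int(-(-bytes_in_flight // frame_bytes))   # ceiling division

print(credits_to_fill_link(100, 4))  # e.g. a 4 Gbps link over 100 km
```

Note that credits only keep frames flowing once they exist; with a queue depth of 1 there is never more than one IO's worth of frames to send, so all those extra credits sit unused.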
Cisco MDS has a feature specifically to address this issue with latency and remote writes.
http://www.cisco.com/en/US/prod/collate ... 4fd2b.html

Quote:
FC-WA minimizes storage latency and improves the number of application transactions per second over long distances. It increases the distance of replication or reduces effective latency to improve performance during synchronous replication.
The improved performance results from a coordinated effort performed by the Storage Services Module local to the initiator and the Storage Services Module local to the target. The initiator Storage Services Module, bearing the host-connected intelligent port (HI-port), allows the initiator to send the data to be written well before the write command has been processed by the remote target, and an SCSI Transfer Ready message has had the time to travel back to start the data transfer in the traditional way. The exchange of information between the HI-port and the disk-connected intelligent port (DI-port) allows the transfer to begin earlier than in a traditional transfer.

The procedure makes use of a set of buffers for temporarily storing the data as near to the DI-port as possible. The information between the HI-port and DI-port is piggybacked on the SCSI command and the SCSI Transfer Ready command, so there are no additional FC-WA-specific frames traveling on the SAN. Data integrity is maintained by the fact that the original message that states the correct execution disk side of the write operation (SCSI Status Good) is transferred from the disk to the host.
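To make the quoted mechanism concrete, here is a toy timing model (my own illustration, not Cisco's implementation): a traditional remote write spends about two round trips (command out, Transfer Ready back, then data out, status back), while FC-WA sends the data right behind the command and cuts that to roughly one round trip, with the final status still coming from the real disk:

```python
# Toy model of per-write latency over a replication link, in milliseconds.
# Traditional: cmd -> Xfer Ready -> data -> status  ~= 2 round trips.
# FC-WA:       cmd + data together -> status        ~= 1 round trip.

def write_latency_ms(rtt_ms, fcwa=False):
    return rtt_ms if fcwa else 2 * rtt_ms

rtt_ms = 3  # e.g. a ~3 ms round trip between sites (illustrative)
print(write_latency_ms(rtt_ms))             # traditional remote write
print(write_latency_ms(rtt_ms, fcwa=True))  # with write acceleration
```

Halving the per-write latency roughly doubles what a queue depth of 1 can achieve, but it still cannot beat raising the queue depth itself.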
I hope this helps.