<<   <   Page 6 / 6
multithreaded 12/4/2012 | 11:51:17 PM
re: Avici, Riverstone Pick Processors >I don't think DSP is the right model for NPU. I would agree that co-processors (lookups, security etc) can be thought of as DSPs. My experience suggests that it takes more cycles to receive and transmit a packet than to process it. So NPU is actually providing much more than simply packet processing.

Totally agreed. A DSP is for computing on data and an NPU is for moving packets. I don't think we can get around this, i.e., an NPU does far less computation per packet than a DSP.
multithreaded 12/4/2012 | 11:51:16 PM
re: Avici, Riverstone Pick Processors >Again these are not "standard risc cores". The BIGGEST problem with multi-threading is how to maintain packet order at higher speeds.

Why is this hard for an NPU?

If each packet is given a time stamp at entry, say 32 bits, can't you use the stamp to put the packets back in order?

I know one can also force each thread to signal its neighbouring thread. However, that approach carries some performance hit.
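
To make the stamp idea concrete, here is a minimal sketch. It assumes the stamp is a 32-bit sequence counter assigned at ingress rather than a wall-clock time, and the name `ReorderBuffer` is hypothetical, not taken from any NPU toolkit. Packets that finish early are parked until every earlier stamp has been released.

```python
MASK32 = 0xFFFFFFFF  # assumption: stamps are 32-bit counters that wrap around


class ReorderBuffer:
    """Release packets in ingress order, keyed by a wrapping 32-bit stamp."""

    def __init__(self, first_stamp=0):
        self.next_stamp = first_stamp & MASK32  # next stamp allowed to leave
        self.parked = {}                        # stamp -> packet waiting its turn

    def complete(self, stamp, packet):
        """Called when a thread finishes a packet; returns packets now releasable."""
        self.parked[stamp & MASK32] = packet
        released = []
        while self.next_stamp in self.parked:
            released.append(self.parked.pop(self.next_stamp))
            self.next_stamp = (self.next_stamp + 1) & MASK32
        return released


if __name__ == "__main__":
    rb = ReorderBuffer(first_stamp=0xFFFFFFFE)
    print(rb.complete(0xFFFFFFFF, "B"))  # [] because "A" has not finished yet
    print(rb.complete(0x00000000, "C"))  # [] still waiting for "A"
    print(rb.complete(0xFFFFFFFE, "A"))  # ['A', 'B', 'C'], released across the wrap
```

The cost shows up in the example itself: B and C are done but sit idle until A finishes, so one slow packet stalls everything queued behind it.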

mrcasual 12/4/2012 | 11:51:15 PM
re: Avici, Riverstone Pick Processors >Why is this hard for an NPU?

>If each packet is given a time stamp at entry, say 32 bits, can't you use the stamp to put the packets back in order?

>I know one can also force each thread to signal its neighbouring thread. However, that approach carries some performance hit.


The problem is simple to explain and seems simple to solve at first, but in practice it is hard to do WITHOUT incurring a performance hit.

Once you get into the many scenarios for keeping (and not keeping) things in order you very quickly realise how hard it is.
mrcasual 12/4/2012 | 11:51:15 PM
re: Avici, Riverstone Pick Processors >I am sure you are not working for IBM (me neither) since IBM will use PPC for their next generation of NPU :-)

>I am really interested in this analysis. Have you published it somewhere?

>I think you are right that moving bits is the No. 1 task for an NPU. However I doubt using DSPs can address the following issues:

>-- Packet storing and retrieving
>-- Table lookup
>-- Classification
>-- Traffic Management

>I would like to know how a set of DSPs can address the above problems.


Sorry, it was an internal evaluation for a big company that I used to work for. I agree, IBM is in trouble with their PPC core path. Just ask any IBMer off the record and they'll agree.

I think my DSP analogy is being taken too literally. Certainly no DSP I am aware of could ever do what a good NPU does. Again, it's just not optimised for the task. I was using it as a comparison to a standard RISC type architecture to show that data moving architectures are generally better.

The problem/benefit of NPUs is that they are so flexible that they are open to widely varying descriptions.

Using your list of 4 bullets I would say that most NPUs are best suited for table lookup and classification. I'll add packet modifications to that list as well. Those are what programmable NPUs should be doing.

If by traffic management you mean scheduling/shaping/etc. then you are better off building a specific chip to handle those functions since they are not well suited to S/W implementations. Intel would disagree and say everything is programmable but the people who actually use these devices know that you just CANNOT write code to do scheduling at 10G speeds for any reasonable number of queues with any kind of hierarchy.
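
For a rough sense of the time budget behind that claim, here is a back-of-the-envelope sketch. It assumes 10 Gb/s Ethernet with the usual 20 bytes of per-frame wire overhead, and a 1 GHz core clock picked purely for illustration; neither figure comes from this thread.

```python
# Assumptions (not from the thread): 10 Gb/s Ethernet, where every frame also
# costs 8 bytes of preamble/SFD and 12 bytes of inter-frame gap on the wire.
WIRE_OVERHEAD_BYTES = 8 + 12


def packet_time_ns(frame_bytes, line_rate_bps=10e9):
    """Time one frame occupies the wire, in nanoseconds."""
    wire_bits = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return wire_bits / line_rate_bps * 1e9


def cycles_per_packet(frame_bytes, clock_hz, line_rate_bps=10e9):
    """Clock cycles available per frame for a core running at clock_hz."""
    return packet_time_ns(frame_bytes, line_rate_bps) * clock_hz / 1e9


if __name__ == "__main__":
    for size in (64, 512, 1518):
        t = packet_time_ns(size)
        # 1 GHz is a hypothetical clock chosen only to make the numbers concrete
        c = cycles_per_packet(size, clock_hz=1e9)
        print(f"{size:5d} B frame: {t:7.1f} ns, {c:7.1f} cycles at 1 GHz")
```

With minimum-size 64-byte frames that is roughly 67 ns per packet, or about 14.9 million packets per second. A scheduler walking a queue hierarchy has to make every decision, including its memory accesses, inside that window.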
fifteenfifty 12/4/2012 | 11:51:01 PM
re: Avici, Riverstone Pick Processors
This problem is waaay more complicated than that. Once you allow packets to go out of order in the first place, putting the Humpty Dumpty pieces back together is just as hard now as it was for the King. If a packet in the middle takes longer to process than the others, how/when do you decide to stop waiting for it? Any scheme will add latency (something people bitch about) and will probably not be foolproof anyhow.
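
A small sketch of the "when do you stop waiting" dilemma, under stated assumptions: time is a plain tick counter passed in by the caller, the buffer is only checked when a packet completes (a real design would also need a timer), and `TimedReorder` and `max_wait` are hypothetical names.

```python
class TimedReorder:
    """In-order release that gives up on a missing packet after max_wait ticks."""

    def __init__(self, max_wait):
        self.max_wait = max_wait
        self.next_seq = 0           # next sequence number allowed to leave
        self.parked = {}            # seq -> packet waiting its turn
        self.blocked_since = None   # tick at which the current gap was first seen

    def complete(self, seq, packet, now):
        """Return (released_in_order, late_packets) for this completion."""
        if seq < self.next_seq:
            return [], [packet]     # arrived after we gave up on it
        self.parked[seq] = packet
        return self._drain(now), []

    def _drain(self, now):
        out = []
        while self.parked:
            if self.next_seq in self.parked:
                out.append(self.parked.pop(self.next_seq))
                self.next_seq += 1
                self.blocked_since = None
                continue
            if self.blocked_since is None:
                self.blocked_since = now
            if now - self.blocked_since < self.max_wait:
                break               # keep waiting: this wait is the added latency
            self.next_seq = min(self.parked)  # give up and skip over the gap
            self.blocked_since = None
        return out


if __name__ == "__main__":
    r = TimedReorder(max_wait=3)
    print(r.complete(1, "B", now=0))  # ([], [])  stalled behind packet 0
    print(r.complete(2, "C", now=2))  # ([], [])  still inside the wait window
    print(r.complete(3, "D", now=5))  # (['B', 'C', 'D'], [])  gave up on packet 0
    print(r.complete(0, "A", now=6))  # ([], ['A'])  shows up late anyway
```

Whatever value `max_wait` takes, it is a trade: a larger one holds B, C and D longer, a smaller one lets more packets like A come out late.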
People smarter than the average poster on this board (uhm, just my opinion, even if it is true) have been thinking about this for eons. One other solution is to maintain order only within a flow, in the hope that a single flow of traffic will not exceed the processing power of one CPU. The problem with this is that one entity must be responsible for identifying flows, and that entity is now the bottleneck in your packet processing system.
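
The per-flow approach can be sketched as follows. This assumes a flow is identified by the usual 5-tuple and uses CRC32 as a stand-in for whatever hash the hardware provides; the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def flow_key(src_ip, dst_ip, proto, src_port, dst_port):
    """Build a byte string identifying the flow (assumed here to be the 5-tuple)."""
    return f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()


def pick_worker(key, n_workers):
    """Map a flow to one worker, so all of its packets share one ordered queue."""
    return zlib.crc32(key) % n_workers


def dispatch(packets, n_workers):
    """Append each packet to its flow's worker queue, preserving arrival order."""
    queues = defaultdict(list)
    for pkt in packets:
        queues[pick_worker(flow_key(*pkt["flow"]), n_workers)].append(pkt)
    return queues


if __name__ == "__main__":
    flow_a = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
    flow_b = ("10.0.0.3", "10.0.0.4", 17, 5353, 53)
    pkts = [{"flow": f, "id": i} for i, f in enumerate([flow_a, flow_b, flow_a, flow_a])]
    for worker, queue in sorted(dispatch(pkts, n_workers=4).items()):
        print(worker, [p["id"] for p in queue])
```

Both weaknesses are visible here: `dispatch` touches every packet before any worker does, and a single heavy flow always lands on the same worker no matter how many others sit idle.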