<<   <   Page 5 / 6   >   >>
dljvjbsl 12/4/2012 | 11:51:37 PM
re: Avici, Riverstone Pick Processors
Think of an NPU more like a DSP, i.e. it operates on data streams. You can then manage the flow of data, instructions, registers in a much more sensible way.


I am a beginner at this and am just asking this as a question to see if I understand the points that are being made.

Would this idea be similar, if only vaguely, to an event handler that feeds an interrupt queue? Interrupts are handled in sequence, with hardware managing the threading of each interrupt. All communication is by asserting data, which could trigger an event.

Again this is only asked as a means of obtaining clarification not as a bold assertion of validity.
jamesbond 12/4/2012 | 11:51:36 PM
re: Avici, Riverstone Pick Processors Any NPU that bears a resemblance to a "standard" CPU (caches, DMAs to/from bulk memory) is doomed to be a low-performance / high-power (read IXP2800) device.
-----------------------------

Umm... only the StrongARM core has a cache. Memory accesses from the Micro Engines go directly to the SDRAM or SRAM controller after some arbitration.

I am not sure what you mean by "DMA to/from bulk memory". Where do you suggest packet data be saved?

I don't think DSP is the right model for an NPU. I would agree that co-processors (lookups, security, etc.) can be thought of as DSPs. My experience suggests that it takes more cycles to receive and transmit a packet than to process it. So an NPU actually provides much more than simple packet processing.

mrcasual 12/4/2012 | 11:51:36 PM
re: Avici, Riverstone Pick Processors Think of an NPU more like a DSP, i.e. it operates on data streams. You can then manage the flow of data, instructions, registers in a much more sensible way.


I am a beginner at this and am just asking this as a question to see if I understand the points that are being made.

Would this idea be similar, if only vaguely, to an event handler that feeds an interrupt queue? Interrupts are handled in sequence, with hardware managing the threading of each interrupt. All communication is by asserting data, which could trigger an event.


I was trying to draw an analogy with the DSP, i.e. that for NPUs the key is to keep data flowing in a pipeline, thereby avoiding discrete, S/W-controlled movement of data.

If I understand your analogy (dangerous ground here, with an analogy feeding an analogy), the event handler feeds the interrupt queue with all the data it needs. The interrupt handler then processes each event in isolation from other events, and when the interrupt handler exits, the task is complete. The H/W engine swaps threads and manages any coherency issues between events.

If that is what you meant, then I think you've got it.

The key in NPUs is that, generally speaking, packets need to exit the NPU in the same order in which they entered. Given that all NPUs are in some way multi-threaded in order to get the performance they need, maintaining order between parallel threads is a non-trivial matter.
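To make the ordering problem concrete, here is a minimal Python sketch of one common software approach, a reorder buffer: each packet is tagged with a sequence number on ingress, threads may finish in any order, and a packet is released only once every earlier packet has left. `ReorderBuffer` and `simulate` are hypothetical names, and the random shuffle only stands in for parallel threads finishing out of order; this is not a description of how any particular NPU enforces ordering.

```python
import heapq
import random


class ReorderBuffer:
    """Releases packets in ingress order even if workers finish out of order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number allowed to leave
        self.pending = []   # min-heap of (seq, packet) that finished early

    def complete(self, seq, packet):
        """Record a finished packet; return the packets now releasable, in order."""
        heapq.heappush(self.pending, (seq, packet))
        released = []
        # Drain the heap only while its smallest entry is the next expected packet.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, pkt = heapq.heappop(self.pending)
            released.append(pkt)
            self.next_seq += 1
        return released


def simulate(num_packets, seed=1):
    """Tag packets on ingress, finish them in a shuffled order, return egress order."""
    packets = [f"pkt{i}" for i in range(num_packets)]
    finish_order = list(range(num_packets))
    random.Random(seed).shuffle(finish_order)  # stand-in for parallel threads
    rob = ReorderBuffer()
    egress = []
    for seq in finish_order:
        egress.extend(rob.complete(seq, packets[seq]))
    return egress
```

Whatever order the "threads" finish in, `simulate(8)` returns `pkt0` through `pkt7` in ingress order. The cost is buffering: a slow packet holds back every packet behind it, which is part of why ordering gets harder at higher speeds.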
lift 12/4/2012 | 11:51:28 PM
re: Avici, Riverstone Pick Processors
Any NPU that bears a resemblance to a "standard" CPU (caches, DMAs to/from bulk memory) is doomed to be a low-performance / high-power (read IXP2800) device.
==================================

Doesn't the IXP2800 have 16 1.4 GHz StrongARM CPU cores? That is a far higher frequency than anybody else's NPUs. Higher power is understandable (Cv**2f), but not the performance.
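The (Cv**2f) remark refers to the usual CMOS dynamic-power estimate, P = a * C * V^2 * f, where a is the switching activity. A small Python sketch of how it scales: with capacitance fixed, doubling f doubles power, and if the supply voltage also has to rise to reach the higher clock, power grows faster still. The function names and all numbers are illustrative assumptions, not IXP2800 figures.

```python
def dynamic_power(c_farads, v_volts, f_hertz, activity=1.0):
    """Approximate CMOS switching power in watts: P = activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hertz


def relative_power(v_ratio, f_ratio):
    """Power ratio between two operating points with the same switched
    capacitance and activity: (V2/V1)^2 * (f2/f1)."""
    return v_ratio ** 2 * f_ratio


# Hypothetical operating points, chosen only to show the scaling.
same_voltage = relative_power(1.0, 2.0)    # 2x clock at the same supply voltage
raised_voltage = relative_power(1.2, 2.0)  # 2x clock with a 20% higher supply
```

Here `same_voltage` is 2.0 and `raised_voltage` is about 2.88. The formula only speaks to power; it says nothing about how much useful work each clock does.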

There are s/w companies built around optimizing the performance of the IXP2800.



mrcasual 12/4/2012 | 11:51:25 PM
re: Avici, Riverstone Pick Processors Doesn't the IXP2800 have 16 1.4 GHz StrongARM CPU cores? That is a far higher frequency than anybody else's NPUs. Higher power is understandable (Cv**2f), but not the performance.

There are s/w companies built around optimizing the performance of the IXP2800.



That's exactly the point I (and others) were trying to make. Because "standard" RISC cores aren't well suited to packet processing, you have to run them really fast, which results in high power, and they are hell to program for high performance. Clock speed does not necessarily equal performance.

In a past life I was asked to do an analysis of using the fastest PPC available as an NPU. Needless to say, it didn't cut the mustard, mainly because of the way you needed to move data into the chip and the way the CPU core accessed it once it was there. I did the same analysis using a TI DSP whose core clock was at least half that of the PPC. It actually had better performance. Why? Because the way it accessed data was more efficient.

It's all about moving bits. RISC cores have been designed to run programs not process packets.

You can do "anything" with a processor, the question is how much and how fast.
jamesbond 12/4/2012 | 11:51:23 PM
re: Avici, Riverstone Pick Processors Doesn't the IXP2800 have 16 1.4 GHz StrongARM CPU cores? That is a far higher frequency than anybody else's NPUs. Higher power is understandable (Cv**2f), but not the performance.

------------------------------------------


No, the Micro Engines are NOT StrongARM cores. There is only ONE StrongARM core, meant for control-plane or slow-path work.

Micro Engines are proprietary RISC cores with a proprietary instruction set. Each ME can run 4-16 threads, and context switching is done by hardware. For example, when a thread accesses memory, the programmer can specify an attribute that causes an automatic context switch to another thread that is ready to run while this thread waits for the memory result.
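A toy Python model of the latency hiding described above: one single-issue engine, several threads, and each packet needing some compute followed by one memory read. With swap-on-memory enabled, the engine switches to another ready thread while the read is outstanding; without it, the engine stalls. Round-robin thread selection, a fixed memory latency, and a free context swap are simplifying assumptions, and `run_engine` is a hypothetical name, not part of any Intel microengine API.

```python
def run_engine(num_threads, compute, latency, packets_per_thread, swap_on_memory=True):
    """Cycles for one engine to finish every thread's packets (toy model).

    Each packet costs `compute` engine cycles, then one memory read that
    returns `latency` cycles later. A packet is done when its read returns.
    """
    if not swap_on_memory:
        # No thread switch: the engine stalls on every read, so work is serial.
        return num_threads * packets_per_thread * (compute + latency)

    ready_at = [0] * num_threads           # cycle when each thread's last read returns
    remaining = [packets_per_thread] * num_threads
    now = 0
    rr = 0                                 # round-robin starting point
    while any(remaining):
        chosen = None
        for i in range(num_threads):
            t = (rr + i) % num_threads
            if remaining[t] and ready_at[t] <= now:
                chosen = t
                break
        if chosen is None:
            # Every thread with work is waiting on memory: idle until one wakes.
            now = min(ready_at[t] for t in range(num_threads) if remaining[t])
            continue
        now += compute                     # run this thread up to its memory reference
        ready_at[chosen] = now + latency   # issue the read; the thread sleeps, engine swaps
        remaining[chosen] -= 1
        rr = (chosen + 1) % num_threads
    return max(ready_at, default=0)
```

With 4 threads, 10 compute cycles and 30 cycles of memory latency per packet, two packets per thread take 110 cycles with swapping versus 320 without. With a single thread both modes give 80, since there is nothing to swap to.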

Again, these are not "standard RISC cores". The BIGGEST problem with multi-threading is how to maintain packet order at higher speeds.
multithreaded 12/4/2012 | 11:51:18 PM
re: Avici, Riverstone Pick Processors >The things you point out seem to be configurable items, rather than programmable items. What I mean is that you can have programmable configuration options for the task-switch trigger and the buffer-management strategy. I think the reality is that there are not too many options for the things you point out.

I am confused here. Are you suggesting that before a thread runs, we use INSTRUCTIONS to set a threshold so that thread switching can occur after a certain number of DMAs have been issued?

Please note that this property is decided per thread, not per program. For options that can be determined per program, configurable settings make sense. Using instructions to set per-thread options makes programs run much slower.

>My experience was that I designed a full-duplex 10 Gig network processor. At these speeds we really needed to provide a framework from which the SW could operate; otherwise the SW would not meet line rate. Too much flexibility would just lead to SW developers banging their heads against a wall for no reason.

The same feeling here. I am glad that I am talking to a person that has the same experience as I had.

The issue here is how much is too much; I think that is where we differ. I favour giving the user control at the thread level.
multithreaded 12/4/2012 | 11:51:18 PM
re: Avici, Riverstone Pick Processors >Who ever said that anything has to be DMA'd on thread switches?

Assume a DMA is issued and the current thread has nothing else to do. What are you going to do?


>Think of an NPU more like a DSP, i.e. it operates on data streams. You can then manage the flow of data, instructions, registers in a much more sensible way.

Even for DSP processors, a common trick is to use A/B buffers (two threads) to hide DMA latency. Therefore thread switching is widely used implicitly in DSPs.
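A minimal sketch of the A/B (ping-pong) buffer idea in Python: while the kernel works on one buffer, the next block is already being transferred into the other. A single-worker `ThreadPoolExecutor` plays the role of the DMA engine, and `dma_read`, `process` and `ping_pong` are hypothetical stand-ins. In Python this illustrates the structure only, not real DMA concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_read(source, block_index, block_size):
    """Stand-in for a DMA transfer: copy one block out of 'external memory'."""
    start = block_index * block_size
    return list(source[start:start + block_size])


def process(block):
    """Stand-in for the per-block DSP kernel (here: sum of squares)."""
    return sum(x * x for x in block)


def ping_pong(source, block_size):
    """Process `source` block by block, overlapping each transfer with compute."""
    num_blocks = (len(source) + block_size - 1) // block_size
    if num_blocks == 0:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        # Prime buffer A with the first block.
        pending = dma_engine.submit(dma_read, source, 0, block_size)
        for i in range(num_blocks):
            current = pending.result()  # wait for the buffer that was being filled
            if i + 1 < num_blocks:
                # Start filling the other buffer before computing on this one.
                pending = dma_engine.submit(dma_read, source, i + 1, block_size)
            results.append(process(current))
    return results
```

For `ping_pong(list(range(10)), 4)` the blocks are [0..3], [4..7] and [8, 9], giving `[14, 126, 145]`. At most two buffers are live at once: the one being processed and the one being filled.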


multithreaded 12/4/2012 | 11:51:17 PM
re: Avici, Riverstone Pick Processors >In a past life I was asked to do an analysis of using the fastest PPC available as an NPU. Needless to say, it didn't cut the mustard, mainly because of the way you needed to move data into the chip and the way the CPU core accessed it once it was there. I did the same analysis using a TI DSP whose core clock was at least half that of the PPC. It actually had better performance. Why? Because the way it accessed data was more efficient.

I am sure you are not working for IBM (me neither) since IBM will use PPC for their next generation of NPU :-)

I am really interested in this analysis. Have you published it somewhere?

I think you are right that moving bits is the No. 1 task for an NPU. However, I doubt that DSPs can address the following issues:

-- Packet storing and retrieving
-- Table lookup
-- Classification
-- Traffic Management

I would like to know how a set of DSPs can address the above problems.
multithreaded 12/4/2012 | 11:51:17 PM
re: Avici, Riverstone Pick Processors >Doesn't the IXP2800 have 16 1.4 GHz StrongARM CPU cores? That is a far higher frequency than anybody else's NPUs. Higher power is understandable (Cv**2f), but not the performance.

I remember that the IXP2800 has 16 Microengines + 1 StrongARM.

The multithreaded Microengines are for the data path and the StrongARM is for the control path.