I have seen berak’s work on using transfer learning with squeezenet. My ideal network would have 16 input neurons and 8 output neurons. But, the input gets converted to 227 x 227. Would there be a better layer / network for my purpose?

maybe you can try to explain, what your prospective NN is for ?
what are you trying to solve, what would be the inputs / outputs here ?

transfer learning only works, if you’re in the same domain as the pretrained model (e.g. image classification),
so the cats & dogs example using squeezenet (which was trained on imagenet, already containing those classes) was a quite “low hanging fruit”

Perhaps I am going about all of this wrong. What I’m doing is trying to teach the network the multiplication of unit octonions. I have 2 x 8 input floats per multiplication, and 1 x 8 output floats.

needs more / deeper layers, e.g. [16, 256, 128, 8]

this is a regression problem, not classification
(you want the output as close as possible to the trad_mul)
so sigmoid is the wrong activation function, as it squashes the outputs into a fixed range. try IDENTITY (linear) instead (sadly you cannot set this per layer here…)

your train input is some grid, but the test input is random, hmmm
(i’d go for all random, so you have an “infinite” amount of train data)

do this in batches, using the update flag. train a few thousand iterations with ~1000 random octonion pairs, test with a few hundred; the distance between test results and trad_mul gives you a loss value.
rinse & repeat, loss should go down

then, wikipedia had a nice link to this paper – Deep octonion networks are a thing, hehe !
(not that i understood any of it …)
but if you scroll down to fig.1, you can see, that each of the 8 octonion components gets its own 2d input map !
(again, all of it for a very different purpose, and i dont think, you can use any of it for your multiplication problem)

well forget all advice above ;(
i made my own attempt at it, and could not get anything useful out of the ANN_MLP

problem is: the last layer should have IDENTITY activation, but you can set this for ALL layers only.
with IDENTITY, it predicts nan’s (BACKPROP) or just does not learn anything (RPROP,ANNEAL)
with SIGMOID_SYM, it predicts wrong output values.

imho, opencv’s ANN_MLP is the wrong tool for this problem
(there is also no regression test here, so no one thought of it)

if you still want to try a nn for this, rather move on to pytorch (maybe even on colab) or such, where you have better control over the network
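in pytorch, the per-layer control is trivial -- a rough sketch with the same layer sizes as before (the target is again just a stand-in for trad_mul):

```python
import torch
import torch.nn as nn

# hidden layers use tanh, but the output layer is a plain Linear
# (identity activation) -- exactly what ANN_MLP could not express
model = nn.Sequential(
    nn.Linear(16, 256), nn.Tanh(),
    nn.Linear(256, 128), nn.Tanh(),
    nn.Linear(128, 8),            # no activation: linear regression head
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def target_fn(a, b):              # stand-in for trad_mul
    return a * b

for step in range(200):
    a = torch.rand(256, 8) * 2 - 1
    b = torch.rand(256, 8) * 2 - 1
    pred = model(torch.cat([a, b], dim=1))
    loss = loss_fn(pred, target_fn(a, b))
    opt.zero_grad()
    loss.backward()
    opt.step()
```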

You, sir, are a scholar and a gentleman. I thank you so much for all of your efforts. I’d still be in the stone age if it weren’t for you.

Well, what I am wondering is whether the network can have a lower time complexity than the exact traditional multiplication function. For instance, the traditional octonion multiplication consists of 8 x 8 = 64 multiplications and about the same number of additions/subtractions. Now I will try it on pathions (32 components), where the cost grows to 32 x 32 = 1024 multiplications.
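For reference, that n x n multiplication count falls straight out of the Cayley-Dickson doubling construction; a small numpy version (this uses one common sign convention, so it may need checking against your actual multiplication table):

```python
import numpy as np

def conj(x):
    # hypercomplex conjugate: keep the real part, negate the rest
    out = -x
    out[0] = x[0]
    return out

def cd_mul(x, y):
    # Cayley-Dickson doubling: (a,b)(c,d) = (ac - conj(d)b, da + b conj(c))
    # cost: M(n) = 4 M(n/2), M(1) = 1  =>  exactly n*n scalar multiplications
    n = len(x)
    if n == 1:
        return x * y
    h = n // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    return np.concatenate([cd_mul(a, c) - cd_mul(conj(d), b),
                           cd_mul(d, a) + cd_mul(b, conj(c))])
```

The same function handles any power-of-two size: length 8 gives octonions, length 32 gives pathions.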

Your results are spectacular! Thank you!!!

The imagery associated with this project is as follows. It is a triangle mesh, and it’s rendered using OpenGL 4, Phong shading, and a shadow map. The mesh is cut in half, to show “internal” detail.

Good news. I used a replacement function for the traditional multiplication, and for pathions (num_components = 32), the timing is 40 seconds for traditional multiplication versus 12 seconds for the trained multiplication.

For reference, for octonions (num_components = 8), the timing is 10 seconds versus 10 seconds.

I’d say that it’s a success.

P.S. I wonder… is there a way to use non-normalized number types?

P.P.S. Is there any way to accelerate this entire script on the GPU?

def traditional_mul_replacement(in_a, in_b):
    # dummy with the same O(n^2) multiply/add count as the real product
    out = np.zeros([num_components], np.float32)
    for i in range(num_components):
        for j in range(num_components):
            out[i] += in_a[i] * in_b[j]
    return out

that seems entirely unlikely to me.
to model unknown complexity, you usually have to come up with a far larger beast than the original, exact solution.
if you still get any gains from NNs, it’s probably due to better vectorization / optimization there (again, opinion !)
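to see the vectorization point: the multiply/add work in your replacement loop collapses to a couple of numpy ops, since sum_j a[i]*b[j] is just a[i] * sum(b) (a sketch, not a claim about where your speedup actually came from):

```python
import numpy as np

num_components = 32  # pathions

def replacement_loop(in_a, in_b):
    # the O(n^2) double loop: out[i] = sum over j of in_a[i] * in_b[j]
    out = np.zeros([num_components], np.float32)
    for i in range(num_components):
        for j in range(num_components):
            out[i] += in_a[i] * in_b[j]
    return out

def replacement_vectorized(in_a, in_b):
    # same arithmetic, reduced to one sum and one elementwise scale
    return (in_a * in_b.sum()).astype(np.float32)
```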

hey, i just looked and found, you’re doing all of it on vertices ?
(still marching on cubes, hey…)

for python, there are various gpu accelerators (e.g. numba). pytorch can also run on gpu

For quaternions both methods are equal in speed. This leads me to suspect that the AI method is performed on the GPU. For pathions, the AI method is faster by a factor of like 6.

I do like Marching Cubes LOL. The only drawback to the Python version is that it doesn’t perform iteration while calculating the vertex positions. I mean, it just does linear interpolation, which leads to blocky results. I have incentive to port my Marching Cubes code (based on Paul Bourke’s work) to Python LOL

up/downloads between cpu <–> gpu are costly.
anytime you do a .to(device) or .cpu(), it’s doing a full copy
(in gpu mode; in cpu mode all of this is a no-op)
so, if you want to profit from a gpu, keep your data there as long as you can, and try to express the ops using torch functions
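e.g. like this (falls back to cpu when there is no gpu, where the transfers are no-ops anyway):

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

a = torch.rand(1000, 8, device=device)   # created directly on the device
b = torch.rand(1000, 8, device=device)

# BAD: (a.cpu() * b.cpu()).to(device) round-trips for every op

# GOOD: torch.cat / arithmetic all run on the device, data never moves
c = torch.cat([a, b], dim=1)
result = c[:, :8] * c[:, 8:]             # example op, still on the device
host = result.cpu().numpy()              # one single copy back, at the very end
```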

Hi berak, I uploaded more scripts. Mc_gpu.py and Mc_cpu.py both generate obj files. However, I notice that using scikit marching cubes generates meshes that are scaled and translated. I am disappointed. I will be writing my own marching cubes API for Python.
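If I understand it right, scikit’s marching cubes returns vertices in voxel-index coordinates (it takes a `spacing` parameter but knows nothing about the volume’s origin), so a transform like this should map them back; `spacing` and `origin` here are whatever my sampling grid used:

```python
import numpy as np

def verts_to_world(verts, spacing, origin):
    # verts: (N, 3) vertex array in voxel-index coordinates,
    # e.g. as returned by skimage.measure.marching_cubes
    spacing = np.asarray(spacing, np.float32)
    origin = np.asarray(origin, np.float32)
    return verts * spacing + origin

# example: a grid sampled every 0.1 units, starting at (-1, -1, -1)
verts = np.array([[0.0, 0.0, 0.0], [10.0, 20.0, 30.0]], np.float32)
world = verts_to_world(verts, (0.1, 0.1, 0.1), (-1.0, -1.0, -1.0))
```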

P.S. I got rid of slice a and b altogether, avoiding the call to torch.cat — thank you again for all of your help, man!

I’m not terribly familiar with the lambda expressions in C++ and Python, and I’m a slow learner. Can one skip a for loop using a lambda expression? Sorry if that question doesn’t make sense.